Add MAAS spec

We intend to add support for MAAS, another bare metal provisioning
and management service.

Note that the tox job fails at the moment as "whitelist_externals"
has been deprecated in favor of "allowlist_externals", so we'll
need to address that as well.

Another issue that we need to fix is that sphinx is unable to
locate some image files:

  /home/zuul/src/opendev.org/openstack/watcher-specs/doc/source/
  specs/newton/implemented/scoring-module.rst:232: WARNING: image
  file not readable: doc/source/images/scoring-module-deployment.png

Change-Id: I54b3c578f677ad7d554732a5018163e4780f9457
Lucian Petrut 2023-10-23 12:01:19 +03:00
parent 86df397e3e
commit ea19cfd56e
9 changed files with 543 additions and 17 deletions

specs/2024.1-template.rst

@ -0,0 +1,379 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Example Spec - The title of your blueprint
==========================================
Include the URL of your launchpad blueprint:
https://blueprints.launchpad.net/watcher/+spec/example
Introduction paragraph -- why are we doing anything? A single paragraph of
prose that operators can understand. The title and this first paragraph
should be used as the subject line and body of the commit message
respectively.
Some notes about the watcher-spec and blueprint process:
* Not all blueprints need a spec. For more information see
http://docs.openstack.org/developer/nova/blueprints.html#specs
* The aim of this document is first to define the problem we need to solve,
and second agree the overall approach to solve that problem.
* This is not intended to be extensive documentation for a new feature.
For example, there is no need to specify the exact configuration changes,
nor the exact details of any DB model changes. But you should still define
that such changes are required, and be clear on how that will affect
upgrades.
* You should aim to get your spec approved before writing your code.
While you are free to write prototypes and code before getting your spec
approved, it's possible that the outcome of the spec review process leads
you towards a fundamentally different solution than you first envisaged.
* But, API changes are held to a much higher level of scrutiny.
As soon as an API change merges, we must assume it could be in production
somewhere, and as such, we then need to support that API change forever.
To avoid getting that wrong, we do want lots of details about API changes
upfront.
Some notes about using this template:
* Your spec should be in ReSTructured text, like this template.
* Please wrap text at 79 columns.
* The filename in the git repository should match the launchpad URL, for
example a URL of: https://blueprints.launchpad.net/watcher/+spec/awesome-thing
should be named awesome-thing.rst
* Please do not delete any of the sections in this template. If you have
nothing to say for a whole section, just write: None
* For help with syntax, see http://sphinx-doc.org/rest.html
* To test out your formatting, build the docs using tox and see the generated
HTML file in doc/build/html/specs/<path_of_your_file>
* If you would like to provide a diagram with your spec, ascii diagrams are
required. http://asciiflow.com/ is a very nice tool to assist with making
ascii diagrams. The reason for this is that the tool used to review specs is
based purely on plain text. Plain text will allow review to proceed without
having to look at additional files which can not be viewed in gerrit. It
will also allow inline feedback on the diagram itself.
* If your specification proposes any changes to the Watcher REST API such
as changing parameters which can be returned or accepted, or even
the semantics of what happens when a client calls into the API, then
you should add the APIImpact flag to the commit message. Specifications with
the APIImpact flag can be found with the following query:
https://review.opendev.org/#/q/status:open+project:openstack/watcher-specs+message:apiimpact,n,z
Problem description
===================
A detailed description of the problem. What problem is this blueprint
addressing?
Use Cases
----------
What use cases does this address? What impact on actors does this change have?
Ensure you are clear about the actors in each use case: Developer, End User,
Deployer etc.
Proposed change
===============
Here is where you cover the change you propose to make in detail. How do you
propose to solve this problem?
If this is one part of a larger effort make it clear where this piece ends. In
other words, what's the scope of this effort?
At this point, if you would like to just get feedback on if the problem and
proposed change fit in Watcher, you can stop here and post this for review to
get preliminary feedback. If so please say:
Posting to get preliminary feedback on the scope of this spec.
Alternatives
------------
What other ways could we do this thing? Why aren't we using those? This doesn't
have to be a full literature review, but it should demonstrate that thought has
been put into why the proposed solution is an appropriate one.
Data model impact
-----------------
Changes which require modifications to the data model often have a wider impact
on the system. The community often has strong opinions on how the data model
should be evolved, from both a functional and performance perspective. It is
therefore important to capture and gain agreement as early as possible on any
proposed changes to the data model.
Questions which need to be addressed by this section include:
* What new data objects and/or database schema changes is this going to
require?
* What database migrations will accompany this change?
* How will the initial set of new data objects be generated? For example, if
you need to take into account existing instances or modify other existing
data, describe how that will work.
REST API impact
---------------
Each API method which is either added or changed should have the following:
* Specification for the method
* A description of what the method does suitable for use in
user documentation
* Method type (POST/PUT/GET/DELETE)
* Normal http response code(s)
* Expected error http response code(s)
* A description for each possible error code should be included
describing semantic errors which can cause it such as
inconsistent parameters supplied to the method, or when an
instance is not in an appropriate state for the request to
succeed. Errors caused by syntactic problems covered by the JSON
schema definition do not need to be included.
* URL for the resource
* Parameters which can be passed via the url
* JSON schema definition for the body data if allowed
* JSON schema definition for the response data if any
* Example use case including typical API samples for both data supplied
by the caller and the response
* Discuss any policy changes, and discuss what things a deployer needs to
think about when defining their policy.
Note that the schema should be defined as restrictively as
possible. Parameters which are required should be marked as such and
only under exceptional circumstances should additional parameters
which are not defined in the schema be permitted (e.g.,
additionalProperties should be False).
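As an illustration of a restrictive schema, here is a minimal sketch in Python. The ``AUDIT_CREATE_SCHEMA`` resource, its parameter names and the ``check_body`` helper are hypothetical and only serve the example; a real service would normally rely on a full JSON schema validator rather than this hand-rolled check.

```python
# Hypothetical request-body schema for an illustrative resource.
# Required parameters are listed explicitly and unknown ones are rejected.
AUDIT_CREATE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1, "maxLength": 63},
        "interval": {"type": "integer", "minimum": 1},
    },
    "required": ["name"],
    "additionalProperties": False,
}


def check_body(body, schema=AUDIT_CREATE_SCHEMA):
    """Check only 'required' and 'additionalProperties' (not a full validator)."""
    errors = []
    # Every parameter marked as required must be present.
    for key in schema.get("required", []):
        if key not in body:
            errors.append(f"missing required parameter: {key}")
    # With additionalProperties set to False, undeclared parameters are errors.
    if schema.get("additionalProperties") is False:
        for key in body:
            if key not in schema["properties"]:
                errors.append(f"unexpected parameter: {key}")
    return errors
```

Calling ``check_body({"name": "nightly"})`` returns an empty list, while a body that omits ``name`` or carries an undeclared key yields one error message per problem.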
Reuse of existing predefined parameter types such as regexps for
passwords and user defined names is highly encouraged.
Security impact
---------------
Describe any potential security impact on the system. Some of the items to
consider include:
* Does this change touch sensitive data such as tokens, keys, or user data?
* Does this change alter the API in a way that may impact security, such as
a new way to access sensitive information or a new way to login?
* Does this change involve cryptography or hashing?
* Does this change require the use of sudo or any elevated privileges?
* Does this change involve using or parsing user-provided data? This could
be directly at the API level or indirectly such as changes to a cache layer.
* Can this change enable a resource exhaustion attack, such as allowing a
single API interaction to consume significant server resources? Some examples
of this include launching subprocesses for each connection, or entity
expansion attacks in XML.
For more detailed guidance, please see the OpenStack Security Guidelines as
a reference (https://wiki.openstack.org/wiki/Security/Guidelines). These
guidelines are a work in progress and are designed to help you identify
security best practices. For further information, feel free to reach out
to the OpenStack Security Group at openstack-security@lists.openstack.org.
Notifications impact
--------------------
Please specify any changes to notifications. Be that an extra notification,
changes to an existing notification, or removing a notification.
Other end user impact
---------------------
Aside from the API, are there other ways a user will interact with this
feature?
* Does this change have an impact on python-watcherclient? What does the user
interface there look like?
Performance Impact
------------------
Describe any potential performance impact on the system, for example
how often will new code be called, and is there a major change to the calling
pattern of existing code.
Examples of things to consider here include:
* A periodic task might look like a small addition but if it calls conductor or
another service the load is multiplied by the number of nodes in the system.
* Scheduler filters get called once per host for every instance being created,
so any latency they introduce is linear with the size of the system.
* A small change in a utility function or a commonly used decorator can have a
large impact on performance.
* Calls which result in database queries (whether direct or via conductor)
can have a profound impact on performance when called in critical sections of
the code.
* Will the change include any locking, and if so what considerations are there
on holding the lock?
Other deployer impact
---------------------
Discuss things that will affect how you deploy and configure OpenStack
that have not already been mentioned, such as:
* What config options are being added? Are the default values ones which will
work well in real deployments?
* Is this a change that takes immediate effect after it's merged, or is it
something that has to be explicitly enabled?
* If this change is a new binary, how would it be deployed?
* Please state anything that those doing continuous deployment, or those
upgrading from the previous release, need to be aware of. Also describe
any plans to deprecate configuration values or features. For example, if we
change the directory name that instances are stored in, how do we handle
instance directories created before the change landed? Do we move them? Do
we have a special case in the code? Do we assume that the operator will
recreate all the instances in their cloud?
Developer impact
----------------
Discuss things that will affect other developers working on OpenStack.
Implementation
==============
Assignee(s)
-----------
Who is leading the writing of the code? Or is this a blueprint where you're
throwing it out there to see who picks it up?
If more than one person is working on the implementation, please designate the
primary author and contact.
Primary assignee:
<launchpad-id or None>
Other contributors:
<launchpad-id or None>
Work Items
----------
Work items or tasks -- break the feature up into the things that need to be
done to implement it. Those parts might end up being done by different people,
but we're mostly trying to understand the timeline for implementation.
Dependencies
============
* Include specific references to specs and/or blueprints in Watcher, or in
other projects, that this one either depends on or is related to.
* If this requires functionality of another project that is not currently used
by Watcher (such as the glance v2 API when we previously only required v1),
document that fact.
* Does this feature require any new library dependencies or code otherwise not
included in OpenStack? Or does it depend on a specific version of library?
Testing
=======
Please discuss the important scenarios needed to test here, as well as
specific edge cases we should be ensuring work correctly. For each
scenario please specify if this requires specialized hardware, a full
OpenStack environment, or can be simulated inside the Watcher tree.
Please discuss how the change will be tested. We especially want to know what
tempest tests will be added. It is assumed that unit test coverage will be
added so that doesn't need to be mentioned explicitly, but discussion of why
you think unit tests are sufficient and we don't need to add more tempest
tests would need to be included.
Is this untestable in gate given current limitations (specific hardware /
software configurations available)? If so, are there mitigation plans (3rd
party testing, gate enhancements, etc.)?
Documentation Impact
====================
What is the impact on the docs team of this change? Some changes might require
donating resources to the docs team to have the documentation updated. Don't
repeat details discussed above, but please reference them here.
References
==========
Please add any useful references here. You are not required to have any
reference. Moreover, this specification should still make sense when your
references are unavailable. Examples of what you could include are:
* Links to mailing list or IRC discussions
* Links to notes from a summit session
* Links to relevant research, if appropriate
* Related specifications as appropriate (e.g. if it's an EC2 thing, link the
EC2 docs)
* Anything else you feel it is worthwhile to refer to
History
=======
Optional section for liberty intended to be used each time the spec
is updated to describe new design, API or any database schema
updated. Useful to let the reader understand what has happened over
time.
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Caracal
     - Introduced


@ -0,0 +1,147 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
============
MAAS support
============
https://blueprints.launchpad.net/watcher/+spec/maas-support
This blueprint aims to introduce Watcher support for MAAS, another bare metal
provisioning and management service that's commonly used with OpenStack.
Problem description
===================
Metal-As-A-Service (MAAS) is an open source project led by Canonical that
allows provisioning and managing bare metal nodes.
Right now, Watcher can only use Ironic; however, MAAS support can be added
with minimal changes.
Use Cases
----------
Some OpenStack clusters are deployed using MAAS + Juju instead of Ironic.
By adding MAAS support, we'll allow Watcher to discover MAAS nodes and perform
power actions, adjusting the number of running nodes based on the current
workload.
Proposed change
===============
We'll add a simple bare metal client abstraction with concrete implementations
for Ironic and MAAS.
If a MAAS endpoint and credentials are provided, we'll pick the MAAS client,
otherwise defaulting to the Ironic client.
The python-libmaas client will be used to interact with the MAAS service.
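The selection logic described above could be sketched roughly as follows. All class and function names here (``BaseMetalClient``, ``IronicClient``, ``MaasClient``, ``get_metal_client``) are hypothetical, and the two concrete clients are in-memory placeholders: a real implementation would delegate to the Ironic API and to python-libmaas respectively.

```python
import abc


class BaseMetalClient(abc.ABC):
    """Hypothetical interface shared by all bare metal backends."""

    @abc.abstractmethod
    def list_nodes(self):
        """Return the identifiers of the known bare metal nodes."""

    @abc.abstractmethod
    def set_power_state(self, node_id, on):
        """Power the given node on (True) or off (False)."""


class _InMemoryClient(BaseMetalClient):
    """Stand-in behaviour so the sketch runs without a real service."""

    def __init__(self):
        self._power = {}

    def list_nodes(self):
        return sorted(self._power)

    def set_power_state(self, node_id, on):
        self._power[node_id] = bool(on)


class IronicClient(_InMemoryClient):
    """Placeholder: a real version would delegate to the Ironic API."""


class MaasClient(_InMemoryClient):
    """Placeholder: a real version would delegate to python-libmaas."""

    def __init__(self, url, api_key):
        super().__init__()
        self.url = url
        self.api_key = api_key


def get_metal_client(maas_url=None, maas_api_key=None):
    """Pick MAAS only when both endpoint and key are set, else Ironic."""
    if maas_url and maas_api_key:
        return MaasClient(maas_url, maas_api_key)
    return IronicClient()
```

Keeping the choice in a single factory function means the rest of the code only depends on the abstract interface, so callers do not need to know which backend is in use.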
Alternatives
------------
None
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
A MAAS authentication key will have to be provided through a config option.
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
The MAAS URL and authentication key will have to be configured when using
MAAS.
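To give a feel for what this might look like for a deployer, the sketch below reads the two settings from an INI-style file using only the Python standard library. The ``[maas_client]`` section name, the ``url`` and ``api_key`` option names and the sample values are assumptions made for illustration; the spec does not fix them, and Watcher itself would register its options through its usual configuration machinery.

```python
import configparser

# Hypothetical configuration fragment; the values are placeholders.
SAMPLE_CONF = """
[maas_client]
url = http://maas.example.com:5240/MAAS
api_key = consumer:token:secret
"""


def load_maas_settings(text):
    """Return the MAAS settings, or None when MAAS is not fully configured."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("maas_client"):
        return None
    url = parser.get("maas_client", "url", fallback="").strip()
    api_key = parser.get("maas_client", "api_key", fallback="").strip()
    # Both values are needed; otherwise the caller would fall back to Ironic.
    if url and api_key:
        return {"url": url, "api_key": api_key}
    return None
```

Returning ``None`` for a missing or partial section mirrors the proposed behaviour of defaulting to Ironic unless both the endpoint and the key are provided.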
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
<petrutlucian94>
Work Items
----------
* Add new Watcher config options
* Add metal client abstraction
* Provide proper test coverage
* Update Juju Watcher charm, exposing the new config options
Dependencies
============
None
Testing
=======
Unit tests will be provided for the newly added code.
Power cycle operations are disruptive and can affect other tests, which
is probably the reason why there are no existing functional or integration
tests for the "energy saving" strategy. Such tests would exercise the MAAS
client as well.
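Because real power operations are disruptive, unit tests can substitute a mock for the metal client. The following is a minimal sketch of that approach; ``power_off_idle_nodes`` and the ``set_power_state`` method are hypothetical names used only for this example and are not taken from the Watcher code base.

```python
import unittest
from unittest import mock


def power_off_idle_nodes(client, idle_node_ids):
    """Hypothetical helper: power off each idle node via the metal client."""
    for node_id in idle_node_ids:
        client.set_power_state(node_id, on=False)
    return len(idle_node_ids)


class PowerOffIdleNodesTest(unittest.TestCase):
    def test_powers_off_every_idle_node(self):
        # The mock records calls, so no node is actually power cycled.
        client = mock.Mock()
        count = power_off_idle_nodes(client, ["node-1", "node-2"])
        self.assertEqual(2, count)
        client.set_power_state.assert_has_calls([
            mock.call("node-1", on=False),
            mock.call("node-2", on=False),
        ])
```

The same test body would apply to either backend, since it only relies on the shared client interface.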
Documentation Impact
====================
The new MAAS related config options will have to be documented. Also, some
Ironic references may need to be updated, reflecting the fact that Watcher
can now use more than one bare metal management service.
References
==========
* https://maas.io
* https://git.launchpad.net/maas/
* https://github.com/maas/python-libmaas
* https://github.com/openstack/charm-watcher
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Caracal
     - Introduced


@ -53,7 +53,7 @@ architecture).
Below you will find a diagram, showing the functional need regarding Actions
management in Watcher:
-.. image:: ../../../doc/source/images/Watcher_Actions_Management_Functional_Need.png
+.. image:: /images/Watcher_Actions_Management_Functional_Need.png
:width: 140%
You can see that there is a need in Watcher for three main phases:


@ -92,7 +92,7 @@ from its internal cache.
Here below is a sequence diagram depicting the workflow to be used in order to
retrieve all the cluster data model:
-.. image:: ../../../doc/source/images/sequence_diagram_cluster_objects_wrapper_get_latest_model.png
+.. image:: /images/sequence_diagram_cluster_objects_wrapper_get_latest_model.png
:width: 140%
Each implementation of the ``BaseClusterModelCollector`` should begin
@ -109,7 +109,7 @@ sensible default is likely in the every 60 minute range.
Here below is a sequence diagram depicting the workflow to periodically
synchronize all the cluster data models:
-.. image:: ../../../doc/source/images/sequence_diagram_cluster_objects_wrapper_sync.png
+.. image:: /images/sequence_diagram_cluster_objects_wrapper_sync.png
:width: 140%
If the periodic sync up tasks are the only method of updating the cache,
@ -136,7 +136,7 @@ handler would be able to receive notifications such as:
Here below is a sequence diagram depicting the workflow to update cluster data
models after receiving a notification:
-.. image:: ../../../doc/source/images/sequence_diagram_cluster_objects_wrapper_notification.png
+.. image:: /images/sequence_diagram_cluster_objects_wrapper_notification.png
:width: 140%
Note that a single notification will not prompt the entire cluster model to be


@ -155,7 +155,7 @@ stored metrics, ...).
Below you will find a class diagram showing the hierarchy of `Strategies`_ for
several goals and how they are related to efficacy specification classes:
-.. image:: ../../../doc/source/images/class_diagram_efficacy_indicator.png
+.. image:: /images/class_diagram_efficacy_indicator.png
:width: 140%
In the future, the `DDD Specification Pattern`_ will enable Watcher to compose


@ -68,7 +68,7 @@ class and return the same `Goal`_ properties.
Below you will find a class diagram showing a hierarchy of Strategies for
several goals:
-.. image:: ../../../doc/source/images/class_diagram_goal_from_strategy.png
+.. image:: /images/class_diagram_goal_from_strategy.png
:width: 140%
In the future, it will also enable Watcher strategies to provide other common
@ -101,7 +101,7 @@ The second one is performance and HA.
Below the strategy class and sequence diagram for syncing the goals.
-.. image:: ../../../doc/source/images/get_goal_from_strategy_class_diagram.png
+.. image:: /images/get_goal_from_strategy_class_diagram.png
:width: 140%
@ -121,7 +121,7 @@ Therefore a new table should be created in the database for this.
The proposed modification in the `Watcher database`_.
is illustrated on the diagram below:
-.. image:: ../../../doc/source/images/get_goal_from_strategy_class_diagram.png
+.. image:: /images/get_goal_from_strategy_class_diagram.png
:width: 140%
In the audit_template object, the 'strategy' attribute is optional.


@ -87,7 +87,7 @@ Proposed change
Here below is the class diagram outlining the changes that will have to be made
in order to support the addition of configuration options:
-.. image:: ../../../doc/source/images/class_diagram_plugin_parameters.png
+.. image:: /images/class_diagram_plugin_parameters.png
:width: 100%
Moreover, all plugins are currently instantiated by the ``DefaultLoader`` when
@ -97,7 +97,7 @@ abstract class method that every plugin class should implement. The latter
method needs to be an class method so that when Watcher will collect the
configuration of each plugin, there will be no need to instantiate them.
-.. image:: ../../../doc/source/images/sequence_diagram_plugin_parameters_load_plugin_parameters.png
+.. image:: /images/sequence_diagram_plugin_parameters_load_plugin_parameters.png
:width: 100%
In order to expose these plugin parameters to the administrator, we also have
@ -105,7 +105,7 @@ to auto-discover them when we use the configuration file generator which is
triggered either during the generation of the Watcher documentation or manually
with the ``tox -e config`` command:
-.. image:: ../../../doc/source/images/sequence_diagram_plugin_parameters_generate_config.png
+.. image:: /images/sequence_diagram_plugin_parameters_generate_config.png
:width: 100%
In order to be able to achieve the process described in the above sequence


@ -89,7 +89,7 @@ Usage scenarios
The most basic scenario is presented on the following diagram:
-.. image:: ../../../doc/source/images/scoring-engine-inside-decision-engine.png
+.. image:: /images/scoring-engine-inside-decision-engine.png
It's important to notice that Scoring Engines might have different
requirements and the implementations might vary. Some of them might be
@ -98,7 +98,7 @@ In these cases it makes a sense to delegate the execution to Watcher Scoring
module, which will be a new Watcher service, similar to `Watcher Decision
Engine`_ or `Watcher Applier`_:
-.. image:: ../../../doc/source/images/scoring-engine-inside-scoring-module.png
+.. image:: /images/scoring-engine-inside-scoring-module.png
Some other Scoring Engines might be implemented using external frameworks or
even live entirely in the cloud, exposing only some API to work with them.
@ -106,7 +106,7 @@ In this scenario, the abstraction layer will simply delegate work to these
external systems (e.g. using some HTTP client libraries), as illustrated on
the diagram below:
-.. image:: ../../../doc/source/images/scoring-engine-in-the-cloud.png
+.. image:: /images/scoring-engine-in-the-cloud.png
Implementation details
----------------------
@ -228,7 +228,7 @@ will be required to implement, whether the Watcher Scoring module part will be
optional (it's not needed for example when using external analytics platforms
running in the cloud).
-.. image:: ../../../doc/source/images/scoring-module-deployment.png
+.. image:: /images/scoring-module-deployment.png
In addition, it will be possible to register multiple Scoring Engines from a
single plug-in. The Scoring Engine list will also be dynamic, meaning that it


@ -10,7 +10,7 @@ usedevelop = True
setenv = VIRTUAL_ENV={envdir}
deps = -c{env:UPPER_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/master}
-r{toxinidir}/test-requirements.txt
-whitelist_externals = find
+allowlist_externals = find
commands =
find . -type f -name "*.pyc" -delete
stestr run --slowest {posargs}
@ -30,7 +30,7 @@ commands =
[testenv:pdf-docs]
envdir = {toxworkdir}/docs
deps = {[testenv:docs]deps}
-whitelist_externals =
+allowlist_externals =
rm
make
commands =