Decouple TripleO tasks

Signed-off-by: James Slagle <jslagle@redhat.com>
Change-Id: I408cfd1173edafafe576a99d3d8b02cc83fce742
James Slagle 2022-03-22 14:55:29 -04:00
parent 2b55569237
commit 59e352f1a9

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
======================
Decouple TripleO Tasks
======================
https://blueprints.launchpad.net/tripleo/+spec/decouple-tripleo-tasks
This spec proposes decoupling tasks across TripleO by grouping them according
to what they manage. The goal is to better isolate, and minimize, the tasks
that need to run for a specific management operation. Decoupling is implemented
by moving tasks into standalone native ansible roles and playbooks in
tripleo-ansible.
Problem Description
===================
TripleO presently manages the entire software configuration of the overcloud at
once each time ``openstack overcloud deploy`` is executed. All tasks are
executed regardless of whether nodes were already deployed, require a full
redeploy for some reason, or are new nodes (scale up, replacement). Deciding
which of those tasks actually need to change anything is left to Ansible.
The problem with relying entirely on Ansible to determine if any changes are
needed is that it results in long deploy times. Even if nothing needs to be
done, it can take hours just to have Ansible check each task in order to make
that determination.
Additionally, TripleO's reliance on external tooling (Puppet, container config
scripts, bootstrap scripts, etc.) means that tasks executing those tools
**must** always be run by Ansible, as Ansible does not have the data needed to
determine whether those tasks need to be executed. These
tasks often have cascading effects in determining what other tasks need to be
run. This is a general problem across TripleO, and is why the model of just
executing all tasks on each deploy has been the accepted pattern.
Proposed Change
===============
The spec proposes decoupling tasks and separating them out as needed to manage
different functionality within TripleO. Depending on the desired management
operation, tripleoclient will contain the necessary functionality to trigger
the right tasks. Decoupling and refactoring tasks will be done by migrating to
standalone ansible roles and playbooks within tripleo-ansible. This will allow
the standalone ansible artifacts from tripleo-ansible to be used natively with
just ``ansible-playbook``. At the same time, the
``tripleo-heat-templates`` interfaces are maintained by consuming the new roles
and playbooks from ``tripleo-ansible``.
Overview
--------
There are 3 main changes proposed to implement this spec:
#. Refactor ansible tasks from ``tripleo-heat-templates`` into standalone roles
in tripleo-ansible.
#. Develop standalone playbooks within tripleo-ansible to consume the
tripleo-ansible roles.
#. Update tripleo-heat-templates to use the standalone roles and playbooks from
``tripleo-ansible`` with new ``role_data`` interfaces to drive specific
functionality with new ``openstack overcloud`` commands.
Writing standalone roles in ``tripleo-ansible`` will largely be an exercise of
copy/paste from task lists in ``tripleo-heat-templates``. As tasks are moved
into standalone roles, tripleo-heat-templates can be directly updated to run
tasks from those roles using ``include_role``. This pattern is already well
established in tripleo-heat-templates with composable services that use
existing standalone roles.
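
As a rough illustration of this pattern, a task list in a service template
could delegate to a standalone role as sketched below. The role name
``tripleo_example_service`` and the task file ``install.yml`` are hypothetical
and are not taken from tripleo-ansible; only ``include_role`` and its
``tasks_from`` option are standard Ansible.

```yaml
# Hypothetical task list entry in a composable service template.
# "tripleo_example_service" and "install.yml" are illustrative names only.
- name: Run the install tasks from the standalone role
  ansible.builtin.include_role:
    name: tripleo_example_service
    # Select one task file from the role instead of its tasks/main.yml.
    tasks_from: install.yml
```
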
New playbooks will be developed within tripleo-ansible to drive the standalone
roles using pure ``ansible-playbook``. These playbooks will offer a native
ansible experience for deploying with tripleo-ansible.
The design principles behind the standalone roles and playbooks are:
#. Native execution with ansible-playbook, an inventory, and variable files.
#. No Heat. While Heat remains part of the TripleO architecture, it has no
bearing on how the native ansible is developed in tripleo-ansible.
tripleo-heat-templates can consume the standalone ansible playbooks and
roles from tripleo-ansible, but it does not dictate the interface. The
interface should be defined for native ansible best practices.
#. No puppet. As the standalone roles are developed, they will not rely on
puppet for configuration or any other tasks. To allow integration with
tripleo-heat-templates and existing TripleO interfaces (Hiera, Heat
parameters), the roles will allow skipping config generation and other parts
that use puppet so that pieces can be overridden by
``tripleo-heat-templates`` specific tasks. When using native Ansible,
templated config files and native ansible tasks will be used instead of
Puppet.
#. While the decoupled tasks will allow for cleaner interfaces for executing
just specific management operations, all tasks will remain idempotent. A
full deployment that re-runs all tasks will still work, and result in no
effective changes for an already deployed cloud with the same set of inputs.
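
A minimal sketch of how the "No puppet" and idempotency principles above might
look inside a role is shown below. It assumes a hypothetical role with a
``tasks/configure.yml`` file, a hypothetical template ``example.conf.j2``, and
a hypothetical toggle variable ``example_service_manage_config``. None of
these names come from tripleo-ansible; the spec does not define how the
skipping mechanism is exposed.

```yaml
# tasks/configure.yml in a hypothetical standalone role.
- name: Render the service configuration from a template
  # The template module only rewrites the file when the rendered content
  # differs, so re-running the task with the same inputs reports no change.
  ansible.builtin.template:
    src: example.conf.j2
    dest: /etc/example/example.conf
    mode: "0640"
  # Hypothetical toggle: tripleo-heat-templates could set this to false and
  # supply its own puppet-based config generation tasks instead.
  when: example_service_manage_config | default(true) | bool
```
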
The standalone roles will use separated task files for each decoupled
management interface exposed. The playbooks will be separated by management
interface as well to allow for executing just specific management functionality.
The decoupled management interfaces are defined as:
* bootstrap
* install
* pre-network
* network
* configure
* container-config
* service-bootstrap
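
To make the separation concrete, a standalone playbook for a single management
interface might look like the sketch below. The playbook file name
``configure.yaml``, the inventory group ``example_service``, and the role name
are all assumptions made for illustration.

```yaml
# configure.yaml: hypothetical playbook for the "configure" interface.
- name: Configure the example service
  hosts: example_service   # hypothetical inventory group
  become: true
  tasks:
    - name: Run only the configure task file of the standalone role
      ansible.builtin.include_role:
        name: tripleo_example_service   # hypothetical role name
        tasks_from: configure.yml
```

Under these assumptions the playbook could be run natively with something like
``ansible-playbook -i inventory.yaml -e @vars.yaml configure.yaml``, where the
inventory and variable file names are again placeholders.
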
New task interfaces in ``tripleo-heat-templates`` will be added under
``role_data`` to correspond with the new management interfaces, and consume the
standalone ansible from tripleo-ansible. This will allow executing just
specific management interfaces and using the standalone playbooks from
tripleo-ansible directly.
New subcommands will be added to tripleoclient to trigger the new management
interface operations, ``openstack overcloud install``, ``openstack overcloud
configure``, etc.
``openstack overcloud deploy`` would continue to function as it presently does
by doing a full assert of the system state with all tasks. The underlying
playbook, ``deploy_steps_playbook.yaml``, would be updated as necessary to
include the other playbooks so that all tasks can be executed.
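
One way this inclusion could be expressed is with Ansible's
``import_playbook``, as sketched below. The per-interface playbook file names
are hypothetical, the order simply follows the order in which the management
interfaces are listed in this spec, and whether ``import_playbook`` is the
mechanism actually used is an assumption.

```yaml
# Hypothetical excerpt of a full-deploy playbook that chains the
# per-interface playbooks. All file names are illustrative.
- ansible.builtin.import_playbook: bootstrap.yaml
- ansible.builtin.import_playbook: install.yaml
- ansible.builtin.import_playbook: pre-network.yaml
- ansible.builtin.import_playbook: network.yaml
- ansible.builtin.import_playbook: configure.yaml
- ansible.builtin.import_playbook: container-config.yaml
- ansible.builtin.import_playbook: service-bootstrap.yaml
```
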
Alternatives
------------
:Alternative 1 - Use --tags/--skip-tags:
With ``--tags`` / ``--skip-tags``, tasks could be selectively executed. In the
past this has posed other problems within TripleO. Using tags does not allow
for composing tasks to the level needed, and often results in running tasks
when not needed or forgetting to tag needed tasks. Having to add the special
cased ``always`` tag becomes necessary so that certain tasks are run when
needed. The tags become difficult to maintain as it is not apparent what tasks
are tagged when looking at the entire execution. Additionally, not all
operations within TripleO map to Ansible tasks one to one. Container startups
are declared in a custom YAML format, and that format is then used as input to
a task. It is not possible to tag individual container startups unless tag
handling logic was added to the custom modules used for container startup.
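
The maintenance problem with tags can be seen in a small, purely illustrative
playbook such as the one below. The task names, the fact, and the package are
made up for the example and do not correspond to real TripleO tasks.

```yaml
- name: Illustrate tag-based task selection
  hosts: all
  tasks:
    - name: Compute a value that later tasks depend on
      ansible.builtin.set_fact:
        example_value: 42
      tags:
        - always   # must be special-cased so it runs for every tag selection

    - name: Install a package
      ansible.builtin.package:
        name: chrony
        state: present
      tags:
        - install

    - name: Report the value (untagged by mistake)
      ansible.builtin.debug:
        msg: "example_value is {{ example_value }}"
```

Running this with ``ansible-playbook --tags install`` executes only the first
two tasks. The third task is skipped without any error because it carries no
matching tag, which is the kind of omission that is hard to spot when looking
at the entire execution.
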
:Alternative 2 - Use --start-at-task:
Using ``--start-at-task`` is likewise problematic, and it does not truly
partition the full set of tasks. Tasks would need to be reordered anyway across
much of TripleO so that ``--start-at-task`` would work. It would be more
straightforward to separate by playbook if a significant number of tasks need
to be reordered.
Security Impact
---------------
Special consideration should be given to security related tasks to ensure that
the critical tasks are executed when needed.
Upgrade Impact
--------------
Upgrade and update tasks are already separated out into their own playbooks.
However, the full ``deploy_steps_playbook.yaml`` is currently expected to be
executed after an update or upgrade. This full set of tasks could end
up being reduced if tasks are sufficiently decoupled in order to run the
necessary pieces in isolation (config, bootstrap, etc).
Other End User Impact
---------------------
Users will need to be aware of the limitations of using the new management
commands and playbooks. The expectation within TripleO has always been that the
entire state of the system is re-asserted on scale up and configure operations.
While the ability to still do a full assert would be present, it would no
longer be required. Operators and users will need to understand that only
running certain management operations may not fully apply a desired change. For
example, running only a reconfiguration may not restart containers. With
the move to standalone and native ansible components, and less
``config-download`` based generation, it should be more obvious what each
playbook is responsible for managing. The native ansible interfaces will help
operators reason about what needs to be run and when.
Performance Impact
------------------
Performance should be improved for the affected management operations, since
only the tasks needed for a given operation are run rather than the full set.
There should be no impact when running all tasks. Tasks must be refactored in
such a way that the overall deploy process when all tasks are run is not made
slower.
Other Deployer Impact
---------------------
Discuss things that will affect how you deploy and configure OpenStack
that have not already been mentioned, such as:
* What config options are being added? Should they be more generic than
proposed (for example a flag that other hypervisor drivers might want to
implement as well)? Are the default values ones which will work well in
real deployments?
* Is this a change that takes immediate effect after it's merged, or is it
something that has to be explicitly enabled?
Developer Impact
----------------
TripleO developers will be responsible for updating the service templates that
they maintain in order to refactor the tasks.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
James Slagle <jslagle@redhat.com>
Work Items
----------
Work items or tasks -- break the feature up into the things that need to be
done to implement it. Those parts might end up being done by different people,
but we're mostly trying to understand the timeline for implementation.
Dependencies
============
None.
Testing
=======
Existing CI jobs would cover the refactored tasks.
New CI jobs could be added for the new isolated management operations.
Documentation Impact
====================
New commands and playbooks must be documented.
References
==========
`standalone-roles POC <https://review.opendev.org/q/topic:standalone-roles>`_