Decouple TripleO tasks

Signed-off-by: James Slagle <jslagle@redhat.com> Change-Id: I408cfd1173edafafe576a99d3d8b02cc83fce742
2022-03-22 14:55:29 -04:00 · 2022-03-22 14:55:29 -04:00 · 59e352f1a9
parent 2b55569237
commit 59e352f1a9
1 changed files with 253 additions and 0 deletions
--- a/specs/zed/decouple-tripleo-tasks.rst
+++ b/specs/zed/decouple-tripleo-tasks.rst
@ -0,0 +1,253 @@
 ..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.
 http://creativecommons.org/licenses/by/3.0/legalcode
 ======================
 Decouple TripleO Tasks
 ======================
 https://blueprints.launchpad.net/tripleo/+spec/decouple-tripleo-tasks
 This spec proposes decoupling tasks across TripleO by organizing tasks in a way
 that they are grouped as a function of what they manage. The desire is to be
 able to better isolate and minimize what tasks need to be run for specific
 management operations. The process of decoupling tasks is implemented through
 moving tasks into standalone native ansible roles and playbooks in tripleo-ansible.
 Problem Description
 ===================
 TripleO presently manages the entire software configuration of the overcloud at
 once each time ``openstack overcloud deploy`` is executed. Regardless of
 whether nodes were already deployed, require a full redeploy for some reason,
 or are new nodes (scale up, replacement) all tasks are executed. The
 functionality of only executing needed tasks lies within Ansible.
 The problem with relying entirely on Ansible to determine if any changes are
 needed is that it results in long deploy times. Even if nothing needs to be
 done, it can take hours just to have Ansible check each task in order to make
 that determination.
 Additionally, TripleO's reliance on external tooling (Puppet, container config
 scripts, bootstrap scripts, etc) means that tasks executing those tools
 **must** be executed by Ansible as Ansible does not have the necessary data
 needed in order to determine if those tasks need to be executed or not. These
 tasks often have cascading effects in determining what other tasks need to be
 run. This is a general problem across TripleO, and is why the model of just
 executing all tasks on each deploy has been the accepted pattern.
 Proposed Change
 ===============
 The spec proposes decoupling tasks and separating them out as needed to manage
 different functionality within TripleO. Depending on the desired management
 operation, tripleoclient will contain the necessary functionality to trigger
 the right tasks. Decoupling and refactoring tasks will be done by migrating to
 standalone ansible role and playbooks within tripleo-ansible. This will allow
 for reusing the standalone ansible artifacts from tripleo-ansible to be used
 natively with just ``ansible-playbook``. At the same time, the
 ``tripleo-heat-templates`` interfaces are maintained by consuming the new roles
 and playbooks from ``tripleo-ansible``.
 Overview
 --------
 There are 3 main changes proposed to implement this spec:
 #. Refactor ansible tasks from ``tripleo-heat-templates`` into standalone roles
   in tripleo-ansible.
 #. Develop standalone playbooks within tripleo-ansible to consume the
   tripleo-ansible roles.
 #. Update tripleo-heat-templates to use the standalone roles and playbooks from
   ``tripleo-ansible`` with new ``role_data`` interfaces to drive specific
   functionality with new ``openstack overcloud`` commands.
 Writing standalone roles in ``tripleo-ansible`` will largely be an exercise of
 copy/paste from tasks lists in ``tripleo-heat-templates``. As tasks are moved
 into standalone roles, tripleo-heat-templates can be directly updated to run
 tasks from the those roles using ``include_role``. This pattern is already well
 established in tripleo-heat-templates with composable services that use
 existing standalone roles.
 New playbooks will be developed within tripleo-ansible to drive the standalone
 roles using pure ``ansible-playbook``. These playbooks will offer a native
 ansible experience for deploying with tripleo-ansible.
 The design principles behind the standalone role and playbooks are:
 #. Native execution with ansible-playbook, an inventory, and variable files.
 #. No Heat. While Heat remains part of the TripleO architecture, it has no
   bearing on how the native ansible is developed in tripleo-ansible.
   tripleo-heat-templates can consume the standalone ansible playbooks and
   roles from tripleo-ansible, but it does not dictate the interface. The
   interface should be defined for native ansible best practices.
 #. No puppet. As the standalone roles are developed, they will not rely on
   puppet for configuration or any other tasks. To allow integration with
   tripleo-heat-templates and existing TripleO interfaces (Hiera, Heat
   parameters), the roles will allow skipping config generation and other parts
   that use puppet so that pieces can be overridden by
   ``tripleo-heat-templates`` specific tasks. When using native Ansible,
   templated config files and native ansible tasks will be used instead of
   Puppet.
 #. While the decoupled tasks will allow for cleaner interfaces for executing
   just specific management operations, all tasks will remain idempotent. A
   full deployment that re-runs all tasks will still work, and result in no
   effective changes for an already deployed cloud with the same set of inputs.
 The standalone roles will use separated task files for each decoupled
 management interface exposed. The playbooks will be separated by management
 interface as well to allow for executing just specific management functionality.
 The decoupled management interfaces are defined as:
 * bootstrap
 * install
 * pre-network
 * network
 * configure
 * container-config
 * service-bootstrap
 New task interfaces in ``tripleo-heat-templates`` will be added under
 ``role_data`` to correspond with the new management interfaces, and consume the
 standalone ansible from tripleo-ansible. This will allow executing just
 specific management interfaces and using the standalone playbooks from
 tripleo-ansible directly.
 New subcommands will be added to tripleoclient to trigger the new management
 interface operations, ``openstack overcloud install``, ``openstack overcloud
 configure``, etc.
 ``openstack overcloud deploy`` would continue to function as it presently does
 by doing a full assert of the system state with all tasks. The underlying
 playbook, ``deploy-steps-playbook.yaml`` would be updated as necessary to
 include the other playbooks so that all tasks can be executed.
 Alternatives
 ------------
 :Alternative 1 - Use --tags/--skip-tags:
 With ``--tags`` / ``--skip-tags``, tasks could be selectively executed. In the
 past this has posed other problems within TripleO. Using tags does not allow
 for composing tasks to the level needed, and often results in running tasks
 when not needed or forgetting to tag needed tasks. Having to add the special
 cased ``always`` tag becomes necessary so that certain tasks are run when
 needed. The tags become difficult to maintain as it is not apparent what tasks
 are tagged when looking at the entire execution. Additionally, not all
 operations within TripleO map to Ansible tasks one to one. Container startup
 are declared in a custom YAML format, and that format is then used as input to
 a task. It is not possible to tag individual container startups unless tag
 handling logic was added to the custom modules used for container startup.
 :Alternative 2 - Use --start-at-task:
 Using ``--start-at-task`` is likewise problematic, and it does not truly
 partition the full set of tasks. Tasks would need to be reordered anyway across
 much of TripleO so that ``--start-at-task`` would work. It would be more
 straightforward to separate by playbook if a significant number of tasks need
 to be reordered.
 Security Impact
 ---------------
 Special consideration should be given to security related tasks to ensure that
 the critical tasks are executed when needed.
 Upgrade Impact
 --------------
 Upgrade and update tasks are already separated out into their own playbooks.
 There is an understanding that the full ``deploy_steps_playbook.yaml`` is
 executed after an update or upgrade however. This full set of tasks could end
 up being reduced if tasks are sufficiently decoupled in order to run the
 necessary pieces in isolation (config, bootstrap, etc).
 Other End User Impact
 ---------------------
 Users will need to be aware of the limitations of using the new management
 commands and playbooks. The expectation within TripleO has always been the
 entire state of the system is re-asserted on scale up and configure operations.
 While the ability to still do a full assert would be present, it would no
 longer be required. Operators and users will need to understand that only
 running certain management operations may not fully apply a desired change. If
 only a reconfiguration is done, it may not imply restarting containers. With
 the move to standalone and native ansible components, with less
 ``config-download`` based generation, it should be more obvious what each
 playbooks is responsible for managing. The native ansible interfaces will help
 operators reason about what needs to be run and when.
 Performance Impact
 ------------------
 Performance should be improved for the affected management operations due to
 having to run less tasks, and being able to run only the tasks needed for a
 given operation.
 There should be no impact when running all tasks. Tasks must be refactored in
 such a way that the overall deploy process when all tasks are run is not made
 slower.
 Other Deployer Impact
 ---------------------
 Discuss things that will affect how you deploy and configure OpenStack
 that have not already been mentioned, such as:
 * What config options are being added? Should they be more generic than
  proposed (for example a flag that other hypervisor drivers might want to
  implement as well)? Are the default values ones which will work well in
  real deployments?
 * Is this a change that takes immediate effect after its merged, or is it
  something that has to be explicitly enabled?
 Developer Impact
 ----------------
 TripleO developers will be responsible for updating the service templates that
 they maintain in order to refactor the tasks.
 Implementation
 ==============
 Assignee(s)
 -----------
 Primary assignee:
  James Slagle <jslagle@redhat.com>
 Work Items
 ----------
 Work items or tasks -- break the feature up into the things that need to be
 done to implement it. Those parts might end up being done by different people,
 but we're mostly trying to understand the timeline for implementation.
 Dependencies
 ============
 None.
 Testing
 =======
 Existing CI jobs would cover changes to task refactorings.
 New CI jobs could be added for the new isolated management operations.
 Documentation Impact
 ====================
 New commands and playbooks must be documented.
 References
 ==========
 `standalone-roles POC <https://review.opendev.org/q/topic:standalone-roles>`_