This change adds an artifact push task to deployments, which helps
support operators ensure a better overall user experience without
needing to deploy an HTTP server or run Swift on the undercloud.
Depends-On: I5d18cf334c1bc4011db968fbeb4f9e41869611cd
Change-Id: I7bef7c4c7613a2475784dde135d71232b412d79f
Signed-off-by: Kevin Carter <kecarter@redhat.com>
This patch exposes the net_cidr_map variable so that tasks can
access the list of CIDRs that are valid for a network as opposed
to attempting to build the CIDRs from the network definitions.
In spine-leaf or edge use cases a given network may have multiple
subnets assigned.
The new Unbound service will use these maps to build lists of
CIDRs allowed to make queries.
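As a rough illustration (the network name and variable name below are
only examples, not the actual service template), a task could consume
the exposed map like this:

    # Sketch only: collect the CIDRs of one network to build an
    # allow-list for Unbound; 'internal_api' is an example network.
    - name: Collect CIDRs allowed to query Unbound
      set_fact:
        unbound_allowed_cidrs: "{{ net_cidr_map['internal_api'] | default([]) }}"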
Change-Id: I6004519e8b2317d19356c4a2b8bea416b4d94c22
In order to support ANSIBLE_INJECT_FACT_VARS=False we have to use
ansible_facts instead of ansible_* vars. This change switches our
distribution and hostname related items to use ansible_facts.
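For example, a task that previously relied on injected fact variables
changes roughly as follows (the debug task is purely illustrative):

    # Old style: depends on fact variable injection
    - name: Show distribution (old)
      debug:
        msg: "{{ ansible_distribution }} on {{ ansible_hostname }}"

    # New style: works with ANSIBLE_INJECT_FACT_VARS=False
    - name: Show distribution (new)
      debug:
        msg: "{{ ansible_facts['distribution'] }} on {{ ansible_facts['hostname'] }}"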
Change-Id: I49a2c42dcbb74671834f312798367f411c819813
Related-Bug: #1915761
Using import_tasks causes the tasks to always be pulled in and merely
skipped at run time. This is terribly slow with many roles, even when
not running against those hosts. A similar effort was applied to the
update process in I2eab008ca27546acbd2b1275f07bcca0b84b858c, which
should also be used here.
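A rough before/after sketch of the pattern (the file name is
illustrative):

    # Before: import_tasks statically pulls the file into every host's
    # task list; each task is then evaluated and skipped at run time.
    - import_tasks: role_update_tasks.yaml
      when: step|int == 1

    # After: include_tasks is evaluated dynamically, so hosts that
    # don't match the condition never load the file at all.
    - include_tasks: role_update_tasks.yaml
      when: step|int == 1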
Change-Id: Ibd9bb9f8a4c6a7ce3c6ebd11ce5cf444dde57c33
Related-Bug: #1915761
This change restores the PreNetworkConfig resources, so that we migrate
ExtraConfigPre and NodeExtraConfig back from pre-network configurations
to post-network configurations, to be consistent with older versions
that depended on Heat software deployments instead of config-download.
Depends-on: https://review.opendev.org/772303
Closes-Bug: #1907214
Change-Id: I96e7e4c570839cfba6011788464d8e93925b2f01
The "Overcloud common bootstrap tasks for step 1" setup pulls in the
tasks from the following file:
common_deploy_steps_tasks_step_1: {get_file: deploy-steps-tasks-step-1.yaml}
Now in that file we mainly do the following tasks:
- Set up the /var/lib/{tripleo-config,container-puppet,kolla} folders
- Write the container config json files
- Set up some puppet folders
- Write puppet step_config manifests
Not only does it make sense to have these preparation steps before the
deployment steps, even at step 1, but it is also strictly needed for
the Frr/Bgp service. The reason is that Frr runs containerized and
needs to start at deploy step 1, so that traffic to the other BGP nodes
is working before step_config step 1, which contains the puppet
invocation to set up the cluster.
In particular, the frr/bgp container needs to be started during
deployment step 1, and to do so we need the kolla files and the other
container startup files to be set up before we invoke podman.
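A simplified sketch of the intended play ordering (play names, host
pattern and the second file name are illustrative, not the exact
generated playbook):

    - name: Overcloud common bootstrap tasks for step 1
      hosts: overcloud
      tasks:
        # writes the kolla config, container config json and puppet
        # manifests under /var/lib/...
        - import_tasks: deploy-steps-tasks-step-1.yaml

    - name: Overcloud deploy step 1
      hosts: overcloud
      tasks:
        # podman can now start containers such as frr because the
        # startup files written above already exist
        - import_tasks: deploy-steps-tasks.yaml
          vars:
            step: 1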
I could not figure out from the git history why this was not done in
the first place; it does not seem to have been omitted on purpose.
So I did some extra testing to make sure nothing got broken by this:
1) Tested a composable control plane largish env with this patch only (train)
2) Tested a minor updated process on (1)
3) Tested a redeploy on (1)
4) Tested an FFU upgrade from queens to train with this change applied
(3xctrl + 2xcmp)
5) Tested a BGP deployment spread over 3 racks in a spine/leaf
configuration (~a dozen deployments)
Change-Id: I0e6594bfd1ff2e27bb4917c157f163643a811ca6
Since we moved the network deployment to the free strategy, we can hit
a race condition where some nodes are configured before the controllers
are ready. This can lead to the basic network validation failing. We
can deal with this by moving the validation to its own play so that we
ensure all network configurations have occurred prior to checking
that nodes can ping the controllers.
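Conceptually (the role names below are only placeholders), the split
looks like this:

    - name: Overcloud network configuration
      hosts: overcloud
      strategy: tripleo_free
      tasks:
        - include_role:
            name: some_network_config_role   # placeholder name

    - name: Basic network validation
      hosts: overcloud
      tasks:
        # this play only starts once every host has finished the
        # previous one, so all network configs are in place before
        # we try to ping the controllers
        - include_role:
            name: some_nodes_validation_role   # placeholder name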
Change-Id: I48665379a87e701633be1bcc90bd8be1cc75c513
Closes-Bug: #1913725
With this new switch we can opt out of enforcing the subscription
check for some composed roles. This is mainly useful for composed Ceph
roles, which have different constraints than other OpenStack roles.
Closes-Bug: #1912512
Depends-On: https://review.opendev.org/c/openstack/tripleo-ansible/+/771671
Change-Id: I46529ccab6c197da4885950282eb6731e28573d6
Currently, multiple scripts are stored in the
/var/lib/container-config-scripts directory. If any of these scripts
are used during the update_tasks, their content won't be up to
date (or the script will be missing entirely if it was added in a new
release), because the content of this folder is only updated during the
deploy tasks (step 1), which occur after all the update_tasks.
This patch gathers the tasks responsible for the folder creation and
content update into a new playbook named common_container_config_scripts.yaml.
This way we can reference the tasks from the deploy-tasks step 1 playbook
(as was happening up to now) and also invoke them before the update_tasks
playbook gets called.
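A minimal sketch of the resulting wiring, assuming the shared file is
pulled in with import_playbook (the exact file layout may differ):

    # update_steps_playbook.yaml: refresh the scripts before update_tasks
    - import_playbook: common_container_config_scripts.yaml

    # deploy_steps_playbook.yaml: same tasks, still run at deploy step 1
    - import_playbook: common_container_config_scripts.yaml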
Change-Id: I2ac6bb98e1d4183327e888240fc8d5a70e0d6fcb
Related-Bug: #1904193
The tripleo_free strategy cuts the run time of the composable upgrade.
But as each node gets to a different point of the code, there is a need
to remove the clear_facts part from tripleo-packages. clear_facts
applies globally, meaning that if a messaging node looks for a
distribution fact while a controller runs clear_facts, we will fail.
clear_facts was added to force Ansible to reset the
ansible_python_interpreter. This can be avoided by simply setting the
default python with the alternatives command, and it is irrelevant to
any version other than stable/train.
Change-Id: I556327228f23eb5e744b580618fb581a9fd2ce41
This replaces net-config-noop.yaml mappings with OS::Heat::None.
It also removes all unnecessary settings of it in environments, as
we map them in overcloud-resource-registry-puppet.j2.yaml.
Normally that should be enough, but we override them in so many
places that there will be some redundancy.
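For example, an environment entry like the following becomes
unnecessary once the default registry provides the mapping (the
resource name is illustrative):

    resource_registry:
      # previously pointed at net-config-noop.yaml
      OS::TripleO::Controller::Net::SoftwareConfig: OS::Heat::None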
Depends-On: https://review.opendev.org/755275
Change-Id: Ib4d07c835568cb3072770f81a082b5a5e1c790ea
This uses the new Ansible module for network configuration
on the nodes. Also, it converts net-config-multinode.yaml to
use os-net-config.
The next patch in this series will change the NetworkConfig
resource type to OS::Heat::Value and drop run-os-net-config.sh.
Depends-On: https://review.opendev.org/748754
Change-Id: Ie48da5cfffe21eee6060a6d22045d09524283138
We've long supported the ability to deploy rpms or tar.gz archives to a
system during deployment. This is currently implemented as a shell
script, so let's convert it to an Ansible module.
Change-Id: Id30f89cf261b356f25a93e5df5550e9dfb08f808
Depends-On: https://review.opendev.org/#/c/748757/
The old all nodes validation used a bash script to run some basic ping
tests after the network setup. It used to be a software config but
eventually got baked into the deployment framework. This patch switches
to the Ansible role implementation and cleans up the references to
the old Heat resource.
Change-Id: Ia7f055d2c636f950c3fe6d8611834c4ab290f31a
Depends-On: https://review.opendev.org/#/c/747466/
https://review.opendev.org/#/c/749432/ moved network configuration
before the deployments.yaml execution. The deployment.yaml included
legacy executions of deployments (e.g. *PreConfig, ExtraConfigPre) which
may be necessary prior to running the network configuration. We shouldn't
change the ordering of executions when the things being executed may
be dynamic.
Change-Id: I496005ef7f3e75382d2bb20f7c6dc9ed96762695
This separates the overcloud network configuration into its own
play and adds a new tag 'network_deploy_steps' so that it can be
executed alone using '--tags network_deploy_steps'.
Change-Id: I96b0d838e79bcaa8b08ffaa2fb745ee7003d1284
Instead of running a giant playbook of external_deploy_tasks, we now
have one playbook per step and per role, each of which runs individually
to avoid a lot of skipped tasks at each step and to save time during a
deployment or day 2 operation.
Co-Authored-By: Kevin Carter <kecarter@redhat.com>
Change-Id: Iaecd22bc16d1180e2ab8aee5efb8c75ab15f42e5
Move the NetworkConfig tasks to a role managed in tripleo-ansible.
Change-Id: Ia3fbad9606b18b863a1a81cfe23ff483411d4796
Depends-On: https://review.opendev.org/#/c/744260/
During update we skip a lot of tasks because we loop over the same
update step task file, changing the step variable, wasting time and
clobbering logs.
To solve the skipped tasks, we now loop over the stepX files
generated for update and post update.
Expanding the playbook to import the tasks and set the step
variable has another benefit: it opens up the possibility of using
"--start-at-task", as everything is imported. Using a loop variable and
include_tasks prevented its usage.
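A rough before/after sketch (file names are illustrative):

    # Before: the same task file is re-included for every step value,
    # producing piles of skipped tasks in the logs.
    - include_tasks: role_update_tasks.yaml
      loop: [0, 1, 2, 3, 4, 5]
      loop_control:
        loop_var: step

    # After: one generated file per step is imported statically,
    # which also makes --start-at-task usable.
    - import_tasks: role_update_tasks_step0.yaml
    - import_tasks: role_update_tasks_step1.yaml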
Depends-On: https://review.opendev.org/740465
Change-Id: Ib32791d72410766a0b2e2330713120d15dc35f57
Now that the FFU process relies on the upgrade_tasks and deployment
tasks, there is no need to keep the old fast_forward_upgrade_tasks.
This patch removes all the fast_forward_upgrade_tasks sections from
the services, as well as from the common structures.
Change-Id: I39b8a846145fdc2fb3d0f6853df541c773ee455e
Currently this task definition is causing the following warning to be
thrown at execution time:
[WARNING]: conditional statements should not include jinja2 templating
delimiters such as {{ }} or {% %}. Found: '{{ playbook_dir }}/{{
_task_file_path }}' is exists
There is no reason to be using this conditional with a variable since
this file path is known when we create the playbook. Additionally we
don't need to use the playbook_dir because we're not using it in the
include_tasks. This allows us to specifically express the file that
we're using for these task inclusions rather than relying on a
conditional that is not recommended.
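A before/after sketch of the pattern (the file name in the "after"
task is illustrative):

    # Before: the templated conditional triggers the warning
    - include_tasks: "{{ _task_file_path }}"
      when: "'{{ playbook_dir }}/{{ _task_file_path }}' is exists"

    # After: the exact file is known when the playbook is generated,
    # so it can be referenced directly with no conditional
    - include_tasks: external_deploy_steps_tasks_step1.yaml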
Change-Id: I368e99d2384469ca166e2e2b00e8621b7afe72db
We switched to our free strategy, which now works with any_errors_fatal,
so this comment is no longer relevant.
Change-Id: Ifad07edb4496b0e3ea5b34aee3933ece4bb2dfc6
Tolerate failures on all the plays that run on the overcloud and
potentially run with facts gathering only (no other tasks).
Since they can't apply a strategy, we want to let them fail without
raising an error. In a later play, our tripleo strategies will figure
out whether we have reached the maximum number of failures configured
with MaxFailPercentage.
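A minimal sketch of the idea (the exact keywords used in the generated
plays may differ):

    # Facts-only play: a failing or unreachable host should not abort
    # the whole run here; the tripleo strategies applied in later plays
    # enforce the real MaxFailPercentage threshold.
    - name: Gather facts from overcloud
      hosts: overcloud
      gather_facts: true
      any_errors_fatal: false
      ignore_unreachable: true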
Change-Id: I43652972494a9ed1f2b8b483fe19e685cbe4da1b
In order to handle percentage-based failures for tripleo classes we will
need a custom strategy to handle the failures. Let's explicitly define
our expected strategies in the playbook so that the default remains
linear (the Ansible default) on the host when Ansible is invoked, but
our deployment tasks run with the strategy we want.
Depends-On: https://review.opendev.org/#/c/724766/
Change-Id: Ifaddeccfbb82e03815311ee0b76eb2e2eae282a7
Remove the container_startup_configs_tasks.yaml playbook, and some
set_fact tasks that aren't needed anymore, since we now have a module
to generate container startup config files.
Change-Id: I8ebe7f5f52f14a14c3911748b2e2063e0c3ad9ac
The tripleo_free strategy should allow the tasks to run freely for a
given playbook that declares the tripleo_free strategy. The default
strategy is a linear one that executes each task across all servers
before moving to the next task. The tripleo_free strategy will execute
the tasks on servers without synchronizing the tasks within a given
playbook. Because TripleO uses step concepts in our deployment, we
already have the synchronization points in the main playbook. The outer
playbook should be done linearly but the deployment steps themselves
should be done freely.
The tripleo_free strategy won't stop execution on all hosts if one host
fails or becomes unreachable. It will, however, end the play execution if
any error occurs on any host. This is similar to the deployment failures
we used to have with Heat where a failure on any single node would stop
the deployment at a given deployment step. A future improvement of this
will be to add logic to handle a failure percentage on a given TripleO
role to only stop the playbook if the failure percentage exceeds a
defined amount. Currently any failure will stop a playbook but may not
stop later tasks from executing on the rest of the hosts. We will likely
need to implement a tripleo_linear strategy based on the upstream linear
strategy to understand these failure percentages as well.
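For illustration, a play opts in via the strategy keyword (the host
pattern and task are only examples):

    - name: Overcloud deploy step 1
      hosts: overcloud
      # hosts progress through this play's tasks independently; the
      # boundary between step plays remains the synchronization point
      strategy: tripleo_free
      tasks:
        - debug:
            msg: "step 1 tasks run freely per host"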
NOTE: During the testing of this, we identified two issues with the free
strategy in ansible itself. We will need those fixes landed in the
version of ansible prior to being able to land this.
Depends-On: https://github.com/ansible/ansible/pull/69730
Depends-On: https://github.com/ansible/ansible/pull/69524
Change-Id: Ib4a02a192377aafab5970647d74977cb1189bcae
This change enables become: true for the deploy step and host prep task
execution. External tasks are still become: false as they are delegated
to localhost and run as the same user running the deployment.
Change-Id: I79631ce0ed450febae96db2f32198e02eb427d91
Related-Bug: #1883609
Once again, we're splitting the deployment steps into multiple plays,
which causes Ansible to artificially stop at the end of each play when
using a free-style strategy. By combining the deploy steps, bootstrap
tasks and common deploy steps, we can run each step to completion in
parallel rather than stopping at each of these for every step. This
reduces the overall play count to (1*6) instead of (3*6+1).
Change-Id: I986391617231b9dd23aa487d73db490db84f61ce
When executing deployment_steps_playbook.yaml with a specific
--tags, it looks like this play gets executed
even though the tag passed to --tags isn't common_roles.
This patch converts the roles: structure into a:
tasks:
- include_role:
which is the preferred way since Ansible 2.4. This way, the
tags work properly and the execution of both roles is skipped
if the tag doesn't match common_roles.
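A sketch of the conversion (the role name is a placeholder):

    # Before: roles at play level don't respect --tags filtering here
    - hosts: overcloud
      roles:
        - some_common_role

    # After: the include is tagged and skipped unless the tag matches
    - hosts: overcloud
      tasks:
        - include_role:
            name: some_common_role
          tags:
            - common_roles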
Change-Id: I772ad486ca11525b8756a0b8cac7a5345373a5d3
Closes-Bug: #1885721
Since plays cannot be parallelized with a free-like strategy, we should
have a single play that runs all host prep tasks rather than blocks of
plays for each defined role. By switching to this method we have a
single play handling the host prep tasks that can be run all at once
across the cloud.
Change-Id: I2670840c9beac06f8b8ffc6eaecc88b9119ad298
The scale down actions appear to fail under newer versions of ansible if
the nodes are unavailable. In the logs we see:
[WARNING]: Found variable using reserved name: ignore_unreachable
This change renames our variable so that it does not collide with
ignore_unreachable.
Change-Id: Ida54f59fc1415122241493c02a0fc764d09ae6c1
We already do this in the time configuration and it is no longer a
configurable item due to containers. Let's stop doing this in the all
nodes validation and let the time configuration handle the failure
messaging.
Change-Id: Ib0abcbd25117ecd587f4a92698746e5e256e6e8e
Paunch was deprecated in Ussuri and is now being retired, to be fully
replaced by the new tripleo-ansible role, tripleo_container_manage.
This patch:
- Removes common/container-puppet.py (it was only useful when Paunch
was enabled, since that script was converted to the
container_puppet_config Ansible module in tripleo-ansible).
- Updates all comments referring to Paunch, replacing them with
tripleo_container_manage.
- Deprecates the EnablePaunch parameter.
- Removes paunch from the Python dependencies.
Depends-On: https://review.opendev.org/#/c/731545/
Change-Id: I9294677fa18a7efc61898a25103414c8191d8805