43 Commits

Author SHA1 Message Date
ramishra
c9991c2e31 Use 'wallaby' heat_template_version
With I57047682cfa82ba6ca4affff54fab5216e9ba51c Heat has added
a new template version for wallaby. This allows us to use the
2-argument variant of the ``if`` function, which enables
e.g. conditional definition of resource properties and helps
clean up templates. If only two arguments are passed to the ``if``
function, the entire enclosing item is removed when the condition
is false.
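
As a rough illustration of the removal semantics described above, the following Python sketch models how a 2-argument ``if`` drops its enclosing item when the condition is false. It is a toy resolver written for this example, not Heat's implementation; the `resolve` helper, the `use_key` condition and the property names are all hypothetical.

```python
# Toy model of Heat's ``if`` function; NOT Heat's actual code.
_OMIT = object()  # sentinel meaning "remove the enclosing item"


def resolve(node, conditions):
    """Resolve {"if": [...]} nodes against a dict of named conditions."""
    if isinstance(node, dict):
        if set(node) == {"if"}:
            args = node["if"]
            if conditions[args[0]]:
                return resolve(args[1], conditions)
            if len(args) == 3:
                # Classic 3-argument form: fall back to the else value.
                return resolve(args[2], conditions)
            # 2-argument form with a false condition: drop the item.
            return _OMIT
        resolved = {}
        for key, value in node.items():
            result = resolve(value, conditions)
            if result is not _OMIT:
                resolved[key] = result
        return resolved
    if isinstance(node, list):
        items = (resolve(value, conditions) for value in node)
        return [item for item in items if item is not _OMIT]
    return node


# Hypothetical resource properties using the 2-argument form.
props = {
    "name": "server",
    "key_name": {"if": ["use_key", "mykey"]},
    "tags": ["a", {"if": ["use_key", "b"]}],
}
without_key = resolve(props, {"use_key": False})
with_key = resolve(props, {"use_key": True})
```

With the condition false, both the `key_name` property and the second list entry disappear entirely instead of being set to an empty value, which is what lets templates avoid a placeholder else branch.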

Change-Id: I25f981b60c6a66b39919adc38c02a051b6c51269
2021-03-31 17:35:12 +05:30
ramishra
7f195ff9a8 Remove DefaultPasswords interface
This was mainly there as a legacy interface which was
for internal use. Now that we pull the passwords from
the existing environment and don't use it, we can drop
this.

Reduces the number of heat resources.

Change-Id: If83d0f3d72a229d737a45b2fd37507dc11a04649
2021-02-12 11:38:44 +05:30
Oliver Walsh
9d82364de8 Refactor nova db config
It is best to avoid placing db creds on the compute nodes to limit the
exposure if an attacker succeeds in gaining access to the hypervisor
host.

Related patches in puppet-nova remove the credentials from nova.conf;
however, the current scope of db credential hieradata is all nova tripleo
services, so it will still be written to the hieradata keys on compute
nodes.

This patch refactors the nova hieradata structure, splitting the
nova-api/nova database hieradata out into individual templates and
selectively including only where necessary, ensuring we have no db
creds on a compute node (unless it is an all-in-one api+compute node).

Depends-On: I07caa3185427b48e6e7d60965fa3e6157457018c
Change-Id: Ia4a29bdd2cd8e894bcc7c0078cf0f0ab0f97de0a
Closes-bug: #1871482
2020-11-18 12:22:48 +00:00
Jose Luis Franco Arza
d18209799c Run online migration tasks from external_update_tasks too.
The minor update procedure has been suggesting the execution of the
online db migrations step for a few releases already, by the execution
of the openstack overcloud external-update run --tags online_upgrade.
However, this command has never executed any task at all as the
online_upgrade tasks belong to the external_upgrade_tasks section.
Instead of changing the interface in all releases by modifying the docs
let's just fix this in the code.

Change-Id: I4453e7c755cccc52bb135a14be1b04fdea53de6b
Closes-Bug: #1900467
2020-10-21 20:48:13 +00:00
Jose Luis Franco Arza
8783ec9c45 Remove ffwd-upgrade leftovers from THT.
Now that the FFU process relies on the upgrade_tasks and deployment
tasks, there is no need to keep the old fast_forward_upgrade_tasks.

This patch removes all the fast_forward_upgrade_tasks sections from
the services, as well as from the common structures.

Change-Id: I39b8a846145fdc2fb3d0f6853df541c773ee455e
2020-07-23 15:33:25 +00:00
Oliver Walsh
22df3dbcbb Move nova online migrations to nova-conductor
With multi-cell we can have stacks with no nova-api so the nova database
operations must be coupled to nova-conductor, not nova-api.

Change-Id: I27aa62d00b77ca7ff61bb31c6f396f03e148a144
Closes-bug: #1883098
2020-06-11 12:42:30 +01:00
Emilien Macchi
21d1f773c7 healthchecks: check if fact is defined before checking its value
When checking if keystone/nova healthchecks are healthy, make sure the
registered fact is set (which can slip to a further retry if podman
inspect took too much time to execute).

That way, we process the retries without an error like found in the bug
report.

Change-Id: I9f5063c9c3b598afd5bd01447f00a1146a20f4c3
Closes-Bug: #1878063
2020-05-11 13:39:06 -04:00
Emilien Macchi
4ba1c013a7 Re-validate healthcheck work on nova/keystone containers
They were disabled until the native podman healthcheck was integrated in
tripleo-ansible and it finally merged; so we can remove that safeguard
and it should be working.

Change-Id: I03361c33e54f0c8e71b420b144464ccb29a1ca4e
2020-04-27 21:40:42 -04:00
Emilien Macchi
6464efdc4e Migrate inflight validations to native podman healthchecks
The systemd healthchecks are moving away, so we can use the native
podman healthchecks interface.

See I37508cd8243999389f9e17d5ea354529bb042279 for the whole context.

This patch does the following:

- Migrate the healthcheck checks to use podman inspect instead of
  systemd service status.
- Force the tasks to not run, because we first need
  https://review.opendev.org/#/c/720061 to merge

Once https://review.opendev.org/#/c/720061 is merged, we'll remove the
condition workaround and also migrate to unify the way containers are
checked; and use the role in tripleo-validations.

Depends-On: https://review.opendev.org/720283
Change-Id: I7172d81d305ac8939bee5e7f64960b0a9fea8627
2020-04-15 20:23:58 +00:00
Emilien Macchi
38bad5283f Remove all ignore_errors to avoid confusion when debugging
- deploy-steps-tasks-step-1.yaml: Do not ignore errors when dealing
  with check-mode directories. The file module is resilient enough to
  not fail if the path is already absent.

- deploy-steps-tasks.yaml: Replace ignore_errors by another condition,
  "not ansible_check_mode"; this task is not needed in check mode.

- generate-config-tasks.yaml: Replace ignore_errors by another
  condition, "not ansible_check_mode"; this task is not needed in check mode.

- Neutron wrappers: use fail_key: False instead of ignore_errors: True
  if a key can't be found in /etc/passwd.

- All services with service checks: Replace "ignore_errors: true" by
  "failed_when: false". Since we don't care about whether or not the
  task returns 0, let's just make the task never fail. It will only
  improve UX when crawling logs; no more failure will be shown for
  these tasks.

- Same as above for cibadmin commands, cluster resources show
  commands and keepalived container restart command; and all other uses
  of the shell, command or yum modules where we just don't care about
  their potential failures.

- Aodh/Gnocchi: Add pipefail so the task isn't supposed to fail

- tripleo-packages-baremetal-puppet and undercloud-upgrade: check shell
  rc instead of "succeeded", since the task will always succeed.

Change-Id: I0c44db40e1b9a935e7dde115bb0c9affa15c42bf
2020-03-05 09:22:04 -05:00
Zuul
a3916383d3 Merge "Update ffwd-upgrade branch names" 2020-02-01 21:51:45 +00:00
Zuul
a5f1d5c6e2 Merge "Add DeployIdentifier to extra config containers" 2020-01-29 14:44:14 +00:00
Jesse Pretorius (odyssey4me)
2092b1303f Update ffwd-upgrade branch names
The next iteration of fast-forward-upgrade will be
from queens through to train, so we update the names
accordingly.

Change-Id: Ia6d73c33774218b70c1ed7fa9eaad882fde2eefe
2020-01-27 19:42:40 +00:00
Zuul
98f834c923 Merge "Drop NovaEnableNumaLiveMigration" 2020-01-25 06:26:48 +00:00
Brent Eagles
714e1b5d31 Add DeployIdentifier to extra config containers
Certain config containers might need to be replaced and re-run
regardless of whether configuration changes on update and upgrade.
Adding the DeployIdentifier to the env will ensure that they are.

Change-Id: I150212ebac3fed471ffb4e7ed7b6eb6c7af3fad9
Closes-Bug: #1860571
2020-01-22 15:16:12 -03:30
Takashi Kajinami
bc27951ff2 Drop NovaEnableNumaLiveMigration
In nova, enable_numa_live_migration was deprecated in train release,
so remove the corresponding parameter, NovaEnableNUMALiveMigration,
in templates.

Change-Id: I9616b290bf4ee6fefee66efb6924a3fd6699ccae
2020-01-21 22:06:58 +00:00
Kevin Carter
9a2a36437d Update all roles to use the new role name
Ansible has decided that roles with hyphens in them are no longer supported
by not including support for them in collections. This change renames all
the roles we use to the new role name.

Depends-On: Ie899714aca49781ccd240bb259901d76f177d2ae
Change-Id: I4d41b2678a0f340792dd5c601342541ade771c26
Signed-off-by: Kevin Carter <kecarter@redhat.com>
2020-01-20 10:32:23 -06:00
Sagi Shnaidman
016f7c6002 Remove unnecessary slash volume maps
When podman parses such a volume map, it removes the slash
automatically and shows the volumes without the slash on inspection.
When comparing configurations, this turns out to be a difference and
it breaks idempotency of containers, causing them to be recreated.
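
To make the mismatch concrete, here is a minimal Python sketch of why a trailing slash defeats a plain string comparison between the configured volume map and the one reported on inspection. The commit itself fixes this by removing the slashes from the templates; the `normalize_volume` helper and the example paths below are hypothetical and only illustrate the comparison problem.

```python
def normalize_volume(spec):
    """Strip trailing slashes from the host and container paths of a volume map."""
    parts = spec.split(":")
    # Only the first two fields are paths; later fields are options (ro, z, ...).
    for index in range(min(2, len(parts))):
        stripped = parts[index].rstrip("/")
        parts[index] = stripped or "/"  # keep a bare "/" intact
    return ":".join(parts)


# Hypothetical volume map as written in a template, with trailing slashes.
configured = "/var/log/containers/example/:/var/log/example/:z"
# The same map as it would be reported once the slashes are dropped.
reported = "/var/log/containers/example:/var/log/example:z"

differs_as_written = configured != reported
matches_after_cleanup = normalize_volume(configured) == reported
```

Because the two strings differ as written, a naive comparison concludes that the container configuration changed, which is the idempotency break the message describes.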

Change-Id: Ifdebecc8c7975b6f5cfefb14b0133be247b7abf0
2019-12-04 20:32:14 +02:00
Jose Luis Franco Arza
4cbae84c75 Get rid of docker removing in post_upgrade tasks.
When upgrading from Rocky to Stein we also moved from the docker
container engine to Podman. To ensure that every single docker container
was removed after the upgrade a post_upgrade task was added which made
use of the tripleo-docker-rm role that removed the container. In this cycle,
from Stein to Train both the Undercloud and Overcloud work with Podman, so
there is no need to remove any docker container anymore.

This patch removes all the tripleo-docker-rm post-upgrade tasks, and in those
services which only included a single task, the post-upgrade-tasks section
is also erased.

Change-Id: I5c9ab55ec6ff332056a426a76e150ea3c9063c6e
2019-11-12 16:33:38 +01:00
Zuul
21b56ec34a Merge "Revert "Temporaily disable nova inflight healthchecks to unblock the gate"" 2019-10-17 17:07:37 +00:00
Emilien Macchi
81258ae551 Convert container environment from a list to a dict
Moving all the container environments from lists to dicts, so they can
be consumed later by the podman_container ansible module which uses
dict.

Using a dict is also easier to parse, since it doesn't involve "=" for
each item in the environment to export.
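
The list-to-dict conversion described here can be sketched in a few lines of Python. This is an illustration only, not the code used by the templates or by the podman_container module; the `env_list_to_dict` helper and the `EXAMPLE_OPTS` entry are hypothetical.

```python
def env_list_to_dict(env_list):
    """Convert ["KEY=value", ...] into {"KEY": "value", ...}."""
    env = {}
    for item in env_list:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value entry: {item!r}")
        env[key] = value
    return env


# List form: every consumer has to parse the "=" separator.
legacy = ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS", "EXAMPLE_OPTS=--flag=1"]
# Dict form: keys and values are already separated.
converted = env_list_to_dict(legacy)
```

The second entry shows why the list form is awkward to parse: a value containing "=" is only handled correctly if the split stops at the first separator.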

Change-Id: I894f339cdf03bc2a93c588f826f738b0b851a3ad
Depends-On: I98c75e03d78885173d829fa850f35c52c625e6bb
2019-10-16 01:29:31 +00:00
Cédric Jeanneret
affbe57a8b Revert "Temporaily disable nova inflight healthchecks to unblock the gate"
Inflight validations are now properly deactivated within the
tripleoclient/tripleo-common code.

This reverts commit 1761fc81c252e3dd565fe4f27e13f2c26426c806.

Change-Id: I4ea9bfadbcc71c847232c8585d99f8698daffc9a
2019-10-15 12:36:05 +00:00
Oliver Walsh
1761fc81c2 Temporaily disable nova inflight healthchecks to unblock the gate
Change-Id: I8b687dcf7b36730a282e2091566a15a7ddc6fd23
Related-bug: #1843555
2019-09-30 12:44:42 +01:00
Oliver Walsh
c919f1b65b Wait for first healthcheck before running validation tasks
The systemd healthcheck timer first triggers 120s after activation.
The initial value for ExecMainStatus is 0, resulting in false positives if we
check this too early.
This change waits (up to 5 mins) for ExecMainPID to be set and the service to
return to an inactive/failed state.
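
The false positive can be illustrated with a small Python sketch that decides whether a healthcheck result is trustworthy yet. It assumes the key=value output format of `systemctl show`; the helper names and the sample outputs are hypothetical, and the actual change implements the wait as validation tasks rather than in Python.

```python
def parse_systemctl_show(output):
    """Parse the key=value lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def healthcheck_result_ready(props):
    """True once the healthcheck has run at least once and has finished."""
    has_run = props.get("ExecMainPID", "0") != "0"
    finished = props.get("ActiveState") in ("inactive", "failed")
    return has_run and finished


# Hypothetical output before the timer has fired: status 0 means nothing yet.
before_first_run = "ActiveState=inactive\nExecMainPID=0\nExecMainStatus=0\n"
# Hypothetical output after a failed healthcheck run.
after_failed_run = "ActiveState=failed\nExecMainPID=4242\nExecMainStatus=1\n"

ready_before = healthcheck_result_ready(parse_systemctl_show(before_first_run))
ready_after = healthcheck_result_ready(parse_systemctl_show(after_failed_run))
```

In the first sample ExecMainStatus is already 0 even though no healthcheck has executed, so checking the status alone would report a healthy container too early.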

Change-Id: Iad4ebb283a7a6559b6fffead4145cc9bbad45e4e
Depends-On: Ia2897a6be3e000a9594103502b716431baa615b1
Related-bug: #1843555
2019-09-14 02:15:58 +00:00
Oliver Walsh
5089d09ba6 Fix nova-conductor healthcheck RPC port
It currently assumes the default RPCPort.

Change-Id: Idbb1738db0f4cc3efb9005c2bfee188d3a9ef5be
Closes-Bug: #1843890
2019-09-13 13:12:52 +01:00
Oliver Walsh
84a3cc1afd Skip systemd healthcheck validation on docker
The validation tasks added in I2c044e3d2af7f747acde5ad3bf256386b8c550a3 are not
valid on docker. As it's now deprecated we can just skip them.

Change-Id: I4ff530af8ad7f864b8038e5e509ec38840096c5d
Related-bug: #1842687
2019-09-12 14:56:26 -04:00
Emilien Macchi
7064cd8e90 nova: use systemd to check container healthchecks
Instead of running "podman exec" to test the container healthchecks, we
should rather rely on the status of systemd timers which reflect the
real state of the healthchecks, since they run under a specific user and
pid.

Also, we should only test the healthchecks if
ContainerHealthcheckDisabled is set to False.

Change-Id: I2c044e3d2af7f747acde5ad3bf256386b8c550a3
Closes-Bug: #1842687
2019-09-06 15:05:33 +05:30
Martin Magr
5ccf8951e5 Remove fluentd composable service
This patch removes fluentd composable service in favor of rsyslog composable service
and modifies *LoggingSource configuration accordingly.

Change-Id: I1e12470b4eea86d8b7a971875d28a2a5e50d5e07
2019-08-29 13:52:55 +01:00
Zuul
456c8da28c Merge "Add inflight validations for compute services" 2019-08-14 13:56:08 +00:00
Carlos Camacho
8529ce60da Stop services for unupgraded controllers
Before we start services on upgraded bootstrap
controller (usually controller-0), we need to
stop services on unupgraded controllers
(usually controller-1 and controller-2).

Also we need to move the mysql data transfer
to the step 2 as we need to first stop the
services.

Depends-On: I4fcc0858cac8f59d797d62f6de18c02e4b1819dc
Change-Id: Ib4af5b4a92b3b516b8e2fc1ae12c8d5abe40327f
2019-08-07 19:23:11 +02:00
Rajesh Tailor
8dc0cee704 Add inflight validations for compute services
Added inflight validations for compute container
services.

Change-Id: I8a8757aec80c379656665c4a1f0952c3b29f53b8
2019-08-07 10:24:36 +05:30
Jose Luis Franco Arza
d1035703b7 Force removal of docker container in tripleo-docker-rm.
The tripleo-docker-rm role has been replaced by tripleo-container-rm [0].
This role will identify the docker engine via the container_cli variable
and perform a deletion of that container. However, these tasks inside the
post_upgrade_tasks section were meant to remove the old docker containers
after upgrading from rocky to stein, in which podman starts to be the
container engine by default.

For that reason, we need to ensure that the container engine in which the
containers are removed is docker, as otherwise we will be removing the
podman container and the deployment steps will fail.

Closes-Bug: #1836531
[0] - 2135446a35

Depends-On: https://review.opendev.org/#/c/671698/
Change-Id: Ib139a1d77f71fc32a49c9878d1b4a6d07564e9dc
2019-07-19 12:37:35 +00:00
Dan Prince
a68151d02a Convert Docker*Image parameters
This converts all Docker*Image parameter variants into
Container*Image variants.

The commit was autogenerated with the following shell commands:

for file in $(grep -lr Docker.*Image --include \*.yaml --exclude-dir releasenotes); do
  sed -e "s|Docker\([^ ]*Image\)|Container\1|g" -i $file
done
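
For readers less familiar with sed, the substitution above can be approximated in Python as follows. This is a sketch of the same rename applied to an in-memory string, not the command that generated the commit; the sample YAML fragment is made up for illustration, and newlines are excluded from the character class because sed works line by line.

```python
import re

# Rough Python equivalent of: sed -e "s|Docker\([^ ]*Image\)|Container\1|g"
PATTERN = re.compile(r"Docker([^ \n]*Image)")


def rename_image_params(text):
    """Rename every Docker*Image token in text to Container*Image."""
    return PATTERN.sub(r"Container\1", text)


# Hypothetical template fragment used only to exercise the pattern.
sample = (
    "  DockerNovaApiImage:\n"
    "    description: image\n"
    "  get_param: DockerNovaConfigImage\n"
)
renamed = rename_image_params(sample)
```

Only the `Docker` prefix is replaced; everything captured between the prefix and the final `Image` is carried over unchanged by the backreference.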

Change-Id: Iab06efa5616975b99aa5772a65b415629f8d7882
Depends-On: I7d62a3424ccb7b01dc101329018ebda896ea8ff3
Depends-On: Ib1dc0c08ce7971a03639acc42b1e738d93a52f98
2019-06-05 14:33:44 -06:00
Dan Prince
a52498ab4d Move containers-common.yaml into deployment
Change-Id: I8cc27cd8ed76a1e124cbb54c938bb86332956ac2
Related-Blueprint: services-yaml-flattening
2019-04-14 18:15:12 -04:00
Sergii Golovatiuk
2a8fcc4ddf Remove UpgradeRemoveUnusedPackages
UpgradeRemoveUnusedPackages is not used anymore. All packages are
supposed to be removed on undercloud upgrade to 14.

Change-Id: Ie6b739390ec0ae0c5773a5a6c63b49422195623a
2019-03-19 13:40:02 +00:00
Dan Prince
2325992aef Drop unused deployment services parameters
This patch drops unused parameters in several services.

Change-Id: I4fc39a1998fb83b23f3d1c28196da20fe7f56262
2019-03-04 07:52:56 -05:00
Dan Prince
e14dfc8329 Fix monitoring_subscription on misc services
Some of these were missing or got dropped due to recent flattening
efforts.

Change-Id: I7c7c2ea134aa8b18c7d19c3d9435c90cc49cda77
2019-03-04 07:52:56 -05:00
Jill Rouleau
acb61d2c18 step4: flatten nova service configuration
This change combines the previous puppet and docker files
into a single file that performs the docker service installation
and configuration.

Change-Id: I9bd5c9f007d9f69d7310cdd0106bcc923c1b0acd
2019-02-20 14:28:20 -07:00
Zuul
a21b246010 Merge "Remove upgrade_tasks added during nova services flattening." 2019-02-20 13:26:36 +00:00
Jose Luis Franco Arza
3a86fc57d7 Remove upgrade_tasks added during nova services flattening.
During the nova service flattening, some of the baremetal upgrade_tasks
were included in the containerized services. This patch removes
them.

Change-Id: I4a569195deeadb34180561c778dabe77be4f6466
Closes-Bug: #1816453
2019-02-19 17:19:35 +01:00
Rajesh Tailor
f7bc59d4b8 Fail to live migration if instance has NUMA topology
Live migration is currently totally broken if a NUMA topology is
present. This affects everything that's been regrettably stuffed in with
NUMA topology including CPU pinning, hugepage support and emulator
thread support. Side effects can range from simple unexpected
performance hits (due to instances running on the same cores) to
complete failures (due to instance cores or huge pages being mapped to
CPUs/NUMA nodes that don't exist on the destination host).

Until such a time as we resolve these issues, we should alert users to
the fact that such issues exist. A workaround option is provided for
operators that _really_ need the broken behavior, but it's defaulted to
False to highlight the brokenness of this feature to unsuspecting
operators.

The related nova change is I217fba9138132b107e9d62895d699d238392e761

The proposed change allows configuring the 'enable_numa_live_migration'
workarounds option through TripleO. By default this feature will be
disabled for NUMA topology instances.

Depends-On: I16794fbfef0e6e83d3fcebb9e6bc2fcf478ebf72
Change-Id: I523756b418afe1827490c936966af8936ffdbaa6
2019-02-19 13:38:13 +05:30
Oliver Walsh
dc9a76aa23 cell_v2 multi-cell
- uses split-control-plane
- adds a new CellController role
  - nova-conductor, message rpc (not notifications) and db
- move nova dbsync from nova-api to nova-conductor
  - nova db is more tightly coupled to conductor/computes
  - we don't have a nova-api service on a CellController
  - super-conductor on Controller will sync cell0 db
- new 'magic' MysqlCellInternal endpoint
  - always refers to the local MysqlInternal endpoint
  - identical to MysqlInternal for regular deployment
  - but doesn't get overridden when inheriting EndpointMap from parent
    control-plane stack
- duplicate service node name hiera for transport_urls on cell stack
  - nova -> cell oslo messaging rpc nodes
  - neutron agent -> global messaging rpc nodes
- run cell host discovery only on default cell; for additional cells
  the cell needs to be created first

bp tripleo-multicell-basic

Co-Authored-By: Martin Schuppert <mschuppert@redhat.com>

Change-Id: Ife9bf12d3a6011906fa8d9f97f7524b51aef906a
Depends-On: I79c1080605611c5c7748a28d2afcc9c7275a2e5d
2019-02-15 12:16:48 +01:00
Jill Rouleau
92ea1131c7 step3: flatten nova service configuration
This change combines the previous puppet and docker files
into a single file that performs the docker service installation
and configuration. With this patch the baremetal version of
nova has been removed.

Change-Id: Ic577851f8d865d5eec41dbfb00c27520bedc3fdb
2019-02-13 06:21:17 +00:00