47 Commits

Author SHA1 Message Date
ramishra
c9991c2e31 Use 'wallaby' heat_template_version
With I57047682cfa82ba6ca4affff54fab5216e9ba51c Heat has added
a new template version for wallaby. This allows us to use the
2-argument variant of the ``if`` function, which enables e.g.
conditional definition of resource properties and helps clean up
templates. If only two arguments are passed to the ``if``
function, the entire enclosing item is removed when the condition
is false.
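
The 2-argument behaviour can be sketched like this (the parameter, condition and resource names are illustrative, not taken from the change itself):

```yaml
heat_template_version: wallaby

parameters:
  ExtraOpt:
    type: string
    default: ''

conditions:
  extra_opt_set: {not: {equals: [{get_param: ExtraOpt}, '']}}

resources:
  example:
    type: OS::Heat::None
    properties:
      # With only two arguments, the entire 'value' entry is dropped
      # when 'extra_opt_set' is false; no third "else" argument is needed.
      value: {if: [extra_opt_set, {get_param: ExtraOpt}]}
```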

Change-Id: I25f981b60c6a66b39919adc38c02a051b6c51269
2021-03-31 17:35:12 +05:30
Rajesh Tailor
8d66001fc5 Add parameter NovaSchedulerQueryPlacementForRoutedNetworkAggregates
Add parameter NovaSchedulerQueryPlacementForRoutedNetworkAggregates
that allows the scheduler to look at the nova aggregates related
to requested routed network segments.
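
As with other THT options, the parameter would typically be enabled through an environment file; a minimal sketch (the value shown is just an example):

```yaml
parameter_defaults:
  NovaSchedulerQueryPlacementForRoutedNetworkAggregates: true
```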

Depends-On: https://review.opendev.org/c/openstack/puppet-nova/+/776922
Change-Id: I7a2f8154f1f02ce8f57d370ff1baecf79f5300b2
2021-02-22 17:13:33 +05:30
ramishra
7f195ff9a8 Remove DefaultPasswords interface
This was mainly there as a legacy interface for internal use.
Now that we pull the passwords from the existing environment
and don't use it, we can drop this.

Reduces the number of heat resources.

Change-Id: If83d0f3d72a229d737a45b2fd37507dc11a04649
2021-02-12 11:38:44 +05:30
David Vallee Delisle
faab7e7856 Removing scheduler_default_filters for Nova
This setting was renamed to enabled_filters a while back. For the sake of
consistency, we need to change this here as well.

Therefore, we're deprecating NovaSchedulerDefaultFilters and creating a
new setting called NovaSchedulerEnabledFilters. This is jointly
committed with a relevant change in puppet-nova.
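
A sketch of migrating an environment file from the deprecated name to the new one (the filter list here is illustrative):

```yaml
# before (deprecated):
#   NovaSchedulerDefaultFilters: ['AvailabilityZoneFilter', 'ComputeFilter']
# after:
parameter_defaults:
  NovaSchedulerEnabledFilters:
    - AvailabilityZoneFilter
    - ComputeFilter
    - ComputeCapabilitiesFilter
    - ImagePropertiesFilter
```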

Depends-On: I110f612f1b78899e8969da607e6b400e2e64c8a1
Change-Id: I0e425e247be1e3ad7004a5667a0887949a2a031c
2020-12-14 09:32:50 -05:00
Oliver Walsh
9d82364de8 Refactor nova db config
It is best to avoid placing db creds on the compute nodes to limit the
exposure if an attacker succeeds in gaining access to the hypervisor
host.

Related patches in puppet-nova remove the credentials from nova.conf
however the current scope of db credential hieradata is all nova tripleo
services - so it would still be written to the hieradata keys on compute
nodes.

This patch refactors the nova hieradata structure, splitting the
nova-api/nova database hieradata out into individual templates and
selectively including only where necessary, ensuring we have no db
creds on a compute node (unless it is an all-in-one api+compute node).

Depends-On: I07caa3185427b48e6e7d60965fa3e6157457018c
Change-Id: Ia4a29bdd2cd8e894bcc7c0078cf0f0ab0f97de0a
Closes-bug: #1871482
2020-11-18 12:22:48 +00:00
Jose Luis Franco Arza
8783ec9c45 Remove ffwd-upgrade leftovers from THT.
Now that the FFU process relies on the upgrade_tasks and deployment
tasks, there is no need to keep the old fast_forward_upgrade_tasks.

This patch removes all the fast_forward_upgrade_tasks sections from
the services, as well as from the common structures.

Change-Id: I39b8a846145fdc2fb3d0f6853df541c773ee455e
2020-07-23 15:33:25 +00:00
Michele Baldessari
bd4b57c269 Remove /run from some services
redis(non-pcmk), nova-scheduler and swift-proxy do not need /run bind
mounted from the host.  As a matter of fact bind-mounting /run is
problematic due to a number of reasons (see LP#1883849 for more
background). In particular swift-proxy is the only swift container
(out of 9) that has /run bind-mounted.

These three services always had /run from the very beginning:
- redis -> Ie750caa34c6fa22ca6eae6834b9ca20e15d97f7f
- nova-scheduler -> I39436783409ed752b08619b07b0a0c592bce0456
- swift-proxy -> I2d96514fb7aa51dffe8fe293bc950e0e99df5e94

Tested this by applying this patch on a train deployment and
deployed an undercloud and an overcloud with it.
Verified that:
A) /run:/run is not present in the three containers
B) Deploy of UC and OC worked correctly
C) Tempest still works
D) Restarting the swift_proxy and nova_scheduler works correctly
E) Reboot the overcloud worked okay and tempest still works after the
   full overcloud reboot
F) Ran a minor UC update
G) Ran a minor update on all nodes and tempest still worked
H) Ran a redeploy on all nodes and tempest still worked

NB: I did not investigate other containers that bind-mount /run
because 1) they seem to need it and 2) I had no means to do proper
testing.

NB2: Note that once we rebuild containers with
I81e5b7abf4571fece13a029e25911e9e4dece673, this change here is not
strictly needed for the LP bug, but it is a nice cleanup nonetheless.
So this is to be backported only if rebuilding containers is
a problematic/costly move.

Change-Id: Ic1a892a7f78a54b5e149f5ce52cb9db68ebc9529
Related-Bug: #1883849
2020-07-11 07:03:24 +00:00
Rajesh Tailor
8d968a213a Add new parameter NovaSchedulerQueryPlacementForAvailabilityZone
Add new parameter NovaSchedulerQueryPlacementForAvailabilityZone
that allows the scheduler to look up a host aggregate with metadata
key of availability zone set to the value provided by incoming request,
and request result from placement be limited to that aggregate.

Depends-On: https://review.opendev.org/#/c/728368/
Change-Id: I7aba7e37ff3b919d98cdc64fd4d4eb30da68b0ec
2020-05-15 11:59:42 +05:30
Zuul
6746925981 Merge "healthchecks: check if fact is defined before checking its value" 2020-05-13 18:48:22 +00:00
Archit Modi
61f2bd017c Change Schedule to Scheduler for consistent naming
Change [1] introduced a new param NovaSchedulePlacementAggregateRequiredForTenants
which has a minor naming inconsistency. This patch renames it to
NovaSchedulerPlacementAggregateRequiredForTenants

[1] https://review.opendev.org/#/c/716154/

Change-Id: I34005d28eae325197225918720da7aa0b8878510
2020-05-11 14:38:20 -04:00
Emilien Macchi
21d1f773c7 healthchecks: check if fact is defined before checking its value
When checking if keystone/nova healthchecks are healthy, make sure the
registered fact is set (which can slip to a further retry if podman
inspect took too much time to execute).

That way, we process the retries without an error like found in the bug
report.

Change-Id: I9f5063c9c3b598afd5bd01447f00a1146a20f4c3
Closes-Bug: #1878063
2020-05-11 13:39:06 -04:00
Emilien Macchi
4ba1c013a7 Re-validate healthcheck work on nova/keystone containers
They were disabled until the native podman healthcheck was integrated in
tripleo-ansible, and that has finally merged; so we can remove the
safeguard and it should be working.

Change-Id: I03361c33e54f0c8e71b420b144464ccb29a1ca4e
2020-04-27 21:40:42 -04:00
Emilien Macchi
6464efdc4e Migrate inflight validations to native podman healthchecks
The systemd healthchecks are going away, so we can use the native
podman healthcheck interface instead.

See I37508cd8243999389f9e17d5ea354529bb042279 for the whole context.

This patch does the following:

- Migrate the healthcheck checks to use podman inspect instead of
  systemd service status.
- Force the tasks to not run, because we first need
  https://review.opendev.org/#/c/720061 to merge

Once https://review.opendev.org/#/c/720061 is merged, we'll remove the
condition workaround and also migrate to unify the way containers are
checked; and use the role in tripleo-validations.
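
A rough sketch of what such a podman-inspect-based check can look like as an Ansible task (the container name, retry counts and inspect format path are assumptions; the format path in particular varies between podman versions):

```yaml
- name: check nova_scheduler container health via podman inspect
  # the Jinja quoting passes the Go template through to podman literally
  command: >-
    podman inspect --format
    {{ "'{{.State.Healthcheck.Status}}'" }} nova_scheduler
  register: nova_scheduler_health
  retries: 10
  delay: 30
  until: nova_scheduler_health.stdout == 'healthy'
  become: true
```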

Depends-On: https://review.opendev.org/720283
Change-Id: I7172d81d305ac8939bee5e7f64960b0a9fea8627
2020-04-15 20:23:58 +00:00
Rajesh Tailor
411af91237 Add new parameter NovaSchedulerEnableIsolatedAggregateFiltering
Add new parameter NovaSchedulerEnableIsolatedAggregateFiltering,
which allows configuring the `scheduler/enable_isolated_aggregate_filtering`
parameter to restrict hosts in aggregates based on matching required
traits in the aggregate metadata and the instance flavor/image.

This also fixes the scheduler spelling typo in the config_settings
section for the placement_aggregate_required_for_tenants parameter.

Depends-On: https://review.opendev.org/#/c/718326/
Change-Id: Ic6ef33d9c02be7e87c30626930f9e3d44cb77875
2020-04-10 12:00:38 +05:30
Rajesh Tailor
148185cada Add new parameter NovaSchedulePlacementAggregateRequiredForTenants
Add new parameter NovaSchedulePlacementAggregateRequiredForTenants,
which allows configuring the scheduler to enable tenant-isolation with
placement, controlling whether or not a tenant with no aggregate
affinity will be allowed to schedule to any available node.

Depends-On: https://review.opendev.org/#/c/716152/
Change-Id: Id61deee3ec2981b2caddd07b06bef0b701e40ac2
2020-04-01 17:01:51 +05:30
Emilien Macchi
38bad5283f Remove all ignore_errors to avoid confusion when debugging
- deploy-steps-tasks-step-1.yaml: Do not ignore errors when dealing
  with check-mode directories. The file module is resilient enough to
  not fail if the path is already absent.

- deploy-steps-tasks.yaml: Replace ignore_errors by another condition,
  "not ansible_check_mode"; this task is not needed in check mode.

- generate-config-tasks.yaml: Replace ignore_errors by another
  condition, "not ansible_check_mode"; this task is not needed in check mode.

- Neutron wrappers: use fail_key: False instead of ignore_errors: True
  if a key can't be found in /etc/passwd.

- All services with service checks: Replace "ignore_errors: true" by
  "failed_when: false". Since we don't care about whether or not the
  task returns 0, let's just make the task never fail. It will only
  improve UX when scrawling logs; no more failure will be shown for
  these tasks.

- Same as above for cibadmin commands, cluster resources show
  commands and keepalived container restart command; and all other shell
  or command or yum modules uses where we just don't care about their potential
  failures.

- Aodh/Gnocchi: Add pipefail so pipeline failures aren't masked and the
  task fails when it should

- tripleo-packages-baremetal-puppet and undercloud-upgrade: check shell
  rc instead of "succeeded", since the task will always succeed.
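
The ignore_errors to failed_when swap described above looks roughly like this (task and service names are illustrative):

```yaml
# before: the task error still shows up in the logs, just ignored
- name: check that the service responds
  command: systemctl is-active some_service
  ignore_errors: true

# after: the task can never fail, so no spurious failure is logged
- name: check that the service responds
  command: systemctl is-active some_service
  register: service_check
  failed_when: false

- name: act on the result
  debug:
    msg: "service active: {{ service_check.rc == 0 }}"
```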

Change-Id: I0c44db40e1b9a935e7dde115bb0c9affa15c42bf
2020-03-05 09:22:04 -05:00
Jesse Pretorius (odyssey4me)
2092b1303f Update ffwd-upgrade branch names
The next iteration of fast-forward-upgrade will be
from queens through to train, so we update the names
accordingly.

Change-Id: Ia6d73c33774218b70c1ed7fa9eaad882fde2eefe
2020-01-27 19:42:40 +00:00
Kevin Carter
9a2a36437d Update all roles to use the new role name
Ansible has decided that roles with hyphens in them are no longer supported,
by not including support for them in collections. This change renames all
the roles we use to the new role names.

Depends-On: Ie899714aca49781ccd240bb259901d76f177d2ae
Change-Id: I4d41b2678a0f340792dd5c601342541ade771c26
Signed-off-by: Kevin Carter <kecarter@redhat.com>
2020-01-20 10:32:23 -06:00
Sagi Shnaidman
016f7c6002 Remove unnecessary slash volume maps
When podman parses such a volume map it removes the trailing slash
automatically, and inspection then shows the volumes without the slash.
When comparing configurations this shows up as a difference and
breaks container idempotency, causing the containers to be recreated.
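
For example (the paths are illustrative):

```yaml
# before: podman strips the trailing source slash on parse, so the
# configured value never matches the 'podman inspect' output and the
# container is needlessly recreated
volumes:
  - /var/log/containers/nova/:/var/log/nova
# after:
volumes:
  - /var/log/containers/nova:/var/log/nova
```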

Change-Id: Ifdebecc8c7975b6f5cfefb14b0133be247b7abf0
2019-12-04 20:32:14 +02:00
Jose Luis Franco Arza
4cbae84c75 Get rid of docker removing in post_upgrade tasks.
When upgrading from Rocky to Stein we also moved from the docker
container engine to Podman. To ensure that every single docker container
was removed after the upgrade, a post_upgrade task was added which made
use of the tripleo-docker-rm role to remove the containers. In this cycle,
from Stein to Train, both the Undercloud and Overcloud work with Podman, so
there is no need to remove any docker containers anymore.

This patch removes all the tripleo-docker-rm post-upgrade tasks and, in
those services which only included a single task, the post-upgrade-tasks
section is also erased.

Change-Id: I5c9ab55ec6ff332056a426a76e150ea3c9063c6e
2019-11-12 16:33:38 +01:00
Zuul
21b56ec34a Merge "Revert "Temporarily disable nova inflight healthchecks to unblock the gate"" 2019-10-17 17:07:37 +00:00
Emilien Macchi
81258ae551 Convert container environment from a list to a dict
Moving all the container environments from lists to dicts, so they can
be consumed later by the podman_container ansible module which uses
dict.

Using a dict is also easier to parse, since it doesn't involve "=" for
each item in the environment to export.
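
A sketch of the conversion (the variable names are illustrative):

```yaml
# before: list of KEY=VALUE strings
environment:
  - KOLLA_CONFIG_STRATEGY=COPY_ALWAYS
  - TRIPLEO_CONFIG_HASH=somehash
# after: plain dict, consumable directly by the podman_container module
environment:
  KOLLA_CONFIG_STRATEGY: COPY_ALWAYS
  TRIPLEO_CONFIG_HASH: somehash
```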

Change-Id: I894f339cdf03bc2a93c588f826f738b0b851a3ad
Depends-On: I98c75e03d78885173d829fa850f35c52c625e6bb
2019-10-16 01:29:31 +00:00
Cédric Jeanneret
affbe57a8b Revert "Temporarily disable nova inflight healthchecks to unblock the gate"
Inflight validations are now properly deactivated within the
tripleoclient/tripleo-common code.

This reverts commit 1761fc81c252e3dd565fe4f27e13f2c26426c806.

Change-Id: I4ea9bfadbcc71c847232c8585d99f8698daffc9a
2019-10-15 12:36:05 +00:00
Oliver Walsh
1761fc81c2 Temporarily disable nova inflight healthchecks to unblock the gate
Change-Id: I8b687dcf7b36730a282e2091566a15a7ddc6fd23
Related-bug: #1843555
2019-09-30 12:44:42 +01:00
Oliver Walsh
c919f1b65b Wait for first healthcheck before running validation tasks
The systemd healthcheck timer first triggers 120s after activation.
The initial value for ExecMainStatus is 0, resulting in false positives if we
check this too early.
This change waits (up to 5 mins) for ExecMainPID to be set and the service to
return to an inactive/failed state.
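
That wait can be sketched as an Ansible retry loop (the unit name is an assumption; 30 retries x 10s gives the 5-minute bound):

```yaml
- name: wait for the first healthcheck run to finish
  command: >-
    systemctl show --property=ExecMainPID,ActiveState
    tripleo_nova_scheduler_healthcheck.service
  register: hc_state
  retries: 30
  delay: 10
  until: "'ExecMainPID=0' not in hc_state.stdout and
          ('ActiveState=inactive' in hc_state.stdout or
           'ActiveState=failed' in hc_state.stdout)"
```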

Change-Id: Iad4ebb283a7a6559b6fffead4145cc9bbad45e4e
Depends-On: Ia2897a6be3e000a9594103502b716431baa615b1
Related-bug: #1843555
2019-09-14 02:15:58 +00:00
Oliver Walsh
84a3cc1afd Skip systemd healthcheck validation on docker
The validation tasks added in I2c044e3d2af7f747acde5ad3bf256386b8c550a3 are not
valid on docker. As it's now deprecated we can just skip them.

Change-Id: I4ff530af8ad7f864b8038e5e509ec38840096c5d
Related-bug: #1842687
2019-09-12 14:56:26 -04:00
Emilien Macchi
7064cd8e90 nova: use systemd to check container healthchecks
Instead of running "podman exec" to test the container healthchecks, we
should rather rely on the status of systemd timers which reflect the
real state of the healthchecks, since they run under a specific user and
pid.

Also, we should only test the healthchecks if
ContainerHealthcheckDisabled is set to False.

Change-Id: I2c044e3d2af7f747acde5ad3bf256386b8c550a3
Closes-Bug: #1842687
2019-09-06 15:05:33 +05:30
Martin Magr
5ccf8951e5 Remove fluentd composable service
This patch removes the fluentd composable service in favor of the rsyslog
composable service and modifies the *LoggingSource configuration accordingly.

Change-Id: I1e12470b4eea86d8b7a971875d28a2a5e50d5e07
2019-08-29 13:52:55 +01:00
Zuul
456c8da28c Merge "Add inflight validations for compute services" 2019-08-14 13:56:08 +00:00
Carlos Camacho
8529ce60da Stop services for unupgraded controllers
Before we start services on upgraded bootstrap
controller (usually controller-0), we need to
stop services on unupgraded controllers
(usually controller-1 and controller-2).

Also we need to move the mysql data transfer
to step 2, as we need to stop the
services first.

Depends-On: I4fcc0858cac8f59d797d62f6de18c02e4b1819dc
Change-Id: Ib4af5b4a92b3b516b8e2fc1ae12c8d5abe40327f
2019-08-07 19:23:11 +02:00
Rajesh Tailor
8dc0cee704 Add inflight validations for compute services
Added inflight validations for compute container
services.

Change-Id: I8a8757aec80c379656665c4a1f0952c3b29f53b8
2019-08-07 10:24:36 +05:30
Jose Luis Franco Arza
d1035703b7 Force removal of docker container in tripleo-docker-rm.
The tripleo-docker-rm role has been replaced by tripleo-container-rm [0].
This role will identify the container engine via the container_cli variable
and perform a deletion of that container. However, these tasks inside the
post_upgrade_tasks section were intended to remove the old docker containers
after upgrading from Rocky to Stein, in which podman became the
container engine by default.

For that reason, we need to ensure that the container engine in which the
containers are removed is docker, as otherwise we will be removing the
podman container and the deployment steps will fail.

Closes-Bug: #1836531
[0] - 2135446a35

Depends-On: https://review.opendev.org/#/c/671698/
Change-Id: Ib139a1d77f71fc32a49c9878d1b4a6d07564e9dc
2019-07-19 12:37:35 +00:00
Rajesh Tailor
2074b356f5 Add new parameter NovaSchedulerLimitTenantsToPlacementAggregate
Add new parameter NovaSchedulerLimitTenantsToPlacementAggregate,
which allows configuring the scheduler to enable tenant-isolation with
placement, ensuring that hosts in a tenant-isolated host
aggregate and availability zone will only be available to a
specific set of tenants.

Depends-On: https://review.opendev.org/#/c/669252/
Change-Id: Ic1a7ff0996c5cfeec2ca013bc4e5b2eddab0c377
2019-07-08 12:23:30 +05:30
Zuul
49a71b7b08 Merge "Enable Request Filter for Image Types" 2019-06-25 23:59:11 +00:00
Piotr Kopec
37a6aa8599 Enable Request Filter for Image Types
This change enables `scheduler/query_placement_for_image_type_support`
by default for all deployments. Setting it causes the scheduler to ask
placement only for compute hosts that support the disk_format of the
image used in the request. This is beneficial because, for example, the
libvirt driver, when using ceph as an ephemeral backend, does not support
``qcow2`` images (without an expensive conversion step).
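
The effective outcome, expressed as a THT config_settings fragment (the exact hiera key here is a guess and may differ from the real puppet-nova parameter name):

```yaml
config_settings:
  # hypothetical key; shown explicitly even though it becomes the default
  nova::scheduler::query_placement_for_image_type_support: true
```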

Change-Id: I6d12a66616cc2cc65e62755e8a54084d65aeae5e
Closes-Bug: #1832738
2019-06-15 10:37:07 +02:00
Dan Prince
a68151d02a Convert Docker*Image parameters
This converts all Docker*Image parameter variants into
Container*Image variants.

The commit was autogenerated with the following shell commands:

for file in $(grep -lr Docker.*Image --include \*.yaml --exclude-dir releasenotes); do
  sed -e "s|Docker\([^ ]*Image\)|Container\1|g" -i $file
done

Change-Id: Iab06efa5616975b99aa5772a65b415629f8d7882
Depends-On: I7d62a3424ccb7b01dc101329018ebda896ea8ff3
Depends-On: Ib1dc0c08ce7971a03639acc42b1e738d93a52f98
2019-06-05 14:33:44 -06:00
Alan Bishop
c5fe51147b Use RpcPort for container healthchecks
Update healthcheck commands that probe oslo's messaging port to use the
RpcPort parameter. Previously, some templates referenced the service's
own 'rabbit_port' config setting, which led to malformed healthcheck
commands when the 'rabbit_port' settings were deprecated.

Update the templates that looked up the port in the RabbitMQService's
global_config_settings. Not only did this break the oslo abstraction
by referring to a specific messaging backend (rabbit), it broke
split-stack deployments in which the RabbitMQService is not actually
deployed on the secondary stack's nodes.

This patch creates a common healthcheck command using the RpcPort
parameter in containers-common.yaml. This allows other templates to
reference a common healthcheck command. Other templates that should
also use this can be cleaned up in a separate patch.

Closes-Bug: #1825342
Change-Id: I0d3974089ae6e6879adab4852715c7a1c1188f7c
2019-05-09 14:41:36 -04:00
Kamil Sambor
485b3c9644 Remove hardcoded RabbitMQService
Change-Id: I42f99eb17520b8e04fe85fa69df4cdee753bf6af
Depends-On: https://review.opendev.org/#/c/657831/
Partial-Bug: #1824326
2019-05-08 16:59:32 +02:00
Zuul
b0e23c2b41 Merge "Use oslo_messaging_rpc_port for nova rpc healthchecks" 2019-04-18 11:06:52 +00:00
Martin Schuppert
8ff04029f5 Use oslo_messaging_rpc_port for nova rpc healthchecks
With 405366fa32583e88c34417e5f46fa574ed8f4e98 the parameters RpcPort,
RpcUserName, RpcPassword and RpcUseSSL got deprecated and
nova::rabbitmq_port was removed. As a result the healthcheck gets called
with a null parameter and fails.
We now get the global_config_settings from RabbitMQService and use
oslo_messaging_rpc_port for the healthcheck.

Change-Id: I1849926b1d6256de5f4d677de5a9b34d78aad5d0
Closes-Bug: #1824805
2019-04-17 09:35:20 +00:00
Dan Prince
a52498ab4d Move containers-common.yaml into deployment
Change-Id: I8cc27cd8ed76a1e124cbb54c938bb86332956ac2
Related-Blueprint: services-yaml-flattening
2019-04-14 18:15:12 -04:00
Sergii Golovatiuk
2a8fcc4ddf Remove UpgradeRemoveUnusedPackages
UpgradeRemoveUnusedPackages is not used anymore. All packages are
supposed to be removed on undercloud upgrade to 14.

Change-Id: Ie6b739390ec0ae0c5773a5a6c63b49422195623a
2019-03-19 13:40:02 +00:00
Dan Prince
e14dfc8329 Fix monitoring_subscription on misc services
Some of these were missing or got dropped due to recent flattening
efforts.

Change-Id: I7c7c2ea134aa8b18c7d19c3d9435c90cc49cda77
2019-03-04 07:52:56 -05:00
Jill Rouleau
acb61d2c18 step4: flatten nova service configuration
This change combines the previous puppet and docker files
into a single file that performs the docker service installation
and configuration.

Change-Id: I9bd5c9f007d9f69d7310cdd0106bcc923c1b0acd
2019-02-20 14:28:20 -07:00
Jose Luis Franco Arza
3a86fc57d7 Remove upgrade_tasks added during nova services flattening.
During the nova service flattening, some of the baremetal upgrade_tasks
were included in the containerized services. This patch removes
them.

Change-Id: I4a569195deeadb34180561c778dabe77be4f6466
Closes-Bug: #1816453
2019-02-19 17:19:35 +01:00
Oliver Walsh
dc9a76aa23 cell_v2 multi-cell
- uses split-control-plane
- adds a new CellController role
  - nova-conductor, message rpc (not notifications) and db
- move nova dbsync from nova-api to nova-conductor
  - nova db is more tightly coupled to conductor/computes
  - we don't have a nova-api services on a CellController
  - super-conductor on Controller will sync cell0 db
- new 'magic' MysqlCellInternal endpoint
  - always refers to the local MysqlInternal endpoint
  - identical to MysqlInternal for regular deployment
  - but doesn't get overridden when inheriting EndpointMap from parent
    control-plane stack
- duplicate service node name hiera for transport_urls on cell stack
  - nova -> cell oslo messaging rpc nodes
  - neutron agent -> global messaging rpc nodes
- run cell host discovery only on the default cell; for additional cells
the cell needs to be created first

bp tripleo-multicell-basic

Co-Authored-By: Martin Schuppert <mschuppert@redhat.com>

Change-Id: Ife9bf12d3a6011906fa8d9f97f7524b51aef906a
Depends-On: I79c1080605611c5c7748a28d2afcc9c7275a2e5d
2019-02-15 12:16:48 +01:00
Jill Rouleau
92ea1131c7 step3: flatten nova service configuration
This change combines the previous puppet and docker files
into a single file that performs the docker service installation
and configuration. With this patch the baremetal version of
nova has been removed.

Change-Id: Ic577851f8d865d5eec41dbfb00c27520bedc3fdb
2019-02-13 06:21:17 +00:00