[train-only] post stack creation tsx validation

The RHEL-8.3 kernel disabled the Intel TSX (Transactional
Synchronization Extensions) feature by default as a preemptive
security measure, but this change breaks live migration from
RHEL-7.9 (or even RHEL-8.1 or RHEL-8.2) to RHEL-8.3.

Operators are expected to explicitly define the TSX flag in
their KernelArgs for the compute role to prevent live-migration
issues during the upgrade process.

This is explained in detail in this article [a].

If operators don't want to add the TSX flag to the KernelArgs,
they can always set "ForceNoTsx" to true.
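
The rule described above can be sketched as a small standalone predicate.
This is only an illustration of the condition, not the tripleoclient code:
the function name `role_needs_tsx_flag` and the sample role parameters are
hypothetical, and `tsx=off` is just one example value.

```python
def role_needs_tsx_flag(role_params):
    """Return True when a role sets neither a tsx= kernel arg nor ForceNoTsx."""
    # KernelArgs may be missing entirely; treat that as an empty string.
    kernel_args = role_params.get("KernelArgs") or ""
    force_no_tsx = role_params.get("ForceNoTsx")
    return not force_no_tsx and "tsx=" not in kernel_args


# Hypothetical role parameters, for illustration only.
explicit_flag = {"KernelArgs": "tsx=off"}
opted_out = {"ForceNoTsx": True}
unset = {"KernelArgs": "intel_iommu=on"}

for label, params in (("explicit_flag", explicit_flag),
                      ("opted_out", opted_out),
                      ("unset", unset)):
    print(label, role_needs_tsx_flag(params))
```

Only the last case would be reported, since it neither pins the TSX state
nor opts out through ForceNoTsx.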

Adding this mandatory validation right after the stacks are
updated is probably the earliest place where we can validate
and fail if necessary. We'd rather fail quickly than too late
as this will provide the best experience for our users.

In addition to this, there's a tripleo-validation [b] in the
works.

This is meant to be train-only for now, but we will have to
refactor if (when?) we support FFU from Queens to Wallaby+.

[a] https://access.redhat.com/solutions/6036141
[b] https://review.opendev.org/c/openstack/tripleo-validations/+/790806

Co-Authored-By: Martin Schuppert <mschuppert@redhat.com>
Related: https://bugzilla.redhat.com/1923165
Closes-Bug: #1916758
Change-Id: I35246fbf74394f6e315973283464085d2aef08b2
David Vallee Delisle 2021-05-13 03:47:47 +00:00
parent 0a0296f1fa
commit 050c9aa99f
5 changed files with 72 additions and 0 deletions

@@ -0,0 +1,17 @@
---
fixes:
  - |
    RHEL-8.3 kernel disabled the Intel TSX (Transactional
    Synchronization Extensions) feature by default as a preemptive
    security measure, but it breaks live migration from RHEL-7.9
    (or even RHEL-8.1 or RHEL-8.2) to RHEL-8.3.
    Operators are expected to explicitly define the TSX flag in
    their KernelArgs for the compute role to prevent live-migration
    issues during the upgrade or update process.
    We now introduce this validation in tripleoclient to ensure
    early failure.
    More information here:
    https://access.redhat.com/solutions/6036141

@@ -147,3 +147,7 @@ class CellExportError(Base):
class BannedParameters(Base):
    """Some of the environment parameters provided should be removed"""


class PostStackValidationError(Base):
    """Stack validation failed"""

@@ -712,6 +712,42 @@ class DeployOvercloud(command.Command):
roles=roles
)
    def _post_stack_validation(self, stack):
        """Post stack update mandatory validation

        Runs a validation to make sure that KernelArgs either contains
        a TSX parameter or the ForceNoTsx parameter is defined.
        This is a mandatory validation and it has to happen as soon
        as possible.
        """
        libvirt_service = "OS::TripleO::Services::NovaLibvirt"
        # Keep only the "<Role>Services" parameters that list NovaLibvirt.
        services = filter(lambda x: (x.endswith('Services') and
                                     libvirt_service in stack.parameters[x]),
                          stack.parameters)
        impacted_roles = []
        for i in services:
            role_name = re.sub('Services$', '', i)
            role_param = stack.parameters.get(role_name + 'Parameters')
            if role_param:
                role_params = json.loads(role_param)
                kernel_args = role_params.get('KernelArgs')
                no_tsx = role_params.get('ForceNoTsx')
                # A role is impacted when it neither opts out through
                # ForceNoTsx nor sets a tsx= kernel argument.
                if (not no_tsx and
                        (not kernel_args or "tsx=" not in kernel_args)):
                    impacted_roles.append(role_name)
        if impacted_roles:
            self.log.error("Roles in the following list are expected to have "
                           "a TSX flag configured in their KernelArgs "
                           "parameter. For more information on why we must "
                           "explicitly define the TSX flag, please visit: "
                           "https://access.redhat.com/solutions/6036141")
            self.log.error("You can also skip this validation by setting "
                           "ForceNoTsx parameter for the desired role(s)")
            self.log.error("Impacted roles: {roles}".format(
                roles=",".join(impacted_roles)))
            raise exceptions.PostStackValidationError()

    def get_parser(self, prog_name):
        # add_help doesn't work properly, set it to False:
        parser = argparse.ArgumentParser(
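
As a rough, standalone sketch of how the role discovery in
`_post_stack_validation` behaves, the snippet below runs the same logic
against a plain dict. It assumes the stack parameters act like a mapping of
names to JSON-encoded strings, as the diff implies; `find_impacted_roles`
and the sample parameters are hypothetical and not part of tripleoclient.

```python
import json
import re

LIBVIRT_SERVICE = "OS::TripleO::Services::NovaLibvirt"


def find_impacted_roles(parameters):
    """Return roles that run NovaLibvirt but set neither tsx= nor ForceNoTsx."""
    impacted = []
    for name, value in parameters.items():
        # Only "<Role>Services" entries that mention NovaLibvirt are relevant.
        if not name.endswith("Services") or LIBVIRT_SERVICE not in value:
            continue
        role_name = re.sub("Services$", "", name)
        raw = parameters.get(role_name + "Parameters")
        if not raw:
            # As in the change, a role without "<Role>Parameters" is skipped.
            continue
        role_params = json.loads(raw)
        kernel_args = role_params.get("KernelArgs") or ""
        if not role_params.get("ForceNoTsx") and "tsx=" not in kernel_args:
            impacted.append(role_name)
    return impacted


# Hypothetical stack parameters, for illustration only.
params = {
    "ComputeServices": json.dumps([LIBVIRT_SERVICE]),
    "ComputeParameters": json.dumps({"KernelArgs": "intel_iommu=on"}),
    "ComputeHCIServices": json.dumps([LIBVIRT_SERVICE]),
    "ComputeHCIParameters": json.dumps({"KernelArgs": "tsx=off"}),
    "ControllerServices": json.dumps(["OS::TripleO::Services::Keystone"]),
    "ControllerParameters": json.dumps({}),
}

print(find_impacted_roles(params))
```

With these sample values only the Compute role is reported: ComputeHCI
sets a `tsx=` argument and Controller does not run NovaLibvirt.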

@@ -86,6 +86,14 @@ class UpdatePrepare(DeployOvercloud):
        super(UpdatePrepare, self).take_action(parsed_args)
        package_update.update(clients, container=stack_name)

        # "Mandatory" validation to make sure kernelargs contains
        # a TSX flag
        if not parsed_args.disable_validations:
            stack = oooutils.get_stack(clients.orchestration,
                                       parsed_args.stack)
            self._post_stack_validation(stack)

        package_update.get_config(clients, container=stack_name)
        self.log.info("Update init on stack {0} complete.".format(
            parsed_args.stack))

@@ -102,6 +102,13 @@ class UpgradePrepare(DeployOvercloud):
        # DeployOvercloud.
        package_update.get_config(clients, container=stack_name)

        # "Mandatory" validation to make sure kernelargs contains
        # a TSX flag
        if not parsed_args.disable_validations:
            stack = oooutils.get_stack(clients.orchestration,
                                       parsed_args.stack)
            self._post_stack_validation(stack)

        # enable ssh admin for Ansible-via-Mistral as that's done only
        # when config_download is true
        deployment.get_hosts_and_enable_ssh_admin(