Extend the overcloud node provision command to run
ansible playbooks defined in the baremetal deployment
definition against the provisioned nodes.
To ensure a playbook is applied prior to node network
configuration, set 'pre_network: true'. Additional ansible
vars can be defined as 'extra_vars' for each ansible
playbook definition.
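For illustration, a playbook entry in the baremetal deployment
definition might look like the following (names, values and the exact
placement of 'pre_network' are assumptions, not taken from this
change):

    - name: Controller
      count: 1
      ansible_playbooks:
        - playbook: fix-things-early.yaml
          # assumed placement: run this playbook before the node
          # network configuration is applied
          pre_network: true
          extra_vars:
            some_extra_var: some_value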
Implements: blueprint network-data-v2-ports
Depends-On: https://review.opendev.org/786045
Change-Id: I67a15f637a62e2cb683e6e160483201f7ba093e9
Adds a wait before utils.launch_heat returns to ensure that the
necessary rabbitmq queues have been created. Without the wait, there is
a race condition between the queues being created and the first
heatclient command being executed, which can lead to an
oslo_messaging.exceptions.MessagingTimeout exception.
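A conceptual sketch of such a readiness wait (not the actual
implementation; the exception handling here is deliberately generic):

    import time

    def wait_for_heat_ready(heat_client, retries=30, interval=2):
        # Poll with a cheap call until heat answers without error,
        # i.e. the rabbitmq queues needed for RPC exist.
        for _ in range(retries):
            try:
                list(heat_client.stacks.list())
                return
            except Exception:  # broad on purpose in this sketch
                time.sleep(interval)
        raise RuntimeError('heat did not become ready in time')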
Also fixes a minor issue in the choices for the --heat-type option for
overcloud deploy. The choice should be "installed" and not "system".
Change-Id: Ib63813b63c37fa2cee57a211535d43d605131529
Signed-off-by: James Slagle <jslagle@redhat.com>
This refactors and moves the code to tripleoclient so that we
can remove the dependency on mistral-lib.
Change-Id: I62352871311e98927fbd560b4235114c8c62f223
The client utils will now run a new playbook to ensure that the local
archive directory is created early in the deployment process. This
change will allow us to build toward a swift-less deployment. All of
the client calls, save one, have been moved to use tripleo-common,
which will help us better manage, and eventually migrate away from,
swift storage to a local archive.
As a product of this change, all of the "webhook" calls, which were
deprecated as part of the Zaqar and Mistral work, have been removed.
Several swift calls were tied into them, and because mistral is no
longer part of the stack and has been gone for a few cycles, we can
safely remove these calls, which do nothing.
Depends-On: Ibe9b2ffe94cdf493fc84366979d1d78b8528ea1b
Change-Id: I7531612a49527f8a21df415c648acb41ac7a0b10
Signed-off-by: Kevin Carter <kecarter@redhat.com>
Update/Upgrade commands now have a prompt by default that asks for
confirmation before proceeding. This prevents a user from running a
command that may cause problems to the infrastructure.
This prompt can be skipped with the --yes/-y argument.
Note: putting "UPDATE" and "UPGRADE" in uppercase to make sure this is
visible and clear. We have seen many users running the wrong command and
ending up doing an upgrade instead of an update.
Note2: this prompt will be ported to the upgrade and FFWD workflows to
prevent unexpected execution and potential harm to infrastructures.
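For example, the prompt can be skipped in scripted runs
(the exact command shown is only illustrative):

    openstack overcloud update prepare --yes <other arguments>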
Depends-On: https://review.opendev.org/741480
Change-Id: I838e6748879c668dd004ca2243b7b00b857c2a7b
This change allows us to identify a set of parameters which should
not be passed in the upgrade prepare or upgrade converge steps.
As it is now, it is mostly intended to block the converge step
if the FFU parameters (Stein registry parameters) were left in
the environment files before running the converge step; however,
it will also allow blocking the upgrade prepare step when a
deprecated or not recommended parameter is provided in the templates.
It works by converting every yaml file passed in the environment
files into a list of keys (only for parameter_defaults so far), then
intersecting the list of forbidden parameters with that list of keys.
If there is a match, an exception is raised showing those parameters:
ERROR openstack [-] The following parameters should be removed from
the environment files:
ceph3_namespace
name_suffix_stein
tag_stein
name_prefix_stein
ceph3_image
namespace_stein
ceph3_tag
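A simplified sketch of the check described above (function and
exception names are illustrative, not the exact implementation):

    import yaml

    FORBIDDEN_PARAMS = {'ceph3_namespace', 'name_suffix_stein', 'tag_stein'}

    def check_forbidden_params(env_files):
        found = set()
        for path in env_files:
            with open(path) as f:
                env = yaml.safe_load(f) or {}
            # only parameter_defaults keys are inspected so far
            found |= FORBIDDEN_PARAMS & set(env.get('parameter_defaults', {}))
        if found:
            raise RuntimeError(
                'The following parameters should be removed from the '
                'environment files:\n  ' + '\n  '.join(sorted(found)))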
Change-Id: I24715f5e55d4cd6cf9879345980d3a3c5ab8830c
This patch adds handling and checking of any instances of the workflow
tripleo.deployment.v1.config_download_deploy already in progress for the
current stack. It will prevent duplicate instances of the same workflow
being started and running at the same time.
It will allow for multiple instances of the workflow running at the same
time as long as they are for different stacks.
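Roughly, the check inspects existing Mistral executions of the same
workflow for the current stack, along these lines (attribute names in
this sketch are assumptions):

    import json

    WORKFLOW = 'tripleo.deployment.v1.config_download_deploy'

    def deployment_in_progress(workflow_client, stack_name):
        # Only executions of the config-download workflow that are
        # still running and were started for this stack count as
        # duplicates.
        for execution in workflow_client.executions.list():
            if execution.workflow_name != WORKFLOW:
                continue
            if execution.state not in ('RUNNING', 'PAUSED'):
                continue
            exec_input = json.loads(execution.input or '{}')
            if exec_input.get('plan_name') == stack_name:
                return True
        return False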
Change-Id: Ic8dbf28b5796ff998165b6b73b941f21c65f1dfa
Closes-Bug: #1852314
There is no need to disallow concurrent stack updates with the
convergence heat engine, which we have been using since stable/queens.
Change-Id: I1d3357a0c1401b4d1c4fca3e6925895967c8e97c
[1] provides the steps on how to set up multiple cells using tripleo.
This requires extracting deployment information from the overcloud/
control plane stack, which is then used as input for the cell
deployment.
With this patch we provide new tripleoclient functionality which helps
to automate the export steps from [1]:
* Export the default cell EndpointMap
* Export the default cell HostsEntry
* Export AllNodesConfig and GlobalConfig information
* Export passwords
[1] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/deploy_cellv2.html#deploy-an-additional-nova-cell-v2
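Typical usage might look like the following (the option shown is an
assumption here; see [1] for the authoritative steps):

    openstack overcloud cell export cell1 -o cell1-ctrl-input.yaml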
Depends-On: https://review.opendev.org/672415
Change-Id: Id7fdbf029a6dd1b45e9801c9cf8814a15a157ee0
The problem we're solving here is that our operators using SSL + FQDN
based endpoints will have failures during the deployment because we
don't resolve the FQDNs to IP addresses, which are needed later in the
deployment for proper binding.
This patch transforms undercloud_*_host parameters into IP addresses:
- We raise if lookup returns nothing.
- We raise if lookup returns more than one IP.
- We support both IPv4 and IPv6.
- We raise if the IP is a loopback.
- We raise if the returned IP is invalid.
Utils changes:
* Introduce utils.is_valid_ip.
Return True if the IP is either v4 or v6. Return False otherwise.
* Introduce utils.is_loopback.
Return True if the given host is a loopback. Return False otherwise.
* Introduce utils.get_host_ips.
Returns the list of IPs resolved for a given host.
* Introduce utils.get_single_ip.
Translates a hostname or FQDN into an IP address.
Returns it unchanged if it is already a valid IPv4 or IPv6 address.
If the host is not resolvable, it raises an exception.
By default it excludes loopbacks, but they can be allowed by setting
allow_loopback = True.
* Use utils.get_single_ip to translate undercloud_admin_host and
undercloud_public_host to IP addresses.
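A condensed sketch of the lookup behaviour described above
(simplified; not the exact implementation):

    import ipaddress
    import socket

    def get_single_ip(host, allow_loopback=False):
        # Return host unchanged if it is already a valid IP literal,
        # otherwise resolve it and insist on exactly one usable address.
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            try:
                addresses = {info[4][0]
                             for info in socket.getaddrinfo(host, None)}
            except socket.gaierror:
                raise RuntimeError('Cannot resolve %s' % host)
            if len(addresses) != 1:
                raise RuntimeError('Lookup of %s must return exactly one '
                                   'IP, got %d' % (host, len(addresses)))
            ip = ipaddress.ip_address(addresses.pop())
        if ip.is_loopback and not allow_loopback:
            raise RuntimeError('%s is a loopback address' % ip)
        return str(ip)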
Related-Bug: #1763776
Change-Id: Ic008cc758493aa95e8aa237d23c2f66c0a930509
The tripleoclient exceptions are supposed to have enough context;
logging their traceback just confuses the users. Note that with
--debug all exceptions will have a traceback anyway.
This change introduces a base class for tripleoclient exceptions.
Change-Id: Iffc7b557ebd7e30ff56ceaee702ed3c4466d4eea
Closes-Bug: #1824329
The return introduced when fixing a bug with
https://review.openstack.org/#/c/603802/ is bogus; we have to raise a
proper exception and handle it like timeouts, so that the mistral
handling keeps working correctly.
Change-Id: Idcdbd38129f5694c5452f3f8aca0388df80476b2
When upgrading the heat-based undercloud, a security
question is asked to stop the upgrade if an undercloud
backup was not performed. This patch handles the case
when a negative or wrong answer is provided,
raising a new UndercloudUpgradeNotConfirmed exception
which is then captured and displays an informative
log message notifying that the upgrade didn't take place.
Also, this removes the format call used when the undercloud
upgrade fails [0], as the logged string message
doesn't need parametrization.
[0] - 03254c84f6/tripleoclient/v1/undercloud.py (L158)
Change-Id: I80ae52e8a732b2827e8179ce024f92ed52a29394
Closes-Bug: #1783722
Always log the traceback for unexpected exceptions. It's very difficult
to tell where the error came from without the traceback.
It seems this was the intent with
Iad9de7fab0ee740bd20f8facd49f36e6f18d2fb1, but
traceback.format_exception doesn't actually print/log anything, it
returns a list of strings, which are just lost if not actually
saved/logged.
Instead use traceback.print_exc in the main handler to show the
traceback.
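The difference between the two stdlib helpers is easy to see:

    import sys
    import traceback

    try:
        raise RuntimeError('boom')
    except RuntimeError:
        # Returns a list of strings; nothing is printed or logged
        # unless those strings are actually written somewhere.
        lines = traceback.format_exception(*sys.exc_info())
        # Prints the traceback to stderr as a side effect.
        traceback.print_exc()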
Change-Id: If84a41907e1fc782bd5a4608f445e75aa3a5d2fb
This change is to invoke the workflows specified in the
plan environment file. Workflows can be specified under the
workflow_parameters parameter in the plan-environment file.
This change parses the plan-environment file and
sequentially executes all the workflows specified in
the file.
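For illustration, a workflow entry in the plan-environment file could
look like this (the workflow name and input shown are examples only):

    workflow_parameters:
      tripleo.derive_params.v1.derive_parameters:
        num_phy_cores_per_numa_node_for_pmd: 2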
Implements: blueprint tripleo-derive-parameters
Change-Id: I37993334a45cf5ee713438151dbbde0997bdf723
This command is to be used by an operator to run sosreport on a
specific set of servers (or all of them) and retrieve log bundles that
can be used to debug the status of the cluster or troubleshoot issues.
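Typical usage (assuming the command added here is the support report
collection command; the server name is illustrative):

    openstack overcloud support report collect overcloud-controller-0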
Depends-On: I47c486d14c46a653c61cfd92d9f484efe0407217
Change-Id: I45699dfa6eb3e83d419c7041dbb72cc5d5e4f0ea
Implements-Blueprint: capture-environment-status-and-logs
The result of synchronously called Mistral actions wasn't being checked
to see if the action passed or failed. The result is now checked and if
the action has failed, an exception will be raised.
Change-Id: I95ae8c98fec94cf91f3f209b593f6c1815729fd4
Closes-Bug: #1686811
Multiple plan support has little value while unnecessarily complicating
the command UX. Being able to do one export at a time is enough. Operators
are encouraged to script the command if they wish to achieve the effect
of exporting multiple plans.
Partially implements: blueprint plan-export-command
Change-Id: I99cf5dfde82a8b4b9dfb7b3f61f4f4f1d31b58f7
At the moment the deploy command will take a number of steps, including
updating the plan and setting parameters in Mistral. Then when it gets
to the deploy, the workflow will fail. This change stops it earlier in
the process, which will be quicker and cleaner.
Change-Id: I09e40e3f27b9ba3b0f3dad97cece6afbe28bd6b9
Partial-Bug: #1640249
This patch adds a mechanism for setting a timeout when waiting for websocket
messages. It then adds it to workflow executions which are fairly predictable.
This means that they always take roughly the same length of time. Other
workflows like baremetal introspection can be much slower or quicker
depending on the user's environment.
Closes-Bug: #1618445
Change-Id: I656735d58b1b676148e6ceacfc9861b3c5f44e5d
The action result was not properly checked, leading to errors being
missed and the plan creation failing, sometimes silently.
Change-Id: I8c5391be5ff7bc4c7227ebbe4f8200eda6f8de09
Closes-Bug: #1621493
Calls to the Mistral workflows to configure boot options and the root
device.
Change-Id: Ifd868fcdd6ed2d54b40c2e1861558d0233731be5
Depends-On: I5ba0a3710012c44822dd3b8e69662bbef04d3787
Closes-Bug: #1595205
Updates the baremetal registration workflows to use Mistral
instead of python.
Co-Authored-By: Dougal Matthews <dougal@redhat.com>
Co-Authored-By: Ryan Brady <rbrady@redhat.com>
Change-Id: Ide8b7753829170f503ef962b4ad4fde388cbb0ba
Depends-On: Ifc6bdd273a8e129ea7c4269d00add64e72cd371b
Depends-On: I910f50a377bcbc2c23b527953e9df7eee9c938a4
This change simplifies using Ironic root device hints for people who only
need to change the default strategy for selecting the root device.
E.g. we have a use case for selecting the first device instead of the
smallest, so that the root device does not change after an upgrade from
Kilo to Liberty.
Note that this feature does not replace per-node device hints; rather, it
complements them with a more global and easier-to-use setting.
This change introduces 3 new arguments to the command:
--root-device states how to find the root device for a node.
If this argument is provided, the client will try to detect the root device
based on the stored introspection data. Possible values:
* smallest (the same thing as IPA does by default - the smallest device)
* largest (the opposite thing - pick the largest device)
* otherwise it's treated as a comma-separated list of possible device names,
  e.g. sda,vda,hda, which roughly matches the logic the Kilo ramdisk used.
The resulting device WWN or serial (whatever is available) is then recorded
in the root device hints (/properties/root_device) for a node.
--root-device-minimum-size sets the minimum size of the devices considered.
The default value of 4 GiB matches what IPA does.
--overwrite-root-device-hints allows overwriting root device hints set
previously. It's disabled by default to allow more precise control over the
root device for some subset of nodes.
Note that for these arguments to work, this command should be run after
introspection. A separate documentation change will be posted for that.
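A simplified sketch of the selection logic described above
(illustrative only; field names follow the shape of stored
introspection disk data):

    def pick_root_device_hint(devices, strategy, minimum_size_gib=4):
        # devices: list of dicts with 'name', 'size' (bytes) and a
        # 'wwn' or 'serial' identifier, as found in introspection data.
        candidates = [d for d in devices
                      if d['size'] >= minimum_size_gib * 1024 ** 3]
        if strategy == 'smallest':
            chosen = min(candidates, key=lambda d: d['size'])
        elif strategy == 'largest':
            chosen = max(candidates, key=lambda d: d['size'])
        else:
            wanted = strategy.split(',')
            matching = [d for d in candidates
                        if d['name'].split('/')[-1] in wanted]
            if not matching:
                raise RuntimeError('No devices match %s' % strategy)
            chosen = matching[0]
        # Whatever stable identifier is available becomes the hint that
        # is stored in the node's properties/root_device.
        if chosen.get('wwn'):
            return {'wwn': chosen['wwn']}
        return {'serial': chosen['serial']}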
Change-Id: I9f19554c5e7f34c8f63c1603c32b4d470fb12592
If the user is in the incorrect directory (one different from
where they originally deployed), the function to generate
passwords will create a new password file with random passwords.
This will then be sent to Heat and it will attempt to reconfigure
the passwords for all services (which currently isn't fully
supported and can leave users with a non-functioning overcloud).
The issue can be replicated with:
openstack overcloud deploy --templates
cd /tmp (or any other different directory)
openstack overcloud deploy --templates
This changes the behaviour to display an error if the password
file can't be found, but the Heat stack already exists.
Closes-Bug: #1541342
Change-Id: I2ce63c254c10d6382d626b2f5436019971a26952
This change will allow users (or ironic-inspector) to provide
several possible profiles for a node by setting capabilities like
XXX_profile (where XXX = compute, controller...).
Two new commands are added:
openstack overcloud profiles match
When not enough nodes with a given profile are found, this command
will inspect nodes with such capabilities and choose missing nodes
from them.
openstack overcloud profiles list
Lists all available and active nodes with their profiles and possible
profiles.
See the following thread for the full background:
http://lists.openstack.org/pipermail/openstack-dev/2015-November/078884.html
This change refactors the profile validation code in the deploy command to
use the same logic as the commands above. It's worth noting that this change also
removes an incorrect assumption that a node can have multiple values
for the same capability. It also makes sure we only take active and available
nodes into account for all calculations.
Change-Id: I398cf2052b280eaf67e5755412c35fe9551c341f
This state was introduced in Liberty as a new state for freshly
enrolled nodes. Transitioning out of it is done by the same verb "manage",
but now involves validation of power credentials.
This commit also
- makes utils.wait_for_provision_state raise appropriate exceptions on
error rather than returning False
- adds a utils.nodes_in_states function to list baremetal nodes in a
given set of states.
- adds logic to baremetal/fakes.py to track states of fake nodes; this
is much simpler and less brittle than precisely mocking the results of
API calls in exactly the order the library code makes them.
- consistently mocks bulk introspection tests at the client layer,
rather than at a variety of layers.
- adds tests for timeout and power-credential error during bulk
introspection.
- doesn't set nodes to "available" if they fail introspection or power
credentials
Change-Id: I4da61491f60f7ebd42ca1f8fe45c3d4df6e49887
Currently it only prints an error message, but exits with success.
This makes it impossible to use this command in any kind of automated
script or to test it in our gate. This patch makes it raise an exception.
Change-Id: I150c87252a48a8062aa7ef04c7a52433dd5ee37d
It's generally recommended that base Exceptions not be raised,
because this makes it impossible to handle exceptions in a more
granular way. except Exception: will catch all exceptions, not
just the one you might care about.
This replaces most instances of this pattern in tripleoclient, with
the exception of the one in overcloud_image.py because I have a
separate change open that already fixes it.
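For example (the exception name below is made up for illustration):

    class DeploymentError(Exception):
        """Raised when the overcloud deployment fails."""

    def deploy():
        raise DeploymentError('stack create failed')

    try:
        deploy()
    except DeploymentError as exc:
        # Callers can react to this specific failure without swallowing
        # unrelated errors the way a bare "except Exception" would.
        print('deployment failed: %s' % exc)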
Change-Id: I6fce306c4ffc57b4c52389be1feb583f5a400a64
Running openstack commands with python-tripleoclient under the
root user is not supported and should not be allowed. Added a
check to the openstack undercloud install command that exits
if the user is root (EUID=0).
Each command can be disabled for root by adding
utils.ensure_run_as_normal_user() into its body.
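A minimal sketch of such a check (the real helper lives in
tripleoclient's utils and its exact error handling may differ):

    import os

    def ensure_run_as_normal_user():
        # Refuse to continue when running as root (EUID 0).
        if os.geteuid() == 0:
            raise RuntimeError(
                'This command cannot be run as root; switch to a '
                'normal user and try again.')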
Change-Id: I685c639e02790483d1607c7eac038f8b9b8dc99e
Closes-Bug: rhbz#1239088