This patch adds handling and checking of any instances of the workflow
tripleo.deployment.v1.config_download_deploy already in progress for the
current stack. It prevents duplicate instances of the same workflow
from being started and running at the same time.
It will allow for multiple instances of the workflow running at the same
time as long as they are for different stacks.
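As a rough illustration of the check described above, the sketch below
filters a list of workflow executions by workflow name, state and stack.
The dictionary layout ('workflow_name', 'state', 'input') and the helper
name are assumptions made for this example, not the actual Mistral client
API.

```python
WORKFLOW = "tripleo.deployment.v1.config_download_deploy"


def workflow_in_progress(executions, stack, workflow=WORKFLOW):
    """Return True if `workflow` is already running for `stack`.

    `executions` is assumed to be a list of dicts with the hypothetical
    keys 'workflow_name', 'state' and 'input' (holding the stack name).
    """
    return any(
        e["workflow_name"] == workflow
        and e["state"] == "RUNNING"
        and e["input"].get("stack") == stack
        for e in executions
    )
```

A second deployment for a different stack is not blocked, because the
stack name is part of the match.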
The problem we're solving here is that operators using SSL + FQDN
based endpoints see failures during the deployment because we
don't resolve the FQDN to the IP addresses that are needed later in the
deployment for proper binding.
This patch transforms undercloud_*_host parameters into IP addresses:
- We raise if lookup returns nothing.
- We raise if lookup returns more than one IP.
- We support both IPv4 and IPv6.
- We raise if the IP is a loopback.
- We raise if the returned IP is invalid.
* Introduce utils.is_valid_ip.
Return True if the IP is either v4 or v6. Return False otherwise.
* Introduce utils.is_loopback.
Return True if the given host is a loopback. Return False otherwise.
* Introduce utils.get_host_ips.
Returns a list of IPs for a host to lookup.
* Introduce utils.get_single_ip.
Translate a hostname or FQDN into an IP address and check that the result
is a valid IP.
Return the host unchanged if it is already an IPv4 or IPv6 address.
If the host is not reachable, an exception is raised.
Loopback addresses are excluded by default, but they can be allowed by
setting allow_loopback = True.
* Use utils.get_single_ip to translate undercloud_admin_host and
undercloud_public_host to IP addresses.
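The helpers listed above could look roughly like the following. This is a
standard-library sketch, not the actual tripleoclient code: the exact
signatures, the exception types raised, and the use of socket.getaddrinfo
for the lookup are assumptions.

```python
import ipaddress
import socket


def is_valid_ip(ip):
    """Return True if `ip` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(ip)
        return True
    except ValueError:
        return False


def is_loopback(host):
    """Return True if `host` is a loopback IP address."""
    return is_valid_ip(host) and ipaddress.ip_address(host).is_loopback


def get_host_ips(host):
    """Return the de-duplicated list of IPs that `host` resolves to."""
    ips = []
    for info in socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP):
        ip = info[4][0]  # sockaddr tuple starts with the address
        if ip not in ips:
            ips.append(ip)
    return ips


def get_single_ip(host, allow_loopback=False):
    """Translate `host` into exactly one IP address, or raise."""
    if is_valid_ip(host):
        ip = host  # already an address: return it unchanged
    else:
        ips = get_host_ips(host)  # raises socket.gaierror on failure
        if not ips:
            raise LookupError("no IP found for %s" % host)
        if len(ips) > 1:
            raise LookupError("more than one IP found for %s" % host)
        ip = ips[0]
    if not is_valid_ip(ip):
        raise ValueError("invalid IP for %s: %s" % (host, ip))
    if is_loopback(ip) and not allow_loopback:
        raise ValueError("%s resolves to a loopback address" % host)
    return ip
```

For example, get_single_ip("192.0.2.10") returns the address unchanged,
while get_single_ip("127.0.0.1") raises unless allow_loopback=True is
passed.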
The tripleoclient exceptions are supposed to carry enough context, so
logging their traceback just confuses users. Note that with
--debug all exceptions will have a traceback anyway.
This change introduces a base class for tripleoclient exceptions.
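A minimal sketch of the idea, assuming a hypothetical base class named
ClientError and a hypothetical run() wrapper (neither name is taken from
tripleoclient): errors derived from the base class are logged by message
only, and anything else is logged with its traceback.

```python
import logging

LOG = logging.getLogger(__name__)


class ClientError(Exception):
    """Hypothetical base class for errors that carry enough context."""


class DeploymentError(ClientError):
    """Illustrative subclass."""


def run(command, debug=False):
    """Run `command()` and return an exit code."""
    try:
        command()
        return 0
    except ClientError as exc:
        # Known error: the message is enough; traceback only with debug.
        LOG.error("%s", exc, exc_info=debug)
        return 1
    except Exception:
        # Unexpected error: always include the traceback.
        LOG.exception("Unexpected error")
        return 1
```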
After fixing a bug with https://review.openstack.org/#/c/603802/, the
return introduced there is bogus. We have to raise a proper exception
and handle it like timeouts so that all the Mistral handling works
correctly.
When upgrading the heat-based undercloud, a security
question is asked to stop the upgrade if an undercloud
backup was not performed. This patch handles the case
where a negative or wrong answer is provided by
raising a new UndercloudUpgradeNotConfirmed exception,
which is then caught and produces an informative
log message notifying the user that the upgrade didn't take place.
Also, remove the format call when the undercloud
upgrade fails, as the message string
doesn't need parametrization.
 - 03254c84f6/tripleoclient/v1/undercloud.py (L158)
Always log the traceback for unexpected exceptions. It's very difficult
to tell where the error came from without the traceback.
It seems this was the intent with
traceback.format_exception doesn't actually print/log anything; it
returns a list of strings, which are just lost if not actually
printed or logged.
Instead use traceback.print_exc in the main handler to show the
traceback.
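The difference between the two calls can be seen in a small standalone
example. The function name and the use of an in-memory buffer are only
for illustration; in a real handler print_exc would write to stderr.

```python
import io
import sys
import traceback


def describe_failure():
    """Trigger an error and capture it with both traceback helpers."""
    try:
        1 / 0
    except ZeroDivisionError:
        # format_exception only builds a list of strings. Nothing is
        # shown unless the caller prints or logs the result.
        lines = traceback.format_exception(*sys.exc_info())
        # print_exc writes the traceback to a file (stderr by default).
        buf = io.StringIO()
        traceback.print_exc(file=buf)
        return lines, buf.getvalue()
```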
This change invokes the workflows specified in the
plan-environment file. Workflows can be specified in the
workflow_parameters parameter of the plan-environment file.
This change parses the plan-environment file and
sequentially executes all the workflows specified in it.
Implements: blueprint tripleo-derive-parameters
This command is to be used by an operator to run sosreport on
a specific set of servers (or all) and retrieve log bundles that can
be used to debug the status of the cluster or troubleshoot issues.
The result of synchronously called Mistral actions wasn't being checked
to see if the action passed or failed. The result is now checked and if
the action has failed, an exception will be raised.
Multiple plan support has little value while unnecessarily complicating
the command UX. Being able to do one export at a time is enough. Operators
are encouraged to script the command if they wish to achieve the effect
of exporting multiple plans.
Partially implements: blueprint plan-export-command
At the moment the deploy command will take a number of steps, including
updating the plan and setting parameters in Mistral. Then when it gets
to the deploy, the workflow will fail. This change stops it earlier in
the process, which will be quicker and cleaner.
This patch adds a mechanism for setting a timeout when waiting for websocket
messages. It then adds it to workflow executions which are fairly predictable.
This means that they always take roughly the same length of time. Other
workflows like baremetal introspection can be much slower or quicker
depending on the user's environment.
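The timeout mechanism can be sketched as follows. A queue.Queue stands in
for the websocket here, and the function and exception names are
hypothetical; this only illustrates waiting for a message with an optional
timeout.

```python
import queue


class WebSocketTimeout(Exception):
    """Hypothetical error raised when no message arrives in time."""


def wait_for_message(messages, timeout=None):
    """Return the next message, or raise after `timeout` seconds.

    `messages` is a queue.Queue standing in for the websocket. With
    timeout=None the call blocks until a message arrives, which suits
    workflows whose duration is hard to predict.
    """
    try:
        return messages.get(timeout=timeout)
    except queue.Empty:
        raise WebSocketTimeout(
            "no message received within %s seconds" % timeout)
```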
Calls to the Mistral workflows to configure boot options and the root
Updates the baremetal registration workflows to use Mistral
instead of Python.
Co-Authored-By: Dougal Matthews <firstname.lastname@example.org>
Co-Authored-By: Ryan Brady <email@example.com>
This change simplifies using Ironic root device hints for people who only need
to change the default strategy of selecting the default device.
E.g. we have a use case for selecting the first device instead of the smallest,
so that the root device does not change after upgrade Kilo -> Liberty.
Note that this feature does not replace per-node device hints but rather
complements them with a more global and easy-to-use setting.
This change introduces 3 new arguments to the command:
--root-device states how to find a root device for a node.
If this argument is provided, the client will try to detect the root device
based on the stored introspection data. Possible values:
* smallest (the same thing as IPA does by default - the smallest device)
* largest (the opposite thing - pick the largest device)
* otherwise it's treated as a comma-separated list of possible device names,
e.g. sda,vda,hda roughly matches the logic the Kilo ramdisk used.
The resulting device WWN or serial (whatever is available) is then recorded
in the root device hints (/properties/root_device) for a node.
--root-device-minimum-size minimum size of the considered devices
The default value of 4 GiB matches what IPA does.
--overwrite-root-device-hints allows overwriting root device hints set
previously. It's disabled by default to allow more precise control over the
root device for some subset of nodes.
Note that for these arguments to work, this command should be run after
introspection. A separate documentation change will be posted for that.
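The selection logic described above can be sketched like this. The layout
of the disk records (dicts with 'name', 'size' in bytes, and optional
'wwn' and 'serial') and the function name are assumptions for the example;
the real introspection data may be shaped differently.

```python
GIB = 1024 ** 3


def find_root_device(disks, strategy="smallest", minimum_size_gib=4):
    """Return a root device hint such as {'wwn': ...}, or None.

    `strategy` is 'smallest', 'largest', or a comma-separated list of
    device names that are tried in order.
    """
    candidates = [d for d in disks if d["size"] >= minimum_size_gib * GIB]
    if not candidates:
        return None
    if strategy == "smallest":
        root = min(candidates, key=lambda d: d["size"])
    elif strategy == "largest":
        root = max(candidates, key=lambda d: d["size"])
    else:
        by_name = {d["name"]: d for d in candidates}
        names = [n.strip() for n in strategy.split(",")]
        root = next((by_name[n] for n in names if n in by_name), None)
        if root is None:
            return None
    # Record the WWN or the serial, whichever is available.
    for hint in ("wwn", "serial"):
        if root.get(hint):
            return {hint: root[hint]}
    return None
```

With disks sda (500 GiB), sdb (100 GiB) and sdc (1 GiB), the 'smallest'
strategy picks sdb, because sdc is below the 4 GiB minimum.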
If the user is in the incorrect directory (one different from
where they originally deployed), the function to generate
passwords will create a new password file with random passwords.
This will then be sent to Heat and it will attempt to reconfigure
the passwords for all services (which currently isn't fully
supported and can leave users with a non-functioning overcloud).
The issue can be replicated with:
openstack overcloud deploy --templates
cd /tmp (or any other different directory)
openstack overcloud deploy --templates
This changes the behaviour to display an error if the password
file can't be found, but the Heat stack already exists.
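The new behaviour amounts to a small decision, sketched below. The
function and exception names are hypothetical, and the check for an
existing Heat stack is reduced to a boolean argument for the sake of the
example.

```python
import os


class PasswordFileNotFound(Exception):
    """Hypothetical error: the stack exists but its passwords are gone."""


def may_generate_passwords(path, stack_exists):
    """Return True if a new password file may be created at `path`.

    Returns False when the file already exists and should be reused.
    Raises when the file is missing although the stack already exists,
    since new random passwords would then be sent to Heat.
    """
    if os.path.isfile(path):
        return False
    if stack_exists:
        raise PasswordFileNotFound(
            "%s not found, but the stack already exists; run the command "
            "from the directory used for the original deployment" % path)
    return True
```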
This change will allow users (or ironic-inspector) to provide
several possible profiles for a node by setting capabilities like
XXX_profile (where XXX = compute, controller...).
Two new commands are added:
openstack overcloud profiles match
When not enough nodes with a given profile are found, this command
will inspect nodes with such capabilities and choose missing nodes
openstack overcloud profiles list
Lists all available and active nodes with their profiles and possible
See the following thread for the full background:
This change refactors the profile validation code in the deploy command to use
the same logic as the commands above. It's worth noting that this change also
removes an incorrect assumption that a node can have multiple values
for the same capability. It also makes sure we only take active and available
nodes into account for all calculations.
This state was introduced in Liberty as a new state for freshly
enrolled nodes. The transition from it is done by the same verb "manage",
but now involves validation of power credentials.
This commit also
- makes utils.wait_for_provision_state raise appropriate exceptions on
error rather than returning False
- adds a utils.nodes_in_states function to list baremetal nodes in a
given set of states.
- adds logic to baremetal/fakes.py to track states of fake nodes; this
is much simpler and less brittle than precisely mocking the results of
API calls in exactly the order the library code makes them.
- consistently mocks bulk introspection tests at the client layer,
rather than at a variety of layers.
- adds tests for timeout and power-credential errors during bulk
introspection
- doesn't set nodes to "available" if they fail introspection or power
Currently it only prints an error message, but exits with success.
This makes it impossible to use this command in any kind of automated
scripts or test it in our gate. This patch makes it raise an exception.
It's generally recommended that base Exceptions not be raised,
because this makes it impossible to handle exceptions in a more
granular way. except Exception: will catch all exceptions, not
just the one you might care about.
This replaces most instances of this pattern in tripleoclient, with
the exception of the one in overcloud_image.py because I have a
separate change open that already fixes it.
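A small generic example of why a specific exception type helps. The names
InvalidPort, parse_port and port_or_default are made up for illustration
and do not come from tripleoclient.

```python
class InvalidPort(Exception):
    """Hypothetical, specific error type for this example."""


def parse_port(value):
    """Parse a TCP port number from a string."""
    port = int(value)  # a non-numeric value raises ValueError
    if not 0 < port < 65536:
        raise InvalidPort("port out of range: %d" % port)
    return port


def port_or_default(value, default=8080):
    """Fall back to `default` only for out-of-range ports."""
    try:
        return parse_port(value)
    except InvalidPort:
        # Only the error we care about is handled here. A ValueError
        # from int() still propagates instead of being swallowed, which
        # would not be the case with a bare `except Exception:`.
        return default
```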
Running openstack commands with python-tripleoclient as the
root user is not supported and should not be allowed. This adds a
check for the user to the openstack undercloud install command,
which exits if the user is root (EUID=0).
Each command can be disabled for root by adding
utils.ensure_run_as_normal_user() to its body.
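A minimal sketch of such a check, assuming a Unix-like system. The
exception name and the optional euid argument (added here only to make
the example testable) are assumptions; the real helper may differ.

```python
import os


class RootUserExecution(Exception):
    """Hypothetical error raised when the command is run as root."""


def ensure_run_as_normal_user(euid=None):
    """Raise if the effective user ID is 0 (root)."""
    if euid is None:
        euid = os.geteuid()  # only available on Unix
    if euid == 0:
        raise RootUserExecution(
            "This command cannot run under root user. "
            "Switch to a normal user.")
```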