Moves the two blueprint/spec documents that existed in airship-in-a-bottle to the airship-specs repository. The implemented spec was not reformatted to the spec template. The other spec (in the approved folder) was minimally updated to the spec template.

Change-Id: I7468579e2fa3077ee1144e5294eba97d8e4ced05
parent 6e0a18e7fa
commit bfbfd56c81
@ -0,0 +1,620 @@
..
  Copyright 2018 AT&T Intellectual Property.
  All Rights Reserved.

  Licensed under the Apache License, Version 2.0 (the "License"); you may
  not use this file except in compliance with the License. You may obtain
  a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations
  under the License.

.. index::
   single: Teardown node
   single: workflow;redeploy_server
   single: Drydock
   single: Promenade
   single: Shipyard

.. _node-teardown:

=====================
Airship Node Teardown
=====================

Shipyard is the entrypoint for Airship actions, including the need to redeploy
a server. The first part of redeploying a server is the graceful teardown of
the software running on the server; specifically, Kubernetes and etcd are of
critical concern. It is the duty of Shipyard to orchestrate the teardown of the
server, followed by steps to deploy the desired new configuration. This design
covers only the first portion: node teardown.

Links
=====

None

Problem description
===================

When redeploying a physical host (server) using the Airship Platform,
it is necessary to trigger a sequence of steps to prevent undesired behaviors
when the server is redeployed. This blueprint intends to document the
interaction that must occur between Airship components to tear down a server.

Impacted components
===================

- Drydock
- Promenade
- Shipyard

Proposed change
===============

Shipyard Node Teardown Process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. (Existing) Shipyard receives a request to redeploy_server, specifying a
   target server.
#. (Existing) Shipyard performs preflight, design reference lookup, and
   validation steps.
#. (New) Shipyard invokes Promenade to decommission a node.
#. (New) Shipyard invokes Drydock to destroy the node - setting a node
   filter to restrict to a single server.
#. (New) Shipyard invokes Promenade to remove the node from the Kubernetes
   cluster.

Assumption:
  node_id is the hostname of the server, and is also the identifier that both
  Drydock and Promenade use to identify the appropriate parts - hosts and k8s
  nodes. This convention is set by the join script produced by Promenade.

Drydock Destroy Node
--------------------
The API/interface for destroy node already exists. The implementation within
Drydock needs to be developed. This interface will need to accept both the
specified node_id and the design_id to retrieve from Deckhand.

Using the provided node_id (hardware node) and the design_id, Drydock will
reset the hardware to a re-provisionable state.

By default, all local storage should be wiped (per datacenter policy for
wiping before re-use).

An option to allow for only the OS disk to be wiped should be supported, such
that other local storage is left intact, and could be remounted without data
loss. e.g.: --preserve-local-storage

The target node should be shut down.

The target node should be removed from the provisioner (e.g. MaaS).
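
A hypothetical invocation of this interface, following the pattern of Drydock's
existing asynchronous task API; the exact action name and node filter fields
shown here are assumptions for illustration, not the final contract:

.. code:: json

  POST /api/v1.0/tasks

  {
    "action": "destroy_nodes",
    "design_ref": "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
    "node_filter": {
      "filter_set_type": "union",
      "filter_set": [
        {"filter_type": "union", "node_names": ["n1"]}
      ]
    }
  }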

Responses
~~~~~~~~~
The responses from this functionality should follow the pattern set by prepare
nodes and other Drydock functionality. The Drydock status responses used for
all async invocations will be utilized for this functionality.

Promenade Decommission Node
---------------------------
Performs steps that will result in the specified node being cleanly
disassociated from Kubernetes, and ready for the server to be destroyed.
Users of the decommission node API should be aware of the long timeout values
that may occur while awaiting Promenade to complete the appropriate steps.
At this time, Promenade is a stateless service and doesn't use any database
storage. As such, requests to Promenade are synchronous.

.. code:: json

  POST /nodes/{node_id}/decommission

  {
    rel : "design",
    href: "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
    type: "application/x-yaml"
  }

Such that the design reference body is the design indicated when the
redeploy_server action is invoked through Shipyard.

Query Parameters:

- drain-node-timeout: A whole number timeout in seconds to be used for the
  drain node step (default: none). In the case of no value being provided,
  the drain node step will use its default.
- drain-node-grace-period: A whole number in seconds indicating the
  grace period that will be provided to the drain node step (default: none).
  If no value is specified, the drain node step will use its default.
- clear-labels-timeout: A whole number timeout in seconds to be used for the
  clear labels step (default: none). If no value is specified, clear labels
  will use its own default.
- remove-etcd-timeout: A whole number timeout in seconds to be used for the
  remove etcd from nodes step (default: none). If no value is specified,
  remove-etcd will use its own default.
- etcd-ready-timeout: A whole number in seconds indicating how long the
  decommission node request should allow for etcd clusters to become stable
  (default: 600).
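
An illustrative request combining several of these parameters (the node name
and values here are examples only)::

  POST /nodes/n1/decommission?drain-node-timeout=3600&drain-node-grace-period=1800&etcd-ready-timeout=600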

Process
~~~~~~~
Acting upon the node specified by the invocation and the design reference
details:

#. Drain the Kubernetes node.
#. Clear the Kubernetes labels on the node.
#. Remove etcd nodes from their clusters (if impacted).

   - If the node being decommissioned contains etcd nodes, Promenade will
     attempt to gracefully have those nodes leave the etcd cluster.

#. Ensure that etcd cluster(s) are in a stable state.

   - Polls for status every 30 seconds up to the etcd-ready-timeout, or until
     the cluster meets the defined minimum functionality for the site.
   - A new document, promenade/EtcdClusters/v1, will specify details about
     the etcd clusters deployed in the site, including: identifiers,
     credentials, and thresholds for minimum functionality (a sketch of such a
     document appears after this list).
   - This process should ignore the node being torn down in any calculation
     of health.

#. Shutdown the kubelet.

   - If this is not possible because the node is in a state of disarray such
     that it cannot schedule the daemonset to run, this step may fail, but
     should not hold up the process, as the Drydock dismantling of the node
     will shut the kubelet down.
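
A minimal sketch of what such a promenade/EtcdClusters/v1 document could look
like; the field names under ``data`` are assumptions for illustration only,
not a finalized schema:

.. code:: yaml

  ---
  schema: promenade/EtcdClusters/v1
  metadata:
    schema: metadata/Document/v1
    name: etcd-clusters
    layeringDefinition:
      abstract: false
      layer: site
    storagePolicy: cleartext
  data:
    clusters:
      - name: kubernetes-etcd
        # reference to the credentials (e.g. client certificates) needed to
        # query and manage membership of this cluster
        credentials: kubernetes-etcd-client
        # minimum number of healthy members for the cluster to be considered
        # minimally functional
        minimum_healthy_members: 2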

Responses
~~~~~~~~~
All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  Indicates that all steps are successful.

- Failure: Code: 404, reason: NotFound

  Indicates that the target node is not discoverable by Promenade.

- Failure: Code: 500, reason: DisassociateStepFailure

  The details section should detail the successes and failures further. Any
  4xx series errors from the individual steps would manifest as a 500 here.

Promenade Drain Node
--------------------
Drain the Kubernetes node for the target node. This will ensure that this node
is no longer the target of any pod scheduling, and evicts or deletes the
running pods. In the case of nodes running DaemonSet-managed pods, or pods
that would prevent a drain from occurring, Promenade may be required to provide
the `ignore-daemonsets` option or `force` option to attempt to drain the node
as fully as possible.

By default, the drain node will utilize a grace period for pods of 1800
seconds and a total timeout of 3600 seconds (1 hour). Clients of this
functionality should be prepared for a long timeout.

.. code:: json

  POST /nodes/{node_id}/drain

Query Parameters:

- timeout: a whole number in seconds (default = 3600). This value is the total
  timeout for the kubectl drain command.
- grace-period: a whole number in seconds (default = 1800). This value is the
  grace period used by kubectl drain. Grace period must be less than timeout.

.. note::

   This POST has no message body.

Example command used for drain (reference only)::

  kubectl drain --force --timeout 3600s --grace-period 1800 --ignore-daemonsets --delete-local-data n1

See also:
https://git.openstack.org/cgit/openstack/airship-promenade/tree/promenade/templates/roles/common/usr/local/bin/promenade-teardown

Responses
~~~~~~~~~
All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  Indicates that the drain node has successfully concluded, and that no pods
  are currently running.

- Failure: Status response, code: 400, reason: BadRequest

  A request was made with parameters that cannot work - e.g. grace-period is
  set to a value larger than the timeout value.

- Failure: Status response, code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Status response, code: 500, reason: DrainNodeError

  There was a processing exception raised while trying to drain a node. The
  details section should indicate the underlying cause if it can be
  determined.

Promenade Clear Labels
----------------------
Removes the labels that have been added to the target Kubernetes node.

.. code:: json

  POST /nodes/{node_id}/clear-labels

Query Parameters:

- timeout: A whole number in seconds allowed for the pods to settle/move
  following removal of labels (default = 1800).

.. note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  All labels have been removed from the specified Kubernetes node.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: ClearLabelsError

  There was a failure to clear labels that prevented completion. The details
  section should provide more information about the cause of this failure.

Promenade Remove etcd Node
--------------------------
Checks if the node specified contains any etcd nodes. If so, this API will
trigger that etcd node to leave the associated etcd cluster::

  POST /nodes/{node_id}/remove-etcd

  {
    rel : "design",
    href: "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
    type: "application/x-yaml"
  }

Query Parameters:

- timeout: A whole number in seconds allowed for the removal of etcd nodes
  from the target node (default = 1800).

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  All etcd nodes have been removed from the specified node.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: RemoveEtcdError

  There was a failure to remove etcd from the target node that prevented
  completion within the specified timeout, or etcd prevented removal of
  the node because it would result in the cluster being broken. The details
  section should provide more information about the cause of this failure.


Promenade Check etcd
--------------------
Retrieves the current interpreted state of etcd::

  GET /etcd-cluster-health-statuses?design_ref={the design ref}

Where the design_ref parameter is required for appropriate operation, and is in
the same format as used for the join-scripts API.

Query Parameters:

- design_ref: (Required) the design reference to be used to discover etcd
  instances.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  The status of each etcd in the site will be returned in the details section.
  Valid values for status are: Healthy, Unhealthy

  See:
  https://github.com/att-comdev/ucp-integration/blob/master/docs/source/api-conventions.rst#status-responses

  .. code:: json

    { "...": "... standard status response ...",
      "details": {
        "errorCount": {{n}},
        "messageList": [
          { "message": "Healthy",
            "error": false,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          },
          { "message": "Unhealthy",
            "error": false,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          },
          { "message": "Unable to access Etcd",
            "error": true,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          }
        ]
      }
      ...
    }

- Failure: Code: 400, reason: MissingDesignRef

  Returned if the design_ref parameter is not specified.

- Failure: Code: 404, reason: NotFound

  Returned if the specified etcd could not be located.

- Failure: Code: 500, reason: EtcdNotAccessible

  Returned if the specified etcd responded with an invalid health response
  (not just simply unhealthy - that's a 200).


Promenade Shutdown Kubelet
--------------------------
Shuts down the kubelet on the specified node. This is accomplished by Promenade
setting the label `promenade-decomission: enabled` on the node, which will
trigger a newly-developed daemonset to run something like:
`systemctl disable kubelet && systemctl stop kubelet`.
This daemonset will effectively sit dormant until nodes have the appropriate
label added, and then perform the kubelet teardown.
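
A minimal sketch of such a daemonset, assuming a privileged container that uses
nsenter to reach the host's systemd; the image name and exact command below are
illustrative only (note the label key spelling matches the spec text):

.. code:: yaml

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: promenade-shutdown-kubelet
  spec:
    selector:
      matchLabels:
        application: promenade-shutdown-kubelet
    template:
      metadata:
        labels:
          application: promenade-shutdown-kubelet
      spec:
        # only schedule onto nodes that Promenade has marked for teardown
        nodeSelector:
          promenade-decomission: enabled
        hostPID: true
        containers:
          - name: shutdown-kubelet
            image: some-registry/promenade-teardown:latest
            securityContext:
              privileged: true
            command:
              - /bin/sh
              - -c
              # enter the host's namespaces, stop the kubelet, then idle
              - >
                nsenter -t 1 -m -u -i -n -p --
                sh -c "systemctl disable kubelet && systemctl stop kubelet";
                sleep infinity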

.. code:: json

  POST /nodes/{node_id}/shutdown-kubelet

.. note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  The kubelet has been successfully shut down.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: ShutdownKubeletError

  The specified node's kubelet failed to shut down. The details section of the
  status response should contain reasonable information about the source of
  this failure.

Promenade Delete Node from Cluster
----------------------------------
Updates the Kubernetes cluster, removing the specified node. Promenade should
check that the node is drained/cordoned and has no labels other than
`promenade-decomission: enabled`. If either of these checks fails, the API
should respond with a 409 Conflict response.

.. code:: json

  POST /nodes/{node_id}/remove-from-cluster

.. note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  The specified node has been removed from the Kubernetes cluster.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 409, reason: Conflict

  The specified node cannot be deleted due to failing the checks that the node
  is drained/cordoned and has no labels (other than possibly
  `promenade-decomission: enabled`).

- Failure: Code: 500, reason: DeleteNodeError

  The specified node cannot be removed from the cluster due to an error from
  Kubernetes. The details section of the status response should contain more
  information about the failure.


Shipyard Tag Releases
---------------------
Shipyard will need to mark Deckhand revisions with tags when there are
successful deploy_site or update_site actions to be able to determine the last
known good design. This is related to issue 16 for Shipyard, which shares the
same need.

.. note::

   Repeated from https://github.com/att-comdev/shipyard/issues/16

   When multiple configdocs commits have been done since the last deployment,
   there is no ready means to determine what's being done to the site. Shipyard
   should reject deploy site or update site requests that have had multiple
   commits since the last site true-up action. An option to override this guard
   should be allowed for the actions in the form of a parameter to the action.

   The configdocs API should provide a way to see what's been changed since the
   last site true-up, not just the last commit of configdocs. This might be
   accommodated by new deckhand tags like the 'commit' tag, but for
   'site true-up' or similar applied by the deploy and update site commands.

The design for issue 16 includes the bare-minimum marking of Deckhand
revisions. This design is as follows:

Scenario
~~~~~~~~
Multiple commits occur between site actions (deploy_site, update_site) - those
actions that attempt to bring a site into compliance with a site design.
When this occurs, the current system of being able to only see what has changed
between the committed and buffer versions (configdocs diff) is insufficient
to be able to investigate what has changed since the last successful (or
unsuccessful) site action.
To accommodate this, Shipyard needs several enhancements.

Enhancements
~~~~~~~~~~~~

#. Deckhand revision tags for site actions

   Using the tagging facility provided by Deckhand, Shipyard will tag the end
   of site actions.
   Upon completing a site action successfully, tag the revision being used with
   the tag site-action-success, and a body of dag_id:<dag_id>.

   Upon completing a site action unsuccessfully, tag the revision being used
   with the tag site-action-failure, and a body of dag_id:<dag_id>.

   The completion tags should only be applied upon failure if the site action
   gets past document validation successfully (i.e. gets to the point where it
   can start making changes via the other UCP components).

   This could result in a single revision having both site-action-success and
   site-action-failure if a later re-invocation of a site action is successful.

#. Check for intermediate committed revisions

   Upon running a site action, before tagging the revision with the site action
   tag(s), the dag needs to check to see if there are committed revisions that
   do not have an associated site-action tag. If there are any committed
   revisions since the last site action other than the current revision being
   used (between them), then the action should not be allowed to proceed (stop
   before triggering validations). For the calculation of intermediate
   committed revisions, assume revision 0 if there are no revisions with a
   site-action tag (null case).

   If the action is invoked with a parameter of
   allow-intermediate-commits=true, then this check should log that the
   intermediate committed revisions check is being skipped and not take any
   other action.

#. Support action parameter of allow-intermediate-commits=true|false

   In the CLI for create action, the --param option supports adding parameters
   to actions. The parameters passed should be relayed by the CLI to the API
   and ultimately to the invocation of the DAG. The DAG as noted above will
   check for the presence of allow-intermediate-commits=true. This needs to be
   tested to work.

#. Shipyard needs to support retrieving configdocs and rendered documents for
   the last successful site action, and the last site action (successful or
   not successful)::

     --successful-site-action
     --last-site-action

   These options would be mutually exclusive with --buffer and --committed.

#. Shipyard diff (shipyard get configdocs)

   Needs to support an option to do the diff of the buffer vs. the last
   successful site action and the last site action (successful or not
   successful).

   Currently there are no options to select which versions to diff (always
   buffer vs. committed).

   Support::

     --base-version=committed | successful-site-action | last-site-action (default = committed)
     --diff-version=buffer | committed | successful-site-action | last-site-action (default = buffer)

   Equivalent query parameters need to be implemented in the API.
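
For example, the diff options above could be exercised from the CLI as follows;
this is illustrative only, with option spellings taken from the proposal above
and "design" used as a placeholder collection name:

.. code:: bash

  # diff the buffer against the last successful site action
  shipyard get configdocs --base-version=successful-site-action --diff-version=buffer

  # retrieve a collection as it was at the last site action (successful or not)
  shipyard get configdocs design --last-site-action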

Because the implementation of this design will result in the tagging of
successful site-actions, Shipyard will be able to determine the correct
revision to use while attempting to tear down a node.

If the request to tear down a node indicates a revision that doesn't exist, the
command to do so (e.g. redeploy_server) should not continue, but rather fail
due to a missing precondition.

The invocation of the Promenade and Drydock steps in this design will utilize
the appropriate tag based on the request (default is successful-site-action) to
determine the revision of the Deckhand documents used as the design-ref.

Shipyard redeploy_server Action
-------------------------------
The redeploy_server action currently accepts a target node. Additional
supported parameters are needed:

#. preserve-local-storage=true, which will instruct Drydock to only wipe the
   OS drive; any other local storage will not be wiped. This would allow
   for the drives to be remounted to the server upon re-provisioning. The
   default behavior is that local storage is not preserved.

#. target-revision=committed | successful-site-action | last-site-action

   This will indicate which revision of the design will be used as the
   reference for what should be re-provisioned after the teardown.
   The default is successful-site-action, which is the closest representation
   to the last-known-good state.

These should be accepted as parameters to the action API/CLI and modify the
behavior of the redeploy_server DAG.
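
An illustrative CLI invocation showing how such parameters might be passed;
only the two parameters above are defined by this design, and the parameter
used to name the target server (``target_nodes``) is an assumption here:

.. code:: bash

  shipyard create action redeploy_server \
      --param="target_nodes=node01" \
      --param="preserve-local-storage=true" \
      --param="target-revision=last-site-action"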

Security impact
---------------

None. This change introduces no new security concerns outside of established
patterns for RBAC controls around API endpoints.

Performance impact
------------------

As this is an on-demand action, there is no expected performance impact to
existing processes, although tearing down a host may result in temporarily
degraded service capacity, whether from needing to move workloads to different
hosts or simply from reduced capacity.

Alternatives
------------

N/A

Implementation
==============

None at this time.

Dependencies
============

None.


References
==========

None
@ -0,0 +1,569 @@
..
  Copyright 2018 AT&T Intellectual Property.
  All Rights Reserved.

  Licensed under the Apache License, Version 2.0 (the "License"); you may
  not use this file except in compliance with the License. You may obtain
  a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations
  under the License.

.. index::
   single: Deployment grouping
   single: workflow
   single: Shipyard
   single: Drydock

.. _deployment-grouping-baremetal:

=======================================
Deployment Grouping for Baremetal Nodes
=======================================
One of the primary functionalities of the Undercloud Platform is the deployment
of baremetal nodes as part of site deployment and upgrade. This blueprint aims
to define how deployment strategies can be applied to the workflow during these
actions.

.. note::

   This document has been moved from the airship-in-a-bottle project, and was
   previously implemented. The format of this document diverges from the
   standard template for airship-specs.

Overview
--------
When Shipyard is invoked for a deploy_site or update_site action, there are
three primary stages:

1. Preparation and Validation
2. Baremetal and Network Deployment
3. Software Deployment

During the Baremetal and Network Deployment stage, the deploy_site or
update_site workflow (and perhaps other workflows in the future) invokes
Drydock to verify the site, prepare the site, prepare the nodes, and deploy the
nodes. Each of these steps is described in the `Drydock Orchestrator Readme`_.

.. _Drydock Orchestrator Readme: https://git.openstack.org/cgit/openstack/airship-drydock/plain/drydock_provisioner/orchestrator/readme.md

The prepare nodes and deploy nodes steps each involve intensive and potentially
time-consuming operations on the target nodes, orchestrated by Drydock and
MAAS. These steps need to be approached and managed such that grouping,
ordering, and criticality of success of nodes can be managed in support of
fault-tolerant site deployments and updates.

For the purposes of this document, `phase of deployment` refers to the prepare
nodes and deploy nodes steps of the Baremetal and Network Deployment.

Some factors that inform this solution:

1. Limits to the amount of parallelization that can occur due to a centralized
   MAAS system.
2. Faults in the hardware, preventing operational nodes.
3. Miswiring or misconfiguration of network hardware.
4. Incorrect site design causing a mismatch against the hardware.
5. Criticality of particular nodes to the realization of the site design.
6. Desired configurability within the framework of the UCP declarative site
   design.
7. Improved visibility into the current state of node deployment.
8. A desire to begin the deployment of nodes before the finish of the
   preparation of nodes -- i.e. start deploying nodes as soon as they are ready
   to be deployed. Note: this design will not achieve new forms of
   task parallelization within Drydock; this is recognized as a desired
   functionality.

Solution
--------
Updates supporting this solution will require changes to Shipyard for the
changed workflows, and to Drydock for the desired node targeting and for
retrieval of diagnostic and result information.

.. index::
   single: Shipyard Documents; DeploymentStrategy

Deployment Strategy Document (Shipyard)
---------------------------------------
To accommodate the needed changes, this design introduces a new
DeploymentStrategy document into the site design to be read and utilized
by the workflows for update_site and deploy_site.

Groups
~~~~~~
Groups are named sets of nodes that will be deployed together. The fields of a
group are:

name
  Required. The identifying name of the group.

critical
  Required. Indicates if this group is required to continue to additional
  phases of deployment.

depends_on
  Required, may be an empty list. Group names that must be successful before
  this group can be processed.

selectors
  Required, may be an empty list. A list of identifying information to indicate
  the nodes that are members of this group.

success_criteria
  Optional. Criteria that must evaluate to true before a group is considered
  successfully complete with a phase of deployment.

Criticality
'''''''''''
- Field: critical
- Valid values: true | false

Each group is required to indicate true or false for the `critical` field.
This drives the behavior after the deployment of baremetal nodes. If any
groups that are marked as `critical: true` fail to meet that group's success
criteria, the workflow should halt after the deployment of baremetal nodes. A
group that cannot be processed due to a parent dependency failing will be
considered failed, regardless of the success criteria.

Dependencies
''''''''''''
- Field: depends_on
- Valid values: [] or a list of group names

Each group specifies a list of depends_on groups, or an empty list. All
identified groups must complete the phase of deployment successfully before
the current group is allowed to be processed by the current phase.

- A failure (based on success criteria) of a group prevents any groups
  dependent upon the failed group from being attempted.
- Circular dependencies will be rejected as invalid during document validation.
- There is no guarantee of ordering among groups that have their dependencies
  met. Any group that is ready for deployment based on declared dependencies
  will execute. Execution of groups is serialized; two groups will not deploy
  at the same time.
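
Since group ordering is a dependency-graph problem, the ordering and cycle
checks can be expressed with NetworkX_ (referenced at the end of this spec).
The following is a minimal sketch, not the Shipyard implementation; the group
names are taken from the example document later in this spec:

.. code:: python

  # Minimal sketch of ordering groups by their depends_on lists using NetworkX;
  # not the Shipyard implementation.
  import networkx as nx

  depends_on = {
      'ntp-node': [],
      'monitoring-nodes': [],
      'control-nodes': ['ntp-node'],
      'compute-nodes-1': ['control-nodes'],
      'compute-nodes-2': ['control-nodes'],
  }

  graph = nx.DiGraph()
  graph.add_nodes_from(depends_on)
  for group, deps in depends_on.items():
      for dep in deps:
          # an edge from the dependency to the group that waits on it
          graph.add_edge(dep, group)

  if not nx.is_directed_acyclic_graph(graph):
      # circular dependencies are rejected during document validation
      raise ValueError('circular dependency among groups')

  # one valid serialized processing order
  print(list(nx.topological_sort(graph)))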

Selectors
'''''''''
- Field: selectors
- Valid values: [] or a list of selectors

The list of selectors indicates the nodes that will be included in a group.
Each selector has four available filtering values: node_names, node_tags,
node_labels, and rack_names. Each selector is an intersection of these
criteria, while the list of selectors is a union of the individual selectors.

- Omitting a criterion from a selector, or using an empty list, means that
  criterion is ignored.
- Having a completely empty list of selectors, or a selector that has no
  criteria specified, indicates ALL nodes.
- A collection of selectors that results in no nodes being identified will be
  processed as if 100% of nodes successfully deployed (avoiding division by
  zero), but would fail the minimum or maximum nodes criteria (still counts as
  0 nodes).
- There is no validation against the same node being in multiple groups;
  however, the workflow will not resubmit to Drydock nodes that have already
  completed or failed in this deployment, since it keeps track of each node
  uniquely. The success or failure of those nodes excluded from submission to
  Drydock will still be used for the success criteria calculation.

E.g.::

  selectors:
    - node_names:
        - node01
        - node02
      rack_names:
        - rack01
      node_tags:
        - control
    - node_names:
        - node04
      node_labels:
        - ucp_control_plane: enabled

Will indicate (not really SQL, just for illustration)::

  SELECT nodes
  WHERE node_name in ('node01', 'node02')
    AND rack_name in ('rack01')
    AND node_tags in ('control')
  UNION
  SELECT nodes
  WHERE node_name in ('node04')
    AND node_label in ('ucp_control_plane: enabled')

Success Criteria
''''''''''''''''
- Field: success_criteria
- Valid values: see below

Each group optionally contains success criteria, which are used to indicate if
the deployment of that group is successful. The values that may be specified:

percent_successful_nodes
  The calculated success rate of nodes completing the deployment phase.

  E.g.: 75 would mean that 3 of 4 nodes must complete the phase successfully.

  This is useful for groups that have larger numbers of nodes, and do not
  have critical minimums or are not sensitive to an arbitrary number of nodes
  not working.

minimum_successful_nodes
  An integer indicating how many nodes must complete the phase to be considered
  successful.

maximum_failed_nodes
  An integer indicating a number of nodes that are allowed to have failed the
  deployment phase and still consider that group successful.

When no criteria are specified, it means that no checks are done - processing
continues as if nothing is wrong.

When more than one criterion is specified, each is evaluated separately - if
any fail, the group is considered failed.
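
As a minimal sketch (not the Shipyard implementation), evaluating these
criteria against the full set of nodes identified by a group's selectors could
look like the following:

.. code:: python

  # Minimal sketch of success criteria evaluation; not the Shipyard
  # implementation. `criteria` is the group's success_criteria mapping.
  def criteria_met(criteria, all_nodes, successful_nodes):
      total = len(all_nodes)
      succeeded = len(successful_nodes)
      failed = total - succeeded

      if not criteria:
          # no criteria: processing continues as if nothing is wrong
          return True

      checks = []
      if 'percent_successful_nodes' in criteria:
          # an empty group counts as 100% successful (avoids division by zero)
          percent = (100.0 * succeeded / total) if total else 100.0
          checks.append(percent >= criteria['percent_successful_nodes'])
      if 'minimum_successful_nodes' in criteria:
          checks.append(succeeded >= criteria['minimum_successful_nodes'])
      if 'maximum_failed_nodes' in criteria:
          checks.append(failed <= criteria['maximum_failed_nodes'])

      # each criterion is evaluated separately; if any fail, the group fails
      return all(checks)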


Example Deployment Strategy Document
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This example shows a deployment strategy with 5 groups: control-nodes,
compute-nodes-1, compute-nodes-2, monitoring-nodes, and ntp-node.

::

  ---
  schema: shipyard/DeploymentStrategy/v1
  metadata:
    schema: metadata/Document/v1
    name: deployment-strategy
    layeringDefinition:
      abstract: false
      layer: global
    storagePolicy: cleartext
  data:
    groups:
      - name: control-nodes
        critical: true
        depends_on:
          - ntp-node
        selectors:
          - node_names: []
            node_labels: []
            node_tags:
              - control
            rack_names:
              - rack03
        success_criteria:
          percent_successful_nodes: 90
          minimum_successful_nodes: 3
          maximum_failed_nodes: 1
      - name: compute-nodes-1
        critical: false
        depends_on:
          - control-nodes
        selectors:
          - node_names: []
            node_labels: []
            rack_names:
              - rack01
            node_tags:
              - compute
        success_criteria:
          percent_successful_nodes: 50
      - name: compute-nodes-2
        critical: false
        depends_on:
          - control-nodes
        selectors:
          - node_names: []
            node_labels: []
            rack_names:
              - rack02
            node_tags:
              - compute
        success_criteria:
          percent_successful_nodes: 50
      - name: monitoring-nodes
        critical: false
        depends_on: []
        selectors:
          - node_names: []
            node_labels: []
            node_tags:
              - monitoring
            rack_names:
              - rack03
              - rack02
              - rack01
      - name: ntp-node
        critical: true
        depends_on: []
        selectors:
          - node_names:
              - ntp01
            node_labels: []
            node_tags: []
            rack_names: []
        success_criteria:
          minimum_successful_nodes: 1

The ordering of groups, as defined by the dependencies (``depends_on``
fields)::

    __________       __________________
   | ntp-node |     | monitoring-nodes |
    ----------       ------------------
        |
    ____V__________
   | control-nodes |
    ---------------
        |_________________________
        |                         |
    ____V____________      ______V__________
   | compute-nodes-1 |    | compute-nodes-2 |
    -----------------      -----------------

Given this, the order of execution could be:

- ntp-node > monitoring-nodes > control-nodes > compute-nodes-1 > compute-nodes-2
- ntp-node > control-nodes > compute-nodes-2 > compute-nodes-1 > monitoring-nodes
- monitoring-nodes > ntp-node > control-nodes > compute-nodes-1 > compute-nodes-2
- and many more ... the only guarantee is that ntp-node will run some time
  before control-nodes, which will run some time before both of the
  compute-nodes. Monitoring-nodes can run at any time.

Also of note are the various combinations of selectors and the varied use of
success criteria.

Deployment Configuration Document (Shipyard)
--------------------------------------------
The existing deployment-configuration document that is used by the workflows
will also be modified to use its existing deployment_strategy field to provide
the name of the DeploymentStrategy document that will be used.

The default value for the name of the DeploymentStrategy document will be
``deployment-strategy``.
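
For illustration, the relevant portion of a deployment-configuration document
could look like the following; the placement of the field under
``physical_provisioner`` mirrors current Shipyard documents, but should be
treated as a sketch rather than the authoritative schema:

.. code:: yaml

  ---
  schema: shipyard/DeploymentConfiguration/v1
  metadata:
    schema: metadata/Document/v1
    name: deployment-configuration
    layeringDefinition:
      abstract: false
      layer: global
    storagePolicy: cleartext
  data:
    physical_provisioner:
      # name of the DeploymentStrategy document to use for this site
      deployment_strategy: deployment-strategy
    # other deployment-configuration fields omitted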

Drydock Changes
---------------

API and CLI
~~~~~~~~~~~
- A new API needs to be provided that accepts a node filter (i.e. a selector,
  above) and returns the list of node names that result from analysis of the
  design. Input to this API will also need to include a design reference. (A
  possible shape for this API is sketched after this list.)

- Drydock needs to provide a "tree" output of tasks rooted at the requested
  parent task. This will provide the needed success/failure status for nodes
  that have been prepared/deployed.
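
One possible shape for the node filter API, shown only to make the intent
concrete; the endpoint name and request fields here are assumptions, not a
settled contract:

.. code:: json

  POST /api/v1.0/nodefilter

  {
    "design_ref": "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
    "node_filter": {
      "filter_set_type": "intersection",
      "filter_set": [
        {"filter_type": "intersection", "node_tags": ["control"], "rack_names": ["rack03"]}
      ]
    }
  }

The response body could simply be the list of matching node names, e.g.
``["node01", "node02", "node03"]``.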

Documentation
~~~~~~~~~~~~~
Drydock documentation will be updated to match the introduction of the new
APIs.


Shipyard Changes
----------------

API and CLI
~~~~~~~~~~~
- The commit configdocs API will need to be enhanced to look up the
  DeploymentStrategy by using the DeploymentConfiguration.
- The DeploymentStrategy document will need to be validated to ensure there are
  no circular dependencies in the groups' declared dependencies (perhaps
  NetworkX_).
- A new API endpoint (and matching CLI) is desired to retrieve the status of
  nodes as known to Drydock/MAAS and their MAAS status. The existing node list
  API in Drydock provides a JSON output that can be utilized for this purpose.

Workflow
~~~~~~~~
The deploy_site and update_site workflows will be modified to utilize the
DeploymentStrategy.

- The deployment configuration step will be enhanced to also read the
  deployment strategy and pass the information on a new xcom for use by the
  baremetal nodes step (see below).
- The prepare nodes and deploy nodes steps will be combined to perform both as
  part of the resolution of an overall ``baremetal nodes`` step.
  The baremetal nodes step will introduce functionality that reads in the
  deployment strategy (from the prior xcom), and can orchestrate the calls to
  Drydock to enact the grouping, ordering, and success evaluation.
  Note that Drydock will serialize tasks; there is no parallelization of
  prepare/deploy at this time.

Needed Functionality
''''''''''''''''''''

- function to formulate the ordered groups based on dependencies (perhaps
  NetworkX_)
- function to evaluate success/failure against the success criteria for a group
  based on the result list of succeeded or failed nodes.
- function to mark groups as success or failure (including failed due to
  dependency failure), as well as keep track of the (if any) successful and
  failed nodes.
- function to get a group that is ready to execute, or 'Done' when all groups
  are either complete or failed.
- function to formulate the node filter for Drydock based on a group's
  selectors
- function to orchestrate processing groups, moving to the next group (or being
  done) when a prior group completes or fails.
- function to summarize the success/failed nodes for a group (primarily for
  reporting to the logs at this time).

Process
'''''''
The baremetal nodes step (preparation and deployment of nodes) will proceed as
follows:

1. Each group's selector will be sent to Drydock to determine the list of
   nodes that are a part of that group.

   - An overall status will be kept for each unique node (not started |
     prepared | success | failure).
   - When sending a task to Drydock for processing, the nodes associated with
     that group will be sent as a simple `node_name` node filter. This will
     allow for this list to exclude nodes that have a status that is not
     congruent for the task being performed.

     - prepare nodes valid status: not started
     - deploy nodes valid status: prepared

2. In a processing loop, groups that are ready to be processed based on their
   dependencies (and the success criteria of groups they are dependent upon)
   will be selected for processing until there are no more groups that can be
   processed. The processing will consist of preparing and then deploying the
   group.

   - The selected group will be prepared and then deployed before selecting
     another group for processing.
   - Any nodes that failed as part of that group will be excluded from
     subsequent deployment or preparation of that node for this deployment.

     - Excluding nodes that are already processed addresses groups that have
       overlapping lists of nodes due to the group's selectors, and prevents
       sending them to Drydock for re-processing.
     - Evaluation of the success criteria will use the full set of nodes
       identified by the selector. This means that if a node was previously
       successfully deployed, that same node will count as "successful" when
       evaluating the success criteria.

   - The success criteria will be evaluated after the group's prepare step and
     the deploy step. A failure to meet the success criteria in a prepare step
     will cause the deploy step for that group to be skipped (and marked as
     failed).
   - Any nodes that fail during the prepare step will not be used in the
     corresponding deploy step.
   - Upon completion (success, partial success, or failure) of a prepare step,
     the nodes that were sent for preparation will be marked in the unique list
     of nodes (above) with their appropriate status: prepared or failure.
   - Upon completion of a group's deployment step, the nodes' status will be
     updated to their current status: success or failure.

3. Before the end of the baremetal nodes step, following all eligible group
   processing, a report will be logged to indicate the success/failure of
   groups and the status of the individual nodes. Note that it is possible for
   individual nodes to be left in the `not started` state if they were only
   part of groups that were never allowed to process due to dependencies and
   success criteria.

4. At the end of the baremetal nodes step, any critical groups that have failed
   due to timeout, dependency failure, or success criteria failure will trigger
   an Airflow Exception, resulting in a failed deployment.

Notes:

- The timeout values specified for the prepare nodes and deploy nodes steps
  will be used to put bounds on the individual calls to Drydock. A failure
  based on these values will be treated as a failure for the group; we need to
  be vigilant about whether this will lead to indeterminate states for nodes
  that interfere with further processing (e.g. the call timed out, but the
  requested work still continued to completion).

Example Processing
''''''''''''''''''
Using the defined deployment strategy in the above example, the following is
an example of how it may process::

  Start
    |
    | prepare ntp-node         <SUCCESS>
    | deploy ntp-node          <SUCCESS>
    V
    | prepare control-nodes    <SUCCESS>
    | deploy control-nodes     <SUCCESS>
    V
    | prepare monitoring-nodes <SUCCESS>
    | deploy monitoring-nodes  <SUCCESS>
    V
    | prepare compute-nodes-2  <SUCCESS>
    | deploy compute-nodes-2   <SUCCESS>
    V
    | prepare compute-nodes-1  <SUCCESS>
    | deploy compute-nodes-1   <SUCCESS>
    |
  Finish (success)

If there were a failure in preparing the ntp-node, the following would be the
result::

  Start
    |
    | prepare ntp-node         <FAILED>
    | deploy ntp-node          <FAILED, due to prepare failure>
    V
    | prepare control-nodes    <FAILED, due to dependency>
    | deploy control-nodes     <FAILED, due to dependency>
    V
    | prepare monitoring-nodes <SUCCESS>
    | deploy monitoring-nodes  <SUCCESS>
    V
    | prepare compute-nodes-2  <FAILED, due to dependency>
    | deploy compute-nodes-2   <FAILED, due to dependency>
    V
    | prepare compute-nodes-1  <FAILED, due to dependency>
    | deploy compute-nodes-1   <FAILED, due to dependency>
    |
  Finish (failed due to critical group failed)

If a failure occurred during the deploy of compute-nodes-2, the following would
result::

  Start
    |
    | prepare ntp-node         <SUCCESS>
    | deploy ntp-node          <SUCCESS>
    V
    | prepare control-nodes    <SUCCESS>
    | deploy control-nodes     <SUCCESS>
    V
    | prepare monitoring-nodes <SUCCESS>
    | deploy monitoring-nodes  <SUCCESS>
    V
    | prepare compute-nodes-2  <SUCCESS>
    | deploy compute-nodes-2   <FAILED>
    V
    | prepare compute-nodes-1  <SUCCESS>
    | deploy compute-nodes-1   <SUCCESS>
    |
  Finish (success with some nodes/groups failed)

Schemas
~~~~~~~
A new schema will need to be provided by Shipyard to validate the
DeploymentStrategy document.

Documentation
~~~~~~~~~~~~~
The Shipyard action documentation will need to include details defining the
DeploymentStrategy document (mostly as defined here), as well as the update to
the DeploymentConfiguration document to contain the name of the
DeploymentStrategy document.


.. _NetworkX: https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.algorithms.dag.topological_sort.html