diff --git a/specs/approved/workflow_node-teardown.rst b/specs/approved/workflow_node-teardown.rst
new file mode 100644
index 0000000..21f1779
--- /dev/null
+++ b/specs/approved/workflow_node-teardown.rst
@@ -0,0 +1,620 @@
..
  Copyright 2018 AT&T Intellectual Property.
  All Rights Reserved.

  Licensed under the Apache License, Version 2.0 (the "License"); you may
  not use this file except in compliance with the License. You may obtain
  a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations
  under the License.

.. index::
   single: Teardown node
   single: workflow;redeploy_server
   single: Drydock
   single: Promenade
   single: Shipyard


.. _node-teardown:

=====================
Airship Node Teardown
=====================

Shipyard is the entrypoint for Airship actions, including requests to redeploy
a server. The first part of redeploying a server is the graceful teardown of
the software running on the server; specifically, Kubernetes and etcd are of
critical concern. It is the duty of Shipyard to orchestrate the teardown of the
server, followed by steps to deploy the desired new configuration. This design
covers only the first portion: node teardown.


Links
=====

None

Problem description
===================

When redeploying a physical host (server) using the Airship Platform,
it is necessary to trigger a sequence of steps to prevent undesired behaviors
when the server is redeployed. This blueprint documents the interactions that
must occur between Airship components to tear down a server.

Impacted components
===================

- Drydock
- Promenade
- Shipyard

Proposed change
===============

Shipyard Node Teardown Process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. (Existing) Shipyard receives a request to redeploy_server, specifying a
   target server.
#. (Existing) Shipyard performs preflight, design reference lookup, and
   validation steps.
#. (New) Shipyard invokes Promenade to decommission the node.
#. (New) Shipyard invokes Drydock to destroy the node, setting a node filter
   to restrict the operation to a single server.
#. (New) Shipyard invokes Promenade to remove the node from the Kubernetes
   cluster.

Assumption:
node_id is the hostname of the server, and is also the identifier that both
Drydock and Promenade use to identify the appropriate parts - hosts and k8s
nodes. This convention is set by the join script produced by Promenade.

Drydock Destroy Node
--------------------
The API/interface for destroy node already exists; the implementation within
Drydock needs to be developed. This interface will need to accept both the
specified node_id and the design_id to retrieve from Deckhand.

Using the provided node_id (hardware node) and the design_id, Drydock will
reset the hardware to a re-provisionable state.

By default, all local storage should be wiped (per datacenter policy for
wiping before re-use).

An option to wipe only the OS disk should be supported, such that other local
storage is left intact and could be remounted without data loss, e.g.
`--preserve-local-storage`.

The target node should be shut down.

The target node should be removed from the provisioner (e.g. MAAS).

Responses
~~~~~~~~~
The responses from this functionality should follow the pattern set by prepare
nodes and other Drydock functionality. The Drydock status responses used for
all async invocations will be utilized for this functionality.
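Because Drydock handles destroy node as an asynchronous task reporting status responses, Shipyard (or any client) must poll the task until it reaches a terminal state. The loop below is a minimal sketch of that pattern only; the `get_status` callable, and the terminal status values `complete` and `terminated`, are illustrative assumptions rather than anything defined by this spec or by Drydock's actual API.

```python
import time


def poll_task(get_status, interval=5.0, timeout=600.0,
              clock=time.monotonic, sleep=time.sleep):
    """Poll an async task until it reaches a terminal state.

    get_status: callable returning the latest task status document (a dict).
    The terminal status values checked here are illustrative assumptions.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status.get("status") in ("complete", "terminated"):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "task did not reach a terminal state within %.0f seconds"
                % timeout)
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable; the real endpoint path and status vocabulary would come from Drydock's task API.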

Promenade Decommission Node
---------------------------
Performs steps that will result in the specified node being cleanly
disassociated from Kubernetes, and ready for the server to be destroyed.
Users of the decommission node API should be aware of the long timeouts
that may occur while awaiting Promenade's completion of the appropriate steps.
At this time, Promenade is a stateless service and doesn't use any database
storage. As such, requests to Promenade are synchronous.

.. code:: json

    POST /nodes/{node_id}/decommission

    {
      "rel": "design",
      "href": "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
      "type": "application/x-yaml"
    }

Such that the design reference body is the design indicated when the
redeploy_server action is invoked through Shipyard.

Query Parameters:

- drain-node-timeout: A whole number timeout in seconds to be used for the
  drain node step (default: none). If no value is provided, the drain node
  step will use its default.
- drain-node-grace-period: A whole number in seconds indicating the
  grace period that will be provided to the drain node step (default: none).
  If no value is specified, the drain node step will use its default.
- clear-labels-timeout: A whole number timeout in seconds to be used for the
  clear labels step (default: none). If no value is specified, clear labels
  will use its own default.
- remove-etcd-timeout: A whole number timeout in seconds to be used for the
  remove etcd from nodes step (default: none). If no value is specified,
  remove-etcd will use its own default.
- etcd-ready-timeout: A whole number in seconds indicating how long the
  decommission node request should allow for etcd clusters to become stable
  (default: 600).

Process
~~~~~~~
Acting upon the node specified by the invocation and the design reference
details:

#. Drain the Kubernetes node.
#. Clear the Kubernetes labels on the node.
#. 
Remove etcd nodes from their clusters (if impacted).

   - If the node being decommissioned contains etcd nodes, Promenade will
     attempt to have those nodes gracefully leave the etcd cluster.

#. Ensure that etcd cluster(s) are in a stable state.

   - Polls for status every 30 seconds, up to the etcd-ready-timeout or until
     the cluster meets the defined minimum functionality for the site.
   - A new document, promenade/EtcdClusters/v1, will specify details about
     the etcd clusters deployed in the site, including: identifiers,
     credentials, and thresholds for minimum functionality.
   - This process should exclude the node being torn down from any health
     calculation.

#. Shut down the kubelet.

   - If this is not possible because the node is in a state of disarray such
     that it cannot schedule the daemonset to run, this step may fail, but
     should not hold up the process, as the Drydock dismantling of the node
     will shut the kubelet down.

Responses
~~~~~~~~~
All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  Indicates that all steps were successful.

- Failure: Code: 404, reason: NotFound

  Indicates that the target node is not discoverable by Promenade.

- Failure: Code: 500, reason: DisassociateStepFailure

  The details section should detail the successes and failures further. Any
  4xx series errors from the individual steps would manifest as a 500 here.

Promenade Drain Node
--------------------
Drain the Kubernetes node for the target node. This ensures that the node
is no longer the target of any pod scheduling, and evicts or deletes the
running pods. In the case of nodes running DaemonSet-managed pods, or pods
that would prevent a drain from occurring, Promenade may be required to provide
the `ignore-daemonsets` option or `force` option to attempt to drain the node
as fully as possible. 

By default, the drain node operation will use a grace period for pods of 1800
seconds and a total timeout of 3600 seconds (1 hour). Clients of this
functionality should be prepared for a long timeout.

.. code:: json

    POST /nodes/{node_id}/drain

Query Parameters:

- timeout: A whole number in seconds (default = 3600). This value is the total
  timeout for the kubectl drain command.
- grace-period: A whole number in seconds (default = 1800). This value is the
  grace period used by kubectl drain. The grace period must be less than the
  timeout.

.. note::

   This POST has no message body.

Example command used for the drain (for reference only):
`kubectl drain --force --timeout 3600s --grace-period 1800 --ignore-daemonsets --delete-local-data n1`

https://git.openstack.org/cgit/openstack/airship-promenade/tree/promenade/templates/roles/common/usr/local/bin/promenade-teardown

Responses
~~~~~~~~~
All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  Indicates that the drain node has successfully concluded, and that no pods
  are currently running.

- Failure: Status response, code: 400, reason: BadRequest

  A request was made with parameters that cannot work - e.g. grace-period is
  set to a value larger than the timeout value.

- Failure: Status response, code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Status response, code: 500, reason: DrainNodeError

  There was a processing exception raised while trying to drain a node. The
  details section should indicate the underlying cause if it can be
  determined.

Promenade Clear Labels
----------------------
Removes the labels that have been added to the target Kubernetes node.

.. code:: json

    POST /nodes/{node_id}/clear-labels

Query Parameters:

- timeout: A whole number in seconds allowed for the pods to settle/move
  following removal of labels (default = 1800).

.. 
note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  All labels have been removed from the specified Kubernetes node.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: ClearLabelsError

  There was a failure to clear labels that prevented completion. The details
  section should provide more information about the cause of this failure.

Promenade Remove etcd Node
--------------------------
Checks if the node specified contains any etcd nodes. If so, this API will
trigger that etcd node to leave the associated etcd cluster::

    POST /nodes/{node_id}/remove-etcd

    {
      "rel": "design",
      "href": "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
      "type": "application/x-yaml"
    }

Query Parameters:

- timeout: A whole number in seconds allowed for the removal of etcd nodes
  from the target node (default = 1800).

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  All etcd nodes have been removed from the specified node.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: RemoveEtcdError

  There was a failure to remove etcd from the target node that prevented
  completion within the specified timeout, or etcd prevented removal of the
  node because it would result in the cluster being broken. The details
  section should provide more information about the cause of this failure.


Promenade Check etcd
--------------------
Retrieves the current interpreted state of etcd::

    GET /etcd-cluster-health-statuses?design_ref={the design ref}

Where the design_ref parameter is required for appropriate operation, and is in
the same format as used for the join-scripts API. 

Query Parameters:

- design_ref: (Required) The design reference to be used to discover etcd
  instances.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  The status of each etcd in the site will be returned in the details section.
  Valid values for status are: Healthy, Unhealthy.

See the status response conventions:
https://github.com/att-comdev/ucp-integration/blob/master/docs/source/api-conventions.rst#status-responses

.. code:: json

    {
      "...": "... standard status response ...",
      "details": {
        "errorCount": {{n}},
        "messageList": [
          { "message": "Healthy",
            "error": false,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          },
          { "message": "Unhealthy",
            "error": false,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          },
          { "message": "Unable to access Etcd",
            "error": true,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          }
        ]
      }
    }

- Failure: Code: 400, reason: MissingDesignRef

  Returned if the design_ref parameter is not specified.

- Failure: Code: 404, reason: NotFound

  Returned if the specified etcd could not be located.

- Failure: Code: 500, reason: EtcdNotAccessible

  Returned if the specified etcd responded with an invalid health response
  (not simply unhealthy - that's a 200).


Promenade Shutdown Kubelet
--------------------------
Shuts down the kubelet on the specified node. This is accomplished by Promenade
setting the label `promenade-decomission: enabled` on the node, which will
trigger a newly-developed daemonset to run something like:
`systemctl disable kubelet && systemctl stop kubelet`.
This daemonset will effectively sit dormant until nodes have the appropriate
label added, and then perform the kubelet teardown.

.. code:: json

    POST /nodes/{node_id}/shutdown-kubelet

.. 
note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  The kubelet has been successfully shut down.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: ShutdownKubeletError

  The specified node's kubelet failed to shut down. The details section of the
  status response should contain reasonable information about the source of
  this failure.

Promenade Delete Node from Cluster
----------------------------------
Updates the Kubernetes cluster, removing the specified node. Promenade should
check that the node is drained/cordoned and has no labels other than
`promenade-decomission: enabled`. If either of these checks fails, the API
should respond with a 409 Conflict response.

.. code:: json

    POST /nodes/{node_id}/remove-from-cluster

.. note::

   This POST has no message body.

Responses
~~~~~~~~~
All responses will be in the form of the UCP Status response.

- Success: Code: 200, reason: Success

  The specified node has been removed from the Kubernetes cluster.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 409, reason: Conflict

  The specified node cannot be deleted because it fails the checks that the
  node is drained/cordoned and has no labels (other than possibly
  `promenade-decomission: enabled`).

- Failure: Code: 500, reason: DeleteNodeError

  The specified node cannot be removed from the cluster due to an error from
  Kubernetes. The details section of the status response should contain more
  information about the failure.


Shipyard Tag Releases
---------------------
Shipyard will need to mark Deckhand revisions with tags when there are
successful deploy_site or update_site actions, to be able to determine the
last known good design. 
This is related to Shipyard issue 16, which captures the
same need.

.. note::

   Repeated from https://github.com/att-comdev/shipyard/issues/16

   When multiple configdocs commits have been done since the last deployment,
   there is no ready means to determine what's being done to the site. Shipyard
   should reject deploy site or update site requests that have had multiple
   commits since the last site true-up action. An option to override this guard
   should be allowed for the actions in the form of a parameter to the action.

   The configdocs API should provide a way to see what's been changed since the
   last site true-up, not just the last commit of configdocs. This might be
   accommodated by new Deckhand tags like the 'commit' tag, but for
   'site true-up' or similar applied by the deploy and update site commands.

The design for issue 16 includes the bare-minimum marking of Deckhand
revisions. This design is as follows:

Scenario
~~~~~~~~
Multiple commits occur between site actions (deploy_site, update_site) - those
actions that attempt to bring a site into compliance with a site design.
When this occurs, the current system of being able to see only what has changed
between the committed and buffer versions (configdocs diff) is insufficient
for investigating what has changed since the last successful (or unsuccessful)
site action.
To accommodate this, Shipyard needs several enhancements.

Enhancements
~~~~~~~~~~~~

#. Deckhand revision tags for site actions

   Using the tagging facility provided by Deckhand, Shipyard will tag the end
   of site actions. 

   Upon completing a site action successfully, tag the revision being used
   with the tag site-action-success, and a body of dag_id:

   Upon completing a site action unsuccessfully, tag the revision being used
   with the tag site-action-failure, and a body of dag_id:

   The completion tags should only be applied upon failure if the site action
   gets past document validation successfully (i.e. gets to the point where it
   can start making changes via the other UCP components).

   This could result in a single revision having both site-action-success and
   site-action-failure if a later re-invocation of a site action is successful.

#. Check for intermediate committed revisions

   Upon running a site action, before tagging the revision with the site
   action tag(s), the DAG needs to check whether there are committed revisions
   that do not have an associated site-action tag. If there are any committed
   revisions since the last site action other than the current revision being
   used (between them), then the action should not be allowed to proceed (stop
   before triggering validations). For the calculation of intermediate
   committed revisions, assume revision 0 if there are no revisions with a
   site-action tag (the null case).

   If the action is invoked with a parameter of
   allow-intermediate-commits=true, then this check should log that the
   intermediate committed revisions check is being skipped and not take any
   other action.

#. Support action parameter of allow-intermediate-commits=true|false

   In the CLI for create action, the --param option supports adding parameters
   to actions. The parameters passed should be relayed by the CLI to the API
   and ultimately to the invocation of the DAG. The DAG, as noted above, will
   check for the presence of allow-intermediate-commits=true. This needs to be
   tested to confirm it works.

#. 
Shipyard needs to support retrieving configdocs and rendered documents for
   the last successful site action, and for the last site action (successful
   or not):

   - --successful-site-action
   - --last-site-action

   These options would be mutually exclusive with --buffer and --committed.

#. Shipyard diff (shipyard get configdocs)

   Needs to support options to diff the buffer against the last successful
   site action and the last site action (successful or not).

   Currently there are no options to select which versions to diff (it is
   always buffer vs. committed).

   Support::

     --base-version=committed | successful-site-action | last-site-action (default = committed)
     --diff-version=buffer | committed | successful-site-action | last-site-action (default = buffer)

   Equivalent query parameters need to be implemented in the API.

Because the implementation of this design will result in the tagging of
successful site actions, Shipyard will be able to determine the correct
revision to use while attempting to tear down a node.

If the request to tear down a node indicates a revision that doesn't exist,
the command to do so (e.g. redeploy_server) should not continue, but rather
fail due to a missing precondition.

The invocation of the Promenade and Drydock steps in this design will utilize
the appropriate tag based on the request (default is successful-site-action)
to determine the revision of the Deckhand documents used as the design-ref.

Shipyard redeploy_server Action
-------------------------------
The redeploy_server action currently accepts a target node. Additional
supported parameters are needed:

#. preserve-local-storage=true, which will instruct Drydock to wipe only the
   OS drive; any other local storage will not be wiped. This would allow
   the drives to be remounted to the server upon re-provisioning. The
   default behavior is that local storage is not preserved.

#. 
target-revision=committed | successful-site-action | last-site-action

   This indicates which revision of the design will be used as the reference
   for what should be re-provisioned after the teardown. The default is
   successful-site-action, which is the closest representation of the
   last-known-good state.

These should be accepted as parameters to the action API/CLI and modify the
behavior of the redeploy_server DAG.

Security impact
---------------

None. This change introduces no new security concerns outside of established
patterns for RBAC controls around API endpoints.

Performance impact
------------------

As this is an on-demand action, there is no expected performance impact to
existing processes, although tearing down a host may result in temporarily
degraded service capacity: workloads may need to move to different hosts, or,
more simply, capacity is reduced.

Alternatives
------------

N/A

Implementation
==============

None at this time

Dependencies
============

None.


References
==========

None

diff --git a/specs/implemented/deployment-grouping-baremetal.rst b/specs/implemented/deployment-grouping-baremetal.rst
new file mode 100644
index 0000000..10cfa87
--- /dev/null
+++ b/specs/implemented/deployment-grouping-baremetal.rst
@@ -0,0 +1,569 @@
..
  Copyright 2018 AT&T Intellectual Property.
  All Rights Reserved.

  Licensed under the Apache License, Version 2.0 (the "License"); you may
  not use this file except in compliance with the License. You may obtain
  a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations
  under the License.

.. 
index::
   single: Deployment grouping
   single: workflow
   single: Shipyard
   single: Drydock

.. _deployment-grouping-baremetal:

=======================================
Deployment Grouping for Baremetal Nodes
=======================================
One of the primary functions of the Undercloud Platform is the deployment
of baremetal nodes as part of site deployment and upgrade. This blueprint aims
to define how deployment strategies can be applied to the workflow during these
actions.

.. note::

   This document has been moved from the airship-in-a-bottle project, and was
   previously implemented. The format of this document diverges from the
   standard template for airship-specs.

Overview
--------
When Shipyard is invoked for a deploy_site or update_site action, there are
three primary stages:

1. Preparation and Validation
2. Baremetal and Network Deployment
3. Software Deployment

During the Baremetal and Network Deployment stage, the deploy_site or
update_site workflow (and perhaps other workflows in the future) invokes
Drydock to verify the site, prepare the site, prepare the nodes, and deploy the
nodes. Each of these steps is described in the `Drydock Orchestrator Readme`_.

.. _Drydock Orchestrator Readme: https://git.openstack.org/cgit/openstack/airship-drydock/plain/drydock_provisioner/orchestrator/readme.md

The prepare nodes and deploy nodes steps each involve intensive and potentially
time-consuming operations on the target nodes, orchestrated by Drydock and
MAAS. These steps need to be approached and managed such that the grouping,
ordering, and criticality of success of nodes can be controlled in support of
fault-tolerant site deployments and updates.

For the purposes of this document, `phase of deployment` refers to the prepare
nodes and deploy nodes steps of the Baremetal and Network Deployment stage.

Some factors that advise this solution:

1. 
Limits to the amount of parallelization that can occur due to a centralized
   MAAS system.
2. Faults in the hardware, preventing operational nodes.
3. Miswiring or misconfiguration of network hardware.
4. Incorrect site design causing a mismatch against the hardware.
5. Criticality of particular nodes to the realization of the site design.
6. Desired configurability within the framework of the UCP declarative site
   design.
7. Improved visibility into the current state of node deployment.
8. A desire to begin the deployment of nodes before the finish of the
   preparation of nodes -- i.e. start deploying nodes as soon as they are
   ready to be deployed. Note: this design will not achieve new forms of
   task parallelization within Drydock; this is recognized as desired
   functionality.

Solution
--------
Updates supporting this solution will require changes to Shipyard for the
changed workflows, and to Drydock for the desired node targeting and for
retrieval of diagnostic and result information.

.. index::
   single: Shipyard Documents; DeploymentStrategy

Deployment Strategy Document (Shipyard)
---------------------------------------
To accommodate the needed changes, this design introduces a new
DeploymentStrategy document into the site design, to be read and utilized
by the workflows for update_site and deploy_site.

Groups
~~~~~~
Groups are named sets of nodes that will be deployed together. The fields of a
group are:

name
  Required. The identifying name of the group.

critical
  Required. Indicates if this group is required to continue to additional
  phases of deployment.

depends_on
  Required, may be an empty list. Group names that must be successful before
  this group can be processed.

selectors
  Required, may be an empty list. A list of identifying information to
  indicate the nodes that are members of this group.

success_criteria
  Optional. 
Criteria that must evaluate to true before a group is considered
  successfully complete for a phase of deployment.

Criticality
'''''''''''
- Field: critical
- Valid values: true | false

Each group is required to indicate true or false for the `critical` field.
This drives the behavior after the deployment of baremetal nodes. If any
group that is marked `critical: true` fails to meet that group's success
criteria, the workflow should halt after the deployment of baremetal nodes. A
group that cannot be processed due to a parent dependency failing will be
considered failed, regardless of the success criteria.

Dependencies
''''''''''''
- Field: depends_on
- Valid values: [] or a list of group names

Each group specifies a list of depends_on groups, or an empty list. All
identified groups must complete the phase of deployment successfully before
the current group is allowed to be processed for that phase.

- A failure (based on success criteria) of a group prevents any groups
  dependent upon the failed group from being attempted.
- Circular dependencies will be rejected as invalid during document validation.
- There is no guarantee of ordering among groups that have their dependencies
  met. Any group that is ready for deployment based on declared dependencies
  will execute. Execution of groups is serialized - two groups will not deploy
  at the same time.

Selectors
'''''''''
- Field: selectors
- Valid values: [] or a list of selectors

The list of selectors indicates the nodes that will be included in a group.
Each selector has four available filtering values: node_names, node_tags,
node_labels, and rack_names. Each selector is an intersection of these
criteria, while the list of selectors is a union of the individual selectors.

- Omitting a criterion from a selector, or using an empty list, means that
  criterion is ignored. 

- Having a completely empty list of selectors, or a selector that has no
  criteria specified, indicates ALL nodes.
- A collection of selectors that results in no nodes being identified will be
  processed as if 100% of nodes successfully deployed (avoiding division by
  zero), but it would still fail a minimum_successful_nodes criterion (it
  still counts as 0 nodes).
- There is no validation against the same node being in multiple groups;
  however, the workflow will not resubmit nodes that have already completed or
  failed in this deployment to Drydock twice, since it keeps track of each
  node uniquely. The success or failure of those nodes excluded from
  submission to Drydock will still be used for the success criteria
  calculation.

E.g.::

  selectors:
    - node_names:
        - node01
        - node02
      rack_names:
        - rack01
      node_tags:
        - control
    - node_names:
        - node04
      node_labels:
        - ucp_control_plane: enabled

Will indicate (not really SQL, just for illustration)::

  SELECT nodes
  WHERE node_name in ('node01', 'node02')
    AND rack_name in ('rack01')
    AND node_tags in ('control')
  UNION
  SELECT nodes
  WHERE node_name in ('node04')
    AND node_label in ('ucp_control_plane: enabled')

Success Criteria
''''''''''''''''
- Field: success_criteria
- Valid values: for possible values, see below

Each group optionally contains success criteria which are used to indicate
whether the deployment of that group is successful. The values that may be
specified:

percent_successful_nodes
  The calculated success rate of nodes completing the deployment phase.

  E.g.: 75 would mean that 3 of 4 nodes must complete the phase successfully.

  This is useful for groups that have larger numbers of nodes and do not
  have critical minimums, or are not sensitive to an arbitrary number of
  nodes not working.

minimum_successful_nodes
  An integer indicating how many nodes must complete the phase to be
  considered successful. 
+ +maximum_failed_nodes + An integer indicating a number of nodes that are allowed to have failed the + deployment phase and still consider that group successful. + +When no criteria are specified, it means that no checks are done - processing +continues as if nothing is wrong. + +When more than one criterion is specified, each is evaluated separately - if +any fail, the group is considered failed. + + +Example Deployment Strategy Document +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows a deployment strategy with 5 groups: control-nodes, +compute-nodes-1, compute-nodes-2, monitoring-nodes, and ntp-node. + +:: + + --- + schema: shipyard/DeploymentStrategy/v1 + metadata: + schema: metadata/Document/v1 + name: deployment-strategy + layeringDefinition: + abstract: false + layer: global + storagePolicy: cleartext + data: + groups: + - name: control-nodes + critical: true + depends_on: + - ntp-node + selectors: + - node_names: [] + node_labels: [] + node_tags: + - control + rack_names: + - rack03 + success_criteria: + percent_successful_nodes: 90 + minimum_successful_nodes: 3 + maximum_failed_nodes: 1 + - name: compute-nodes-1 + critical: false + depends_on: + - control-nodes + selectors: + - node_names: [] + node_labels: [] + rack_names: + - rack01 + node_tags: + - compute + success_criteria: + percent_successful_nodes: 50 + - name: compute-nodes-2 + critical: false + depends_on: + - control-nodes + selectors: + - node_names: [] + node_labels: [] + rack_names: + - rack02 + node_tags: + - compute + success_criteria: + percent_successful_nodes: 50 + - name: monitoring-nodes + critical: false + depends_on: [] + selectors: + - node_names: [] + node_labels: [] + node_tags: + - monitoring + rack_names: + - rack03 + - rack02 + - rack01 + - name: ntp-node + critical: true + depends_on: [] + selectors: + - node_names: + - ntp01 + node_labels: [] + node_tags: [] + rack_names: [] + success_criteria: + minimum_successful_nodes: 1 + +The ordering of groups, as defined by 
the dependencies (``depends_on``
+fields)::
+
+      __________     __________________
+     | ntp-node |   | monitoring-nodes |
+      ----------     ------------------
+         |
+      ____V__________
+     | control-nodes |
+      ---------------
+          |_________________________
+          |                         |
+    ______V__________         ______V__________
+   | compute-nodes-1 |       | compute-nodes-2 |
+    -----------------         -----------------
+
+Given this, the order of execution could be:
+
+- ntp-node > monitoring-nodes > control-nodes > compute-nodes-1 >
+  compute-nodes-2
+- ntp-node > control-nodes > compute-nodes-2 > compute-nodes-1 >
+  monitoring-nodes
+- monitoring-nodes > ntp-node > control-nodes > compute-nodes-1 >
+  compute-nodes-2
+- and many more ... the only guarantee is that ntp-node will run some time
+  before control-nodes, which will run some time before both of the
+  compute-nodes groups. Monitoring-nodes can run at any time.
+
+Also of note are the various combinations of selectors and the varied use of
+success criteria.
+
+Deployment Configuration Document (Shipyard)
+--------------------------------------------
+The existing deployment-configuration document that is used by the workflows
+will also be modified to use the existing deployment_strategy field to provide
+the name of the DeploymentStrategy document that will be used.
+
+The default value for the name of the DeploymentStrategy document will be
+``deployment-strategy``.
+
+Drydock Changes
+---------------
+
+API and CLI
+~~~~~~~~~~~
+- A new API needs to be provided that accepts a node filter (i.e. selector,
+  above) and returns a list of node names that result from analysis of the
+  design. Input to this API will also need to include a design reference.
+
+- Drydock needs to provide a "tree" output of tasks rooted at the requested
+  parent task. This will provide the needed success/failure status for nodes
+  that have been prepared/deployed.
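To make the selector semantics concrete for such a node-filter API (criteria within one selector are ANDed, results across selectors are unioned, and empty criteria impose no constraint), here is a minimal Python sketch. The inventory shape, the function names, and the any-listed-tag interpretation of ``node_tags`` are illustrative assumptions, not Drydock's actual implementation:

```python
def node_matches(node, selector):
    """Check one node against a single selector (AND of its criteria).

    Empty or absent criteria impose no constraint, per the spec above.
    """
    if selector.get('node_names') and node['name'] not in selector['node_names']:
        return False
    if selector.get('rack_names') and node['rack'] not in selector['rack_names']:
        return False
    # Assumption: a node satisfies node_tags if it carries any listed tag.
    if selector.get('node_tags') and not set(selector['node_tags']) & set(node['tags']):
        return False
    # Assumption: node_labels is modeled as a dict of required label values.
    if selector.get('node_labels'):
        if not all(node['labels'].get(key) == value
                   for key, value in selector['node_labels'].items()):
            return False
    return True


def resolve_selectors(nodes, selectors):
    """Union of node names matched by any selector; no selectors means ALL."""
    if not selectors:
        return {n['name'] for n in nodes}
    return {n['name'] for n in nodes
            if any(node_matches(n, s) for s in selectors)}


# Hypothetical four-node inventory exercised with the two example selectors
# from the spec's illustration.
nodes = [
    {'name': 'node01', 'rack': 'rack01', 'tags': ['control'], 'labels': {}},
    {'name': 'node02', 'rack': 'rack01', 'tags': ['control'], 'labels': {}},
    {'name': 'node03', 'rack': 'rack02', 'tags': ['compute'], 'labels': {}},
    {'name': 'node04', 'rack': 'rack02', 'tags': ['compute'],
     'labels': {'ucp_control_plane': 'enabled'}},
]
selectors = [
    {'node_names': ['node01', 'node02'], 'rack_names': ['rack01'],
     'node_tags': ['control']},
    {'node_names': ['node04'],
     'node_labels': {'ucp_control_plane': 'enabled'}},
]
print(sorted(resolve_selectors(nodes, selectors)))
# ['node01', 'node02', 'node04']
```

This mirrors the "not really SQL" illustration earlier: node01 and node02 match the first selector by name, rack, and tag, while node04 matches the second by name and label, and the two result sets are unioned.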
+
+Documentation
+~~~~~~~~~~~~~
+Drydock documentation will be updated to reflect the introduction of the new
+APIs.
+
+
+Shipyard Changes
+----------------
+
+API and CLI
+~~~~~~~~~~~
+- The commit configdocs API will need to be enhanced to look up the
+  DeploymentStrategy by using the DeploymentConfiguration.
+- The DeploymentStrategy document will need to be validated to ensure there are
+  no circular dependencies in the groups' declared dependencies (perhaps
+  NetworkX_).
+- A new API endpoint (and matching CLI) is desired to retrieve the status of
+  nodes as known to Drydock/MAAS and their MAAS status. The existing node list
+  API in Drydock provides a JSON output that can be utilized for this purpose.
+
+Workflow
+~~~~~~~~
+The deploy_site and update_site workflows will be modified to utilize the
+DeploymentStrategy.
+
+- The deployment configuration step will be enhanced to also read the
+  deployment strategy and pass the information on a new XCom for use by the
+  baremetal nodes step (see below).
+- The prepare nodes and deploy nodes steps will be combined to perform both as
+  part of the resolution of an overall ``baremetal nodes`` step.
+  The baremetal nodes step will introduce functionality that reads in the
+  deployment strategy (from the prior XCom), and can orchestrate the calls to
+  Drydock to enact the grouping, ordering, and success evaluation.
+  Note that Drydock will serialize tasks; there is no parallelization of
+  prepare/deploy at this time.
+
+Needed Functionality
+''''''''''''''''''''
+
+- function to formulate the ordered groups based on dependencies (perhaps
+  NetworkX_)
+- function to evaluate success/failure against the success criteria for a group
+  based on the result list of succeeded or failed nodes.
+- function to mark groups as success or failure (including failed due to
+  dependency failure), as well as to keep track of the successful and failed
+  nodes (if any).
+- function to get a group that is ready to execute, or 'Done' when all groups
+  are either complete or failed.
+- function to formulate the node filter for Drydock based on a group's
+  selectors.
+- function to orchestrate processing groups, moving to the next group (or being
+  done) when a prior group completes or fails.
+- function to summarize the successful/failed nodes for a group (primarily for
+  reporting to the logs at this time).
+
+Process
+'''''''
+The baremetal nodes step (preparation and deployment of nodes) will proceed as
+follows:
+
+1. Each group's selector will be sent to Drydock to determine the list of
+   nodes that are a part of that group.
+
+   - An overall status will be kept for each unique node (not started |
+     prepared | success | failure).
+   - When sending a task to Drydock for processing, the nodes associated with
+     that group will be sent as a simple ``node_name`` node filter. This
+     allows the list to exclude nodes whose status is not congruent with the
+     task being performed.
+
+     - prepare nodes valid status: not started
+     - deploy nodes valid status: prepared
+
+2. In a processing loop, groups that are ready to be processed based on their
+   dependencies (and the success criteria of groups they are dependent upon)
+   will be selected for processing until there are no more groups that can be
+   processed. The processing will consist of preparing and then deploying the
+   group.
+
+   - The selected group will be prepared and then deployed before selecting
+     another group for processing.
+   - Any nodes that failed as part of that group will be excluded from any
+     subsequent preparation or deployment for the remainder of this
+     deployment.
+
+     - Excluding nodes that are already processed addresses groups that have
+       overlapping lists of nodes due to the groups' selectors, and prevents
+       sending them to Drydock for re-processing.
+     - Evaluation of the success criteria will use the full set of nodes
+       identified by the selector.
This means that if a node was previously
+       successfully deployed, that same node will count as "successful" when
+       evaluating the success criteria.
+
+   - The success criteria will be evaluated after the group's prepare step and
+     after its deploy step. A failure to meet the success criteria in a
+     prepare step will cause the deploy step for that group to be skipped (and
+     marked as failed).
+   - Any nodes that fail during the prepare step will not be used in the
+     corresponding deploy step.
+   - Upon completion (success, partial success, or failure) of a prepare step,
+     the nodes that were sent for preparation will be marked in the unique
+     list of nodes (above) with their appropriate status: prepared or failure.
+   - Upon completion of a group's deployment step, the nodes' statuses will be
+     updated to their current status: success or failure.
+
+3. Before the end of the baremetal nodes step, following all eligible group
+   processing, a report will be logged to indicate the success/failure of
+   groups and the status of the individual nodes. Note that it is possible for
+   individual nodes to be left in the ``not started`` state if they were only
+   part of groups that were never allowed to process due to dependencies and
+   success criteria.
+
+4. At the end of the baremetal nodes step, any groups marked as critical that
+   have failed (due to timeout, dependency failure, or success criteria
+   failure) will trigger an Airflow Exception, resulting in a failed
+   deployment.
+
+Notes:
+
+- The timeout values specified for the prepare nodes and deploy nodes steps
+  will be used to put bounds on the individual calls to Drydock. A failure
+  based on these values will be treated as a failure for the group; we need to
+  be vigilant about whether this will lead to indeterminate states for nodes
+  that mess with further processing. (e.g.
Timed out, but the requested work still
+  continued to completion)
+
+Example Processing
+''''''''''''''''''
+Using the defined deployment strategy in the above example, the following is
+an example of how it may process::
+
+  Start
+    |
+    | prepare ntp-node
+    | deploy ntp-node
+    V
+    | prepare control-nodes
+    | deploy control-nodes
+    V
+    | prepare monitoring-nodes
+    | deploy monitoring-nodes
+    V
+    | prepare compute-nodes-2
+    | deploy compute-nodes-2
+    V
+    | prepare compute-nodes-1
+    | deploy compute-nodes-1
+    |
+  Finish (success)
+
+If there were a failure in preparing the ntp-node, the following would be the
+result. Because control-nodes depends on ntp-node, it and the compute groups
+that depend on it are marked as failed without being processed; only
+monitoring-nodes, which has no dependencies, still runs::
+
+  Start
+    |
+    | prepare ntp-node (fails)
+    | deploy ntp-node (skipped due to prepare failure)
+    V
+    | prepare monitoring-nodes
+    | deploy monitoring-nodes
+    |
+  Finish (failed due to critical group failure)
+
+If a failure occurred during the deploy of compute-nodes-2 (a non-critical
+group with no dependents), processing of the remaining groups would continue::
+
+  Start
+    |
+    | prepare ntp-node
+    | deploy ntp-node
+    V
+    | prepare control-nodes
+    | deploy control-nodes
+    V
+    | prepare monitoring-nodes
+    | deploy monitoring-nodes
+    V
+    | prepare compute-nodes-2
+    | deploy compute-nodes-2 (fails)
+    V
+    | prepare compute-nodes-1
+    | deploy compute-nodes-1
+    |
+  Finish (success with some nodes/groups failed)
+
+Schemas
+~~~~~~~
+A new schema will need to be provided by Shipyard to validate the
+DeploymentStrategy document.
+
+Documentation
+~~~~~~~~~~~~~
+The Shipyard action documentation will need to include details defining the
+DeploymentStrategy document (mostly as defined here), as well as the update to
+the DeploymentConfiguration document to contain the name of the
+DeploymentStrategy document.
+
+
+.. _NetworkX: https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.algorithms.dag.topological_sort.html
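As a rough sketch of the needed workflow functionality described above, the following combines a success-criteria evaluator with a dependency-driven group ordering. It uses Python's standard-library ``graphlib`` where the spec suggests NetworkX, and all function and variable names are illustrative assumptions rather than Shipyard's actual implementation:

```python
from graphlib import TopologicalSorter


def evaluate_success_criteria(criteria, total_nodes, successful_nodes):
    """True when every specified criterion passes; no criteria means success.

    Per the selector notes earlier, a group that selected zero nodes counts
    as 100% successful, but still fails any minimum_successful_nodes
    requirement (it has 0 successes).
    """
    failed_nodes = total_nodes - successful_nodes
    if 'percent_successful_nodes' in criteria:
        pct = 100.0 if total_nodes == 0 else 100.0 * successful_nodes / total_nodes
        if pct < criteria['percent_successful_nodes']:
            return False
    if 'minimum_successful_nodes' in criteria:
        if successful_nodes < criteria['minimum_successful_nodes']:
            return False
    if 'maximum_failed_nodes' in criteria:
        if failed_nodes > criteria['maximum_failed_nodes']:
            return False
    return True


# Group dependencies from the example document: each group maps to the set of
# groups it depends_on. static_order() yields dependencies before dependents.
group_deps = {
    'ntp-node': set(),
    'monitoring-nodes': set(),
    'control-nodes': {'ntp-node'},
    'compute-nodes-1': {'control-nodes'},
    'compute-nodes-2': {'control-nodes'},
}
order = list(TopologicalSorter(group_deps).static_order())
# ntp-node is guaranteed to appear before control-nodes, which appears before
# both compute groups; monitoring-nodes may land anywhere, matching the
# "order of execution could be" variations above.

# control-nodes criteria from the example: 3 of 4 nodes succeeding meets the
# minimum and the failure cap, but misses the 90% threshold.
criteria = {'percent_successful_nodes': 90,
            'minimum_successful_nodes': 3,
            'maximum_failed_nodes': 1}
print(evaluate_success_criteria(criteria, 4, 3))  # False (75% < 90%)
print(evaluate_success_criteria(criteria, 4, 4))  # True
```

Because each criterion is evaluated independently and any failure fails the group, the evaluator simply short-circuits on the first criterion that misses, matching the "each is evaluated separately" rule in the Success Criteria section.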