
..
  Copyright 2018 AT&T Intellectual Property.
  All Rights Reserved.

  Licensed under the Apache License, Version 2.0 (the "License"); you may
  not use this file except in compliance with the License. You may obtain
  a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations
  under the License.

.. index::
   single: Teardown node
   single: workflow;redeploy_server
   single: Drydock
   single: Promenade
   single: Shipyard

.. _node-teardown:

=====================
Airship Node Teardown
=====================

Shipyard is the entrypoint for Airship actions, including the need to redeploy
a server. The first part of redeploying a server is the graceful teardown of
the software running on it; Kubernetes and etcd are of particular concern. It
is the duty of Shipyard to orchestrate the teardown of the server, followed by
steps to deploy the desired new configuration. This design covers only the
first portion: node teardown.

Links
=====

None

Problem description
===================

When redeploying a physical host (server) using the Airship Platform, it is
necessary to trigger a sequence of steps to prevent undesired behaviors when
the server is redeployed. This blueprint intends to document the interaction
that must occur between Airship components to tear down a server.

Impacted components
===================

- Drydock
- Promenade
- Shipyard

Proposed change
===============

Shipyard Node Teardown Process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. (Existing) Shipyard receives a request to redeploy_server, specifying a
   target server.
#. (Existing) Shipyard performs preflight, design reference lookup, and
   validation steps.
#. (New) Shipyard invokes Promenade to decommission the node.
#. (New) Shipyard invokes Drydock to destroy the node - setting a node
   filter to restrict the operation to a single server.
#. (New) Shipyard invokes Promenade to remove the node from the Kubernetes
   cluster.

Assumption:

node_id is the hostname of the server, and is also the identifier that both
Drydock and Promenade use to identify the appropriate parts - hosts and k8s
nodes. This convention is set by the join script produced by Promenade.

Drydock Destroy Node
--------------------

The API/interface for destroy node already exists; the implementation within
Drydock needs to be developed. This interface will need to accept both the
specified node_id and the design_id to retrieve from Deckhand.

Using the provided node_id (hardware node) and the design_id, Drydock will
reset the hardware to a re-provisionable state.

By default, all local storage should be wiped (per datacenter policy for
wiping before re-use).

An option to allow for only the OS disk to be wiped should be supported, such
that other local storage is left intact and could be remounted without data
loss, e.g. `--preserve-local-storage`.

The target node should be shut down.

The target node should be removed from the provisioner (e.g. MaaS).
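
For illustration, the following is a minimal sketch of how a workflow step
might invoke this interface, assuming Drydock's asynchronous task API accepts
a destroy_nodes action with a node filter. The endpoint path, field names, and
option names below are assumptions of this sketch, not a confirmed Drydock
contract.

.. code:: python

  import requests


  def request_destroy_node(drydock_url, token, design_ref, node_id,
                           preserve_local_storage=False):
      """Create an async Drydock task to tear down a single hardware node.

      Illustrative sketch only; the real task API and node filter schema
      may differ.
      """
      task = {
          "action": "destroy_nodes",
          "design_ref": design_ref,
          "node_filter": {
              "filter_set_type": "union",
              "filter_set": [
                  {"filter_type": "union", "node_names": [node_id]}
              ],
          },
          "options": {"preserve_local_storage": preserve_local_storage},
      }
      resp = requests.post(
          "{}/api/v1.0/tasks".format(drydock_url),
          headers={"X-Auth-Token": token},
          json=task,
          timeout=30,
      )
      resp.raise_for_status()
      # The returned task is polled for completion, following the same
      # pattern used for prepare_nodes and other Drydock task types.
      return resp.json()["task_id"]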

Responses
~~~~~~~~~

The responses from this functionality should follow the pattern set by prepare
nodes and other Drydock functionality. The Drydock status responses used for
all async invocations will be utilized for this functionality.

Promenade Decommission Node
---------------------------

Performs steps that will result in the specified node being cleanly
disassociated from Kubernetes, and ready for the server to be destroyed.
Users of the decommission node API should be aware of the long timeout values
that may occur while awaiting Promenade to complete the appropriate steps.
At this time, Promenade is a stateless service and doesn't use any database
storage. As such, requests to Promenade are synchronous.

.. code:: json

  POST /nodes/{node_id}/decommission

  {
    "rel": "design",
    "href": "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
    "type": "application/x-yaml"
  }

The design reference body is the design indicated when the redeploy_server
action is invoked through Shipyard.

Query Parameters:

- drain-node-timeout: A whole number timeout in seconds to be used for the
  drain node step (default: none). If no value is provided, the drain node
  step will use its default.
- drain-node-grace-period: A whole number in seconds indicating the
  grace period that will be provided to the drain node step (default: none).
  If no value is specified, the drain node step will use its default.
- clear-labels-timeout: A whole number timeout in seconds to be used for the
  clear labels step (default: none). If no value is specified, clear labels
  will use its own default.
- remove-etcd-timeout: A whole number timeout in seconds to be used for the
  remove etcd from nodes step (default: none). If no value is specified,
  remove-etcd will use its own default.
- etcd-ready-timeout: A whole number in seconds indicating how long the
  decommission node request should allow for etcd clusters to become stable
  (default: 600).

Process
~~~~~~~

Acting upon the node specified by the invocation and the design reference
details:

#. Drain the Kubernetes node.
#. Clear the Kubernetes labels on the node.
#. Remove etcd nodes from their clusters (if impacted).

   - If the node being decommissioned contains etcd nodes, Promenade will
     attempt to gracefully have those nodes leave the etcd cluster.

#. Ensure that etcd cluster(s) are in a stable state.

   - Polls for status every 30 seconds, until the etcd-ready-timeout is
     reached or the cluster meets the defined minimum functionality for the
     site.
   - A new document, promenade/EtcdClusters/v1, will specify details about
     the etcd clusters deployed in the site, including: identifiers,
     credentials, and thresholds for minimum functionality.
   - This process should ignore the node being torn down in any calculation
     of health.

#. Shut down the kubelet.

   - If this is not possible because the node is in a state of disarray such
     that it cannot schedule the daemonset to run, this step may fail, but
     should not hold up the process, as the Drydock dismantling of the node
     will shut the kubelet down.
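
The following is a minimal sketch of this sequence as the decommission
endpoint might orchestrate it. The `steps` helper object and its methods are
illustrative placeholders for the operations described above, not an existing
Promenade interface.

.. code:: python

  import time


  class EtcdNotStableError(Exception):
      """Raised when etcd does not reach minimum functionality in time."""


  def decommission_node(node_id, steps, etcd_ready_timeout=600,
                        poll_interval=30):
      """Run the decommission sequence for one node (illustrative)."""
      steps.drain_node(node_id)
      steps.clear_labels(node_id)
      steps.remove_etcd_members(node_id)

      # Poll every 30 seconds until the clusters (ignoring the node being
      # torn down) meet the site's minimum functionality, or time out.
      deadline = time.monotonic() + etcd_ready_timeout
      while not steps.etcd_cluster_healthy(exclude_node=node_id):
          if time.monotonic() >= deadline:
              raise EtcdNotStableError(
                  "etcd not stable within %ss" % etcd_ready_timeout)
          time.sleep(poll_interval)

      try:
          steps.shutdown_kubelet(node_id)
      except Exception:
          # A failure here should not block the teardown; Drydock's
          # destruction of the node will stop the kubelet regardless.
          pass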

Responses
~~~~~~~~~

All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  Indicates that all steps were successful.

- Failure: Code: 404, reason: NotFound

  Indicates that the target node is not discoverable by Promenade.

- Failure: Code: 500, reason: DisassociateStepFailure

  The details section should detail the successes and failures further. Any
  4xx series errors from the individual steps would manifest as a 500 here.

Promenade Drain Node
--------------------

Drain the Kubernetes node for the target node. This will ensure that this node
is no longer the target of any pod scheduling, and evicts or deletes the
running pods. In the case of nodes running DaemonSet-managed pods, or pods
that would prevent a drain from occurring, Promenade may be required to provide
the `ignore-daemonsets` option or `force` option to attempt to drain the node
as fully as possible.

By default, the drain node will utilize a grace period for pods of 1800
seconds and a total timeout of 3600 seconds (1 hour). Clients of this
functionality should be prepared for a long timeout.

.. code:: json

  POST /nodes/{node_id}/drain

Query Parameters:

- timeout: a whole number in seconds (default = 3600). This value is the total
  timeout for the kubectl drain command.
- grace-period: a whole number in seconds (default = 1800). This value is the
  grace period used by kubectl drain. The grace period must be less than the
  timeout.

.. note::

   This POST has no message body.

Example command being used for drain (reference only)::

  kubectl drain --force --timeout 3600s --grace-period 1800 --ignore-daemonsets --delete-local-data n1

https://git.openstack.org/cgit/openstack/airship-promenade/tree/promenade/templates/roles/common/usr/local/bin/promenade-teardown
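
For illustration, a minimal sketch of how the drain step might validate its
parameters and shell out to kubectl, mirroring the reference command above.
This is a sketch only; Promenade's actual implementation and flag handling may
differ, and it assumes kubectl is on the PATH and configured for the cluster.

.. code:: python

  import subprocess


  def drain_node(node_id, timeout=3600, grace_period=1800):
      """Drain a Kubernetes node (illustrative sketch)."""
      if grace_period >= timeout:
          # Mirrors the 400 BadRequest case: the grace period must be
          # less than the overall timeout.
          raise ValueError("grace-period must be less than timeout")

      cmd = [
          "kubectl", "drain",
          "--force",
          "--timeout", "{}s".format(timeout),
          "--grace-period", str(grace_period),
          "--ignore-daemonsets",
          "--delete-local-data",
          node_id,
      ]
      subprocess.run(cmd, check=True, timeout=timeout)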

Responses
~~~~~~~~~

All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  Indicates that the drain node has successfully concluded, and that no pods
  are currently running.

- Failure: Status response, code: 400, reason: BadRequest

  A request was made with parameters that cannot work - e.g. grace-period is
  set to a value larger than the timeout value.

- Failure: Status response, code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Status response, code: 500, reason: DrainNodeError

  There was a processing exception raised while trying to drain a node. The
  details section should indicate the underlying cause if it can be
  determined.

Promenade Clear Labels
----------------------

Removes the labels that have been added to the target Kubernetes node.

.. code:: json

  POST /nodes/{node_id}/clear-labels

Query Parameters:

- timeout: A whole number in seconds allowed for the pods to settle/move
  following removal of labels. (Default = 1800)

.. note::

   This POST has no message body.
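
For illustration, a minimal sketch of the label-clearing operation using the
official Kubernetes Python client. The use of this client, and the decision to
skip kubernetes.io-managed label keys, are assumptions of this sketch rather
than a statement of Promenade's implementation.

.. code:: python

  from kubernetes import client, config


  def clear_node_labels(node_name):
      """Remove the labels that have been added to a node (illustrative)."""
      config.load_incluster_config()  # or load_kube_config() outside a pod
      v1 = client.CoreV1Api()

      node = v1.read_node(node_name)
      labels = node.metadata.labels or {}

      # Setting a label's value to None in a merge patch deletes that key.
      # Node-inherent kubernetes.io labels are left alone here.
      patch = {"metadata": {"labels": {
          key: None for key in labels if "kubernetes.io" not in key}}}
      v1.patch_node(node_name, patch)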

Responses
~~~~~~~~~

All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  All labels have been removed from the specified Kubernetes node.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: ClearLabelsError

  There was a failure to clear labels that prevented completion. The details
  section should provide more information about the cause of this failure.

Promenade Remove etcd Node
--------------------------

Checks if the node specified contains any etcd nodes. If so, this API will
trigger those etcd nodes to leave the associated etcd cluster:

.. code:: json

  POST /nodes/{node_id}/remove-etcd

  {
    "rel": "design",
    "href": "deckhand+https://{{deckhand_url}}/revisions/{{revision_id}}/rendered-documents",
    "type": "application/x-yaml"
  }

Query Parameters:

- timeout: A whole number in seconds allowed for the removal of etcd nodes
  from the target node. (Default = 1800)
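
For illustration, a minimal sketch of the underlying member removal, assuming
`etcdctl` (v3 API) is run against a surviving member of the affected cluster;
endpoint and TLS certificate flags are omitted here and would be required in
practice.

.. code:: python

  import json
  import subprocess


  def remove_etcd_member(member_name, timeout=1800):
      """Have the named etcd member leave its cluster (illustrative)."""
      listing = subprocess.run(
          ["etcdctl", "member", "list", "-w", "json"],
          capture_output=True, check=True, timeout=timeout,
      )
      members = json.loads(listing.stdout)["members"]

      for member in members:
          if member.get("name") == member_name:
              # Member IDs are reported as integers; etcdctl expects hex.
              member_id = format(member["ID"], "x")
              subprocess.run(
                  ["etcdctl", "member", "remove", member_id],
                  check=True, timeout=timeout,
              )
              return True
      # The target node hosts no etcd member of that name.
      return False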

Responses
~~~~~~~~~

All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  All etcd nodes have been removed from the specified node.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: RemoveEtcdError

  There was a failure to remove etcd from the target node that prevented
  completion within the specified timeout, or etcd prevented removal of the
  node because it would result in the cluster being broken. The details
  section should provide more information about the cause of this failure.

Promenade Check etcd
--------------------

Retrieves the current interpreted state of etcd::

  GET /etcd-cluster-health-statuses?design_ref={the design ref}

Where the design_ref parameter is required for appropriate operation, and is in
the same format as used for the join-scripts API.

Query Parameters:

- design_ref: (Required) the design reference to be used to discover etcd
  instances.

Responses
~~~~~~~~~

All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  The status of each etcd in the site will be returned in the details section.
  Valid values for status are: Healthy, Unhealthy.

  See the status response conventions:
  https://github.com/openstack/airship-in-a-bottle/blob/master/doc/source/api-conventions.rst#status-responses

  .. code:: json

    {
      "...": "... standard status response ...",
      "details": {
        "errorCount": {{n}},
        "messageList": [
          { "message": "Healthy",
            "error": false,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          },
          { "message": "Unhealthy",
            "error": false,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          },
          { "message": "Unable to access Etcd",
            "error": true,
            "kind": "HealthMessage",
            "name": "{{the name of the etcd service}}"
          }
        ]
      }
      ...
    }

- Failure: Code: 400, reason: MissingDesignRef

  Returned if the design_ref parameter is not specified.

- Failure: Code: 404, reason: NotFound

  Returned if the specified etcd could not be located.

- Failure: Code: 500, reason: EtcdNotAccessible

  Returned if the specified etcd responded with an invalid health response
  (not just simply unhealthy - that's a 200).
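
For illustration, a minimal sketch of how a caller (such as the decommission
sequence above) might poll this endpoint until the clusters report no errors
or a timeout elapses. The URL construction, token handling, and the simple
errorCount check are assumptions of this sketch; a real caller would also
compare healthy member counts against the site's minimum-functionality
thresholds.

.. code:: python

  import time

  import requests


  def wait_for_etcd_stable(promenade_url, token, design_ref,
                           timeout=600, poll_interval=30):
      """Poll etcd health until stable or timed out (illustrative)."""
      deadline = time.monotonic() + timeout
      while time.monotonic() < deadline:
          resp = requests.get(
              "{}/etcd-cluster-health-statuses".format(promenade_url),
              params={"design_ref": design_ref},
              headers={"X-Auth-Token": token},
              timeout=60,
          )
          if resp.status_code == 200:
              details = resp.json().get("details", {})
              # errorCount tracks messages flagged error: true; thresholds
              # for minimum functionality would be checked here as well.
              if details.get("errorCount", 1) == 0:
                  return True
          time.sleep(poll_interval)
      return False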

Promenade Shutdown Kubelet
--------------------------

Shuts down the kubelet on the specified node. This is accomplished by Promenade
setting the label `promenade-decommission: enabled` on the node, which will
trigger a newly-developed daemonset to run something like:
`systemctl disable kubelet && systemctl stop kubelet`.
This daemonset will effectively sit dormant until nodes have the appropriate
label added, and then perform the kubelet teardown.

.. code:: json

  POST /nodes/{node_id}/shutdown-kubelet

.. note::

   This POST has no message body.
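
For illustration, a minimal sketch of applying the trigger label with the
official Kubernetes Python client; the use of this client is an assumption of
the sketch.

.. code:: python

  from kubernetes import client, config


  def trigger_kubelet_shutdown(node_name):
      """Label the node so the teardown daemonset stops the kubelet."""
      config.load_incluster_config()  # or load_kube_config() outside a pod
      v1 = client.CoreV1Api()
      # The label key/value mirrors the trigger described above.
      patch = {"metadata": {"labels": {"promenade-decommission": "enabled"}}}
      v1.patch_node(node_name, patch)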

Responses
~~~~~~~~~

All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  The kubelet has been successfully shut down.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 500, reason: ShutdownKubeletError

  The specified node's kubelet failed to shut down. The details section of the
  status response should contain reasonable information about the source of
  this failure.

Promenade Delete Node from Cluster
----------------------------------

Updates the Kubernetes cluster, removing the specified node. Promenade should
check that the node is drained/cordoned and has no labels other than
`promenade-decommission: enabled`. If either of these checks fails, the API
should respond with a 409 Conflict response.

.. code:: json

  POST /nodes/{node_id}/remove-from-cluster

.. note::

   This POST has no message body.
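
For illustration, a minimal sketch of the guard checks and the node deletion
using the official Kubernetes Python client; the client usage and the
exception name are assumptions of this sketch.

.. code:: python

  from kubernetes import client, config

  ALLOWED_LABEL = "promenade-decommission"


  class NodeNotReadyForDeletion(Exception):
      """Maps to the 409 Conflict response described above."""


  def delete_node_from_cluster(node_name):
      config.load_incluster_config()  # or load_kube_config() outside a pod
      v1 = client.CoreV1Api()
      node = v1.read_node(node_name)

      # The node must already be cordoned (unschedulable) ...
      if not node.spec.unschedulable:
          raise NodeNotReadyForDeletion("node is not cordoned/drained")

      # ... and carry no labels other than the decommission trigger.
      leftover = [k for k in (node.metadata.labels or {})
                  if k != ALLOWED_LABEL]
      if leftover:
          raise NodeNotReadyForDeletion("unexpected labels: %s" % leftover)

      v1.delete_node(node_name)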

Responses
~~~~~~~~~

All responses will be in the form of the Airship Status response.

- Success: Code: 200, reason: Success

  The specified node has been removed from the Kubernetes cluster.

- Failure: Code: 404, reason: NotFound

  The specified node is not discoverable by Promenade.

- Failure: Code: 409, reason: Conflict

  The specified node cannot be deleted because it fails the checks that the
  node is drained/cordoned and has no labels (other than possibly
  `promenade-decommission: enabled`).

- Failure: Code: 500, reason: DeleteNodeError

  The specified node cannot be removed from the cluster due to an error from
  Kubernetes. The details section of the status response should contain more
  information about the failure.

Shipyard Tag Releases
---------------------

Shipyard will need to mark Deckhand revisions with tags when there are
successful deploy_site or update_site actions, to be able to determine the
last known good design. This is related to issue 16 for Shipyard, which has
the same need.

.. note::

   Repeated from https://github.com/att-comdev/shipyard/issues/16

   When multiple configdocs commits have been done since the last deployment,
   there is no ready means to determine what's being done to the site.
   Shipyard should reject deploy site or update site requests that have had
   multiple commits since the last site true-up action. An option to override
   this guard should be allowed for the actions in the form of a parameter to
   the action.

   The configdocs API should provide a way to see what's been changed since
   the last site true-up, not just the last commit of configdocs. This might
   be accommodated by new Deckhand tags like the 'commit' tag, but for
   'site true-up' or similar applied by the deploy and update site commands.

The design for issue 16 includes the bare-minimum marking of Deckhand
revisions. This design is as follows:

Scenario
~~~~~~~~

Multiple commits occur between site actions (deploy_site, update_site) - those
actions that attempt to bring a site into compliance with a site design. When
this occurs, the current ability to see only what has changed between the
committed and buffer versions (configdocs diff) is insufficient to investigate
what has changed since the last successful (or unsuccessful) site action.

To accommodate this, Shipyard needs several enhancements.

Enhancements
~~~~~~~~~~~~

#. Deckhand revision tags for site actions

   Using the tagging facility provided by Deckhand, Shipyard will tag the end
   of site actions.

   Upon completing a site action successfully, tag the revision being used
   with the tag site-action-success, and a body of dag_id:<dag_id>.

   Upon completing a site action unsuccessfully, tag the revision being used
   with the tag site-action-failure, and a body of dag_id:<dag_id>.

   The completion tags should only be applied upon failure if the site action
   gets past document validation successfully (i.e. gets to the point where
   it can start making changes via the other Airship components).

   This could result in a single revision having both site-action-success and
   site-action-failure if a later re-invocation of a site action is
   successful.

#. Check for intermediate committed revisions

   Upon running a site action, before tagging the revision with the site
   action tag(s), the DAG needs to check whether there are committed
   revisions that do not have an associated site-action tag. If there are any
   committed revisions since the last site action other than the current
   revision being used (between them), then the action should not be allowed
   to proceed (stop before triggering validations). For the calculation of
   intermediate committed revisions, assume revision 0 if there are no
   revisions with a site-action tag (null case). A sketch of this check
   appears after this list.

   If the action is invoked with a parameter of
   allow-intermediate-commits=true, then this check should log that the
   intermediate committed revisions check is being skipped and not take any
   other action.

#. Support an action parameter of allow-intermediate-commits=true|false

   In the CLI for create action, the --param option supports adding
   parameters to actions. The parameters passed should be relayed by the CLI
   to the API and ultimately to the invocation of the DAG. The DAG, as noted
   above, will check for the presence of allow-intermediate-commits=true.
   This needs to be tested to work.

#. Shipyard needs to support retrieving configdocs and rendered documents for
   the last successful site action, and the last site action (successful or
   not successful):

   - `--successful-site-action`
   - `--last-site-action`

   These options would be mutually exclusive of --buffer or --committed.

#. Shipyard diff (shipyard get configdocs)

   Needs to support an option to diff the buffer vs. the last successful site
   action and the last site action (successful or not successful).

   Currently there are no options to select which versions to diff (always
   buffer vs. committed).

   Support::

     --base-version=committed | successful-site-action | last-site-action (Default = committed)
     --diff-version=buffer | committed | successful-site-action | last-site-action (Default = buffer)

   Equivalent query parameters need to be implemented in the API.
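
The following is a minimal sketch of the intermediate-commit check and the
completion tagging described above, written against a generic Deckhand client
wrapper. The `deckhand` client object, its `tag_revision` method, and the
shape of the revision listing are illustrative placeholders, not an existing
API.

.. code:: python

  def check_intermediate_commits(revisions, current_revision_id):
      """Return True if the site action may proceed.

      ``revisions`` is a list of dicts with 'id' and 'tags' keys, an
      illustrative shape for a Deckhand revision listing.
      """
      site_action_tags = {"site-action-success", "site-action-failure"}

      # Last revision that completed a site action; assume 0 if none
      # exists (the null case described above).
      last_site_action = max(
          (r["id"] for r in revisions if site_action_tags & set(r["tags"])),
          default=0,
      )

      # Committed revisions strictly between the last site action and the
      # revision now being used are intermediate commits.
      intermediate = [
          r["id"] for r in revisions
          if "committed" in r["tags"]
          and last_site_action < r["id"] < current_revision_id
      ]
      return not intermediate


  def tag_site_action(deckhand, revision_id, dag_id, successful):
      """Tag the revision at the end of a site action (illustrative)."""
      tag = "site-action-success" if successful else "site-action-failure"
      deckhand.tag_revision(revision_id, tag, body={"dag_id": dag_id})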

Because the implementation of this design will result in the tagging of
successful site actions, Shipyard will be able to determine the correct
revision to use while attempting to tear down a node.

If the request to tear down a node indicates a revision that doesn't exist,
the command to do so (e.g. redeploy_server) should not continue, but rather
fail due to a missing precondition.

The invocation of the Promenade and Drydock steps in this design will utilize
the appropriate tag based on the request (default is successful-site-action)
to determine the revision of the Deckhand documents used as the design-ref.

Shipyard redeploy_server Action
-------------------------------

The redeploy_server action currently accepts a target node. Additional
supported parameters are needed:

#. preserve-local-storage=true, which will instruct Drydock to wipe only the
   OS drive; any other local storage will not be wiped. This would allow for
   the drives to be remounted to the server upon re-provisioning. The default
   behavior is that local storage is not preserved.

#. target-revision=committed | successful-site-action | last-site-action

   This will indicate which revision of the design will be used as the
   reference for what should be re-provisioned after the teardown. The
   default is successful-site-action, which is the closest representation to
   the last-known-good state.

These should be accepted as parameters to the action API/CLI and modify the
behavior of the redeploy_server DAG.
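
For illustration, a minimal sketch of how the DAG might interpret these action
parameters; the parameter names follow the list above, while the defaulting
logic itself is an assumption of the sketch.

.. code:: python

  VALID_TARGET_REVISIONS = {
      "committed", "successful-site-action", "last-site-action"}


  def parse_redeploy_params(action_params):
      """Extract redeploy_server behavior from the action's --param values.

      The defaults mirror the text above: local storage is not preserved,
      and the successful-site-action revision is used.
      """
      preserve = action_params.get(
          "preserve-local-storage", "false").lower() == "true"

      target_revision = action_params.get(
          "target-revision", "successful-site-action")
      if target_revision not in VALID_TARGET_REVISIONS:
          raise ValueError(
              "target-revision must be one of %s"
              % sorted(VALID_TARGET_REVISIONS))

      return {
          "preserve_local_storage": preserve,
          "target_revision": target_revision,
      }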

Security impact
---------------

None. This change introduces no new security concerns outside of established
patterns for RBAC controls around API endpoints.

Performance impact
------------------

As this is an on-demand action, there is no expected performance impact to
existing processes, although tearing down a host may result in temporarily
degraded service capacity if workloads need to move to other hosts, or the
simpler case of reduced overall capacity.

Alternatives
------------

N/A

Implementation
==============

None at this time.

Dependencies
============

None.

References
==========

None