shipyard/doc/source/action-commands.rst
Drew Walters 84921b31d2 actions: Add Shipyard action to test site
This commit introduces an action, `test_site`, that invokes Helm
tests for all deployed releases using the
`ArmadaTestReleasesOperator` introduced in [1]. This action supports
the ability to invoke Helm tests for a specific release using the
`release` parameter and cleanup resources if the `cleanup` parameter
is set to `true`.

[1] https://review.openstack.org/#/c/603236/

Depends-On: https://review.openstack.org/#/c/603236/
Change-Id: Ib5f38fe4b8a6516ee2afae62774ec84f1d2eb1ad
2018-09-27 16:38:54 -05:00


Action Commands

Example invocation

API input to create an action follows this pattern, varying the name field:

Without Parameters:

POST /v1.0/actions

{"name" : "update_site"}

With Parameters:

POST /v1.0/actions

{
  "name": "redeploy_server",
  "parameters": {
    "target_nodes": ["node1", "node2"]
  }
}

POST /v1.0/actions

{
  "name": "update_site",
  "parameters": {
    "continue-on-fail": "true"
  }
}

Analogous CLI commands:

shipyard create action update_site
shipyard create action redeploy_server --param="target_nodes=node1,node2"
shipyard create action update_site --param="continue-on-fail=true"
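
The request bodies above all share one shape, which a minimal Python sketch can make explicit. The helper name `build_action_body` is hypothetical and not part of Shipyard. It only assembles the JSON shown in the examples and does not send the request.

```python
import json


def build_action_body(name, parameters=None):
    """Return the JSON body for POST /v1.0/actions as a string.

    Hypothetical helper for illustration only; it mirrors the request
    shapes shown in the examples and is not part of Shipyard.
    """
    body = {"name": name}
    # The "parameters" key is included only when parameters are supplied,
    # matching the "Without Parameters" example.
    if parameters:
        body["parameters"] = dict(parameters)
    return json.dumps(body)
```

For instance, `build_action_body("redeploy_server", {"target_nodes": ["node1", "node2"]})` produces the body of the second example, while `build_action_body("update_site")` produces the parameterless form.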

Supported actions

These actions are currently supported using the Action API and CLI:

deploy_site

Triggers the initial deployment of a site, using the latest committed configuration documents. Steps, conceptually:

  1. Concurrency check

    Prevents concurrent site modifications by conflicting actions/workflows.

  2. Preflight checks

    Ensures all Airship components are in a responsive state.

  3. Validate design

    Asks each involved Airship component to validate the design. This ensures that the previously committed design is valid at the present time.

  4. Drydock build

    Orchestrates the Drydock component to configure hardware and the Kubernetes environment (Drydock -> Promenade).

  5. Armada build

    Orchestrates Armada to configure software on the nodes as designed.

update_site

Applies a new committed configuration to the environment. The steps of update_site mirror those of deploy_site.

update_software

Triggers an update of the software in a site, using the latest committed configuration documents. Steps, conceptually:

  1. Concurrency check

    Prevents concurrent site modifications by conflicting actions/workflows.

  2. Validate design

    Asks each involved Airship component to validate the design. This ensures that the previously committed design is valid at the present time.

  3. Armada build

    Orchestrates Armada to configure software on the nodes as designed.
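
To compare the conceptual steps of the site actions side by side, they can be written down as plain data. This is only a sketch transcribed from the descriptions above; the names `ACTION_STEPS` and `skipped_steps` are illustrative and do not correspond to Shipyard's actual workflow definitions.

```python
# Conceptual step order per action, transcribed from the descriptions above.
DEPLOY_SITE_STEPS = [
    "Concurrency check",
    "Preflight checks",
    "Validate design",
    "Drydock build",
    "Armada build",
]

ACTION_STEPS = {
    "deploy_site": DEPLOY_SITE_STEPS,
    # update_site is described as mirroring deploy_site.
    "update_site": list(DEPLOY_SITE_STEPS),
    "update_software": ["Concurrency check", "Validate design", "Armada build"],
}


def skipped_steps(action, reference="deploy_site"):
    """Return the reference action's steps that `action` does not list."""
    steps = set(ACTION_STEPS[action])
    return [s for s in ACTION_STEPS[reference] if s not in steps]
```

Calling `skipped_steps("update_software")` lists the conceptual steps that update_software leaves out relative to deploy_site, namely the preflight checks and the Drydock build.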

redeploy_server

Triggers a teardown and subsequent deployment of the server(s) indicated by parameters, restoring them to the current committed design.

This action is a target action, and does not apply the site action labels to the revision of documents in Deckhand. Application of site action labels is reserved for site actions such as deploy_site and update_site.

Like other target actions that will use a baremetal or Kubernetes node as a target, the target_nodes parameter will be used to list the names of the nodes that will be acted upon.

Using redeploy_server

Danger

At this time, there are no safeguards with regard to the running workload in place before tearing down a server and the result may be very disruptive to a working site. Users are cautioned to ensure the server being torn down is not running a critical workload. To support controlling this, the Shipyard service allows actions to be associated with RBAC rules. A deployment of Shipyard can restrict access to this action to help prevent unexpected disaster.

Redeploying a server can have consequences to the running workload as noted above. There are actions that can be taken by a deployment engineer or system administrator before performing a redeploy_server to mitigate the risks and impact.

There are three broad categories of nodes that can be considered in regard to redeploy_server. It is possible that a node is both a Worker and a Control node depending on the deployment of Airship:

  1. Broken Node:

    A non-functional node, e.g. a host that has been corrupted to the point of being unable to participate in the Kubernetes cluster.

  2. Worker Node:

    A node that is participating in the Kubernetes cluster and is not running control plane software, but provides capacity for workloads running in the environment.

  3. Control Node:

    A node that is participating in the Kubernetes cluster and is hosting control plane software. E.g. Airship or other components that serve as controllers for the rest of the cluster in some way. These nodes may run software such as etcd or databases that contribute to the health of the overall Kubernetes cluster.

    Note that there is also the Genesis host, used to bootstrap the Airship platform. This node currently runs the Airship containers, including some that cannot yet be migrated to other nodes (e.g. the MAAS rack controller) and some whose relocation causes disruption (e.g. PostgreSQL).

Important

Use of redeploy_server on the Airship Genesis host/node is not supported, and will result in serious disruption.

Yes

Recommended step for this node type

No

Generally not necessary for this node type

N/A

Not applicable for this node type

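+------------------------------------------------------------------------------+--------+--------+---------+
| Action                                                                       | Broken | Worker | Control |
+==============================================================================+========+========+=========+
| Coordinate workload impacts with users (note 1)                              | Yes    | Yes    | No      |
+------------------------------------------------------------------------------+--------+--------+---------+
| Clear Kubernetes labels from the node (for each label)                       | N/A    | Yes    | Yes     |
| $ kubectl label nodes <node> <label>-                                        |        |        |         |
+------------------------------------------------------------------------------+--------+--------+---------+
| Etcd - check for cluster health                                              | N/A    | N/A    | Yes     |
| $ kubectl -n kube-system exec kubernetes-etcd-<hostname> etcdctl member list |        |        |         |
+------------------------------------------------------------------------------+--------+--------+---------+
| Drain Kubernetes node                                                        | N/A    | Yes    | Yes     |
| $ kubectl drain <node>                                                       |        |        |         |
+------------------------------------------------------------------------------+--------+--------+---------+
| Disable the kubelet service                                                  | N/A    | Yes    | Yes     |
| $ systemctl stop kubelet                                                     |        |        |         |
| $ systemctl disable kubelet                                                  |        |        |         |
+------------------------------------------------------------------------------+--------+--------+---------+
| Remove node from Kubernetes                                                  | Yes    | Yes    | Yes     |
| $ kubectl delete node <node>                                                 |        |        |         |
+------------------------------------------------------------------------------+--------+--------+---------+
| Backup Disks (processes vary) (note 2)                                       | Yes    | Yes    | Yes     |
+------------------------------------------------------------------------------+--------+--------+---------+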
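
The recommendations in the table can also be captured as data and queried per node type. The following is a minimal sketch, not part of Shipyard: `PRE_REDEPLOY_STEPS` and `recommended_steps` are hypothetical names. It treats a node that is both a Worker and a Control node as needing every step recommended for either type, which is an assumption rather than guidance stated in this document.

```python
# Recommendation matrix transcribed from the table above.
# Value order follows NODE_TYPES; values are "Yes", "No", or "N/A".
NODE_TYPES = ("broken", "worker", "control")

PRE_REDEPLOY_STEPS = [
    ("Coordinate workload impacts with users", ("Yes", "Yes", "No")),
    ("Clear Kubernetes labels from the node", ("N/A", "Yes", "Yes")),
    ("Etcd - check for cluster health", ("N/A", "N/A", "Yes")),
    ("Drain Kubernetes node", ("N/A", "Yes", "Yes")),
    ("Disable the kubelet service", ("N/A", "Yes", "Yes")),
    ("Remove node from Kubernetes", ("Yes", "Yes", "Yes")),
    ("Backup Disks (processes vary)", ("Yes", "Yes", "Yes")),
]


def recommended_steps(*node_types):
    """List steps marked "Yes" for any given node type, in table order.

    Assumption: a node with several roles gets the union of the steps
    recommended for each role.
    """
    columns = [NODE_TYPES.index(t.lower()) for t in node_types]
    return [
        step
        for step, flags in PRE_REDEPLOY_STEPS
        if any(flags[c] == "Yes" for c in columns)
    ]
```

For example, `recommended_steps("broken")` returns only the three steps that still make sense for a node that can no longer participate in the cluster, while `recommended_steps("worker", "control")` returns every row of the table.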

relabel_nodes

Triggers an update to the Kubernetes node labels for the server(s) indicated by parameters.

test_site

Triggers the execution of the Helm tests corresponding to all deployed releases in all namespaces. Steps, conceptually:

  1. Preflight checks

    Ensures all Airship components are in a responsive state.

  2. Armada test

    Invokes Armada to execute Helm tests for all releases.

Using test_site

The test_site action accepts two optional parameters:

  1. cleanup: A boolean value that instructs Armada to delete test pods after test execution. Default value is false. If this value is not set to true, test pods are not deleted, and manual intervention may be required before tests can be re-executed.
  2. release: The name of a release to test. When provided, tests are only executed for the specified release.

An example of invoking Helm tests with cleanup enabled:

shipyard create action test_site --param="cleanup=true"

An example of invoking Helm tests for a single release:

shipyard create action test_site --param="release=keystone"
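
When these invocations are scripted, the command line can be assembled from the two optional parameters. The sketch below is illustrative only: `build_test_site_command` is a hypothetical helper, and passing `--param` twice to supply both parameters at once is an assumption that should be checked against the Shipyard CLI help before use.

```python
import shlex


def build_test_site_command(cleanup=False, release=None):
    """Return a `shipyard create action test_site` command line as a string.

    Hypothetical helper for illustration; it only builds the string and
    does not run the command.
    """
    argv = ["shipyard", "create", "action", "test_site"]
    if cleanup:
        argv.append("--param=cleanup=true")
    if release is not None:
        # Assumption: a second --param flag may accompany the first.
        argv.append("--param=release=" + release)
    # shlex.join quotes arguments only when the shell would need it.
    return shlex.join(argv)
```

With `cleanup=True` or `release="keystone"` alone, the result corresponds to the two example commands above, minus the optional shell quotes around the parameter value.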

update labels

Triggers an update to the Kubernetes node labels for the specified server(s).


  1. Whether to coordinate with users is up to the infrastructure operator. This guide assumes client or user communication as a common courtesy.

  2. Server redeployment will erase (quick erase) all disks during the process, but desired enhancements to redeploy_server may include options for disk handling. Depending on the situation, it may not be necessary to back up disks if the underlying implementation already provides the needed resiliency and redundancy.