From f26f502fbf26024598db792720f5c80a5c51c355 Mon Sep 17 00:00:00 2001 From: Clark Boylan Date: Wed, 5 Apr 2017 10:17:21 -0700 Subject: [PATCH] Start adding operational docs to zuulv3 This tries to capture common operation tasks in the documentation. It also clears up some related items about what is necessary to have a functioning Nodepool installation and what the dib-image-delete command does. Story: 2000790 Change-Id: I397fc4879fa84ffc667ddda0aff9c107eee0d694 --- doc/source/operation.rst | 63 +++++++++++++++++++++++++++++++++++-- nodepool/cmd/nodepoolcmd.py | 3 +- 2 files changed, 63 insertions(+), 3 deletions(-) diff --git a/doc/source/operation.rst b/doc/source/operation.rst index 25787c878..50efb73e2 100644 --- a/doc/source/operation.rst +++ b/doc/source/operation.rst @@ -12,6 +12,10 @@ Both daemons frequently re-read their configuration file after starting to support adding or removing new images and providers, or otherwise altering the configuration. +These daemons communicate with each other via a Zookeeper database. +You must run Zookeeper and at least one of each of these daemons to +have a functioning Nodepool installation. + Nodepool-builder ---------------- @@ -35,8 +39,8 @@ Nodepool-launcher ----------------- The main nodepool daemon is named ``nodepool-launcher`` and is -responsible for launching instances from the images created and -uploaded by ``nodepool-builder``. +responsible for managing cloud instances launched from the images +created and uploaded by ``nodepool-builder``. When a new image is created and uploaded, ``nodepool-launcher`` will immediately start using it when launching nodes (Nodepool always uses @@ -90,6 +94,61 @@ metadata: nodepool_node_id The nodepool id of the node as an integer. +Common Management Tasks +----------------------- + +In the course of running a Nodepool service you will find that there are +some common operations that will be performed. 
Like the services +themselves, these are split into two groups: image management and +instance management. + +Image Management +~~~~~~~~~~~~~~~~ + +Before Nodepool can launch any cloud instances it must have images to boot +from. ``nodepool dib-image-list`` will show you which images are available +locally on disk. These images on disk are then uploaded to clouds; +``nodepool image-list`` will show you what images are bootable in your +various clouds. + +If you need to force a new image to be built to pick up a new feature more +quickly than the normal rebuild cycle (which defaults to 24 hours) you can +manually trigger a rebuild. Using ``nodepool image-build`` you can tell +Nodepool to begin a new image build now. Note that, depending on work that +the nodepool-builder is already performing, this may queue the build. Check +``nodepool dib-image-list`` to see the current state of the builds. Once +the image is built it is automatically uploaded to all of the clouds +configured to use that image. + +At times you may need to stop using an existing image because it is broken. +Your two major options here are to build a new image to replace the existing +image or to delete the existing image and have Nodepool fall back on using +the previous image. Rebuilding and uploading can be slow so typically the +best option is to simply ``nodepool image-delete`` the most recent image +which will cause Nodepool to fall back on using the previous image. However, +if you do this without "pausing" the image it will be immediately reuploaded. +You will want to pause the image if you need to further investigate why +the image is not being built correctly. If you know the image will be built +correctly you can simply delete the built image and remove it from all clouds +using ``nodepool dib-image-delete``, which will cause it to be rebuilt. 
+ +Instance Management +~~~~~~~~~~~~~~~~~~~ + +With working images in providers you should see Nodepool launching instances +in these providers using the images it built. You may find that you need to +debug a particular job failure manually. An easy way to do this is to +``nodepool hold`` an instance, then log in to the instance and perform any +necessary debugging steps. Note that this doesn't stop the job running there; +it only prevents Nodepool from automatically deleting this instance +once the job is complete. + +In some circumstances, such as after manually holding an instance as above or +wanting to force a job restart, you may want to delete a running instance. You +can issue a ``nodepool delete`` to force Nodepool to do this. + +Complete command help info is below. + Command Line Tools ------------------ diff --git a/nodepool/cmd/nodepoolcmd.py b/nodepool/cmd/nodepoolcmd.py index 09d6cf63c..de6bb5ee4 100644 --- a/nodepool/cmd/nodepoolcmd.py +++ b/nodepool/cmd/nodepoolcmd.py @@ -111,7 +111,8 @@ class NodePoolCmd(NodepoolApp): cmd_dib_image_delete = subparsers.add_parser( 'dib-image-delete', - help='delete image built with diskimage-builder') + help='Delete a dib built image from disk along with all cloud ' + 'uploads of this image') cmd_dib_image_delete.set_defaults(func=self.dib_image_delete) cmd_dib_image_delete.add_argument('id', help='dib image id')