asettle 3c61ab4678 Operational procedures guide

This is the operational procedures guide that HPE used
to operate and monitor their public Swift systems.
It has been made publicly available.

Change-Id: Iefb484893056d28beb69265d99ba30c3c84add2b

2016-03-03 11:49:26 +00:00

Swift Ops Runbook

This document contains operational procedures that Hewlett Packard Enterprise (HPE) uses to operate and monitor the Swift system within the HPE Helion Public Cloud. This document is an excerpt of a larger product-specific handbook. As such, the material may appear incomplete. The suggestions and recommendations made in this document are for our particular environment, and may not be suitable for your environment or situation. We make no representations concerning the accuracy, adequacy, completeness or suitability of the information, suggestions or recommendations. This document is provided for reference only. We are not responsible for your use of any information, suggestions or recommendations contained herein.

This document also contains references to certain tools that we use to operate the Swift system within the HPE Helion Public Cloud. Descriptions of these tools are provided for reference only, as the tools themselves are not publicly available at this time.

swift-direct: This is similar to the swiftly tool.

general.rst diagnose.rst procedures.rst maintenance.rst troubleshooting.rst

Is the system up?

If you have a report that Swift is down, perform the following basic checks:

Run the Swift functional tests.
From a server in your data center, use curl to check /healthcheck.
If you have a monitoring system, check it for alerts.
Check your hardware load balancer infrastructure.
Run swift-recon on a proxy node.
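
The /healthcheck step above can also be scripted. The following is a minimal sketch in Python, not part of the original runbook. It assumes the standard Swift healthcheck middleware, which answers a GET on /healthcheck with status 200 and the body "OK". The function name check_health, the opener parameter (a hook for substituting the HTTP call in tests), and the example URL are illustrative assumptions.

```python
import urllib.error
import urllib.request


def check_health(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/healthcheck answers 200 with body "OK".

    `opener` is a hypothetical injection point; it defaults to the real
    urllib.request.urlopen and must accept (url, timeout=...).
    """
    url = base_url.rstrip("/") + "/healthcheck"
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace").strip()
            return resp.status == 200 and body == "OK"
    except OSError:
        # Covers connection failures, timeouts, and urllib's HTTPError
        # (raised for 4xx/5xx responses); all are OSError subclasses.
        return False


# Example (hypothetical endpoint):
#   check_health("https://swift.example.com")
```

A False result only says that the proxy did not answer the health check from where the script ran; the remaining checks in the list are still needed to locate the fault.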

Run Swift functional tests

We recommend that you set up the functional tests to run against your production system.

A script for running the functional tests is located in swift/.functests.

External monitoring

We use pingdom.com to monitor the external Swift API. We suggest the following checks:
- Do a GET on /healthcheck.
- Create a container, make it public (x-container-read: .r:*,.rlistings), create a small file in the container, and do a GET on the object.
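
The second check can be sketched as a short sequence of HTTP requests. This is an illustration under stated assumptions, not the configuration used with pingdom.com: the storage URL, token, container name and object name are hypothetical placeholders, and the token is assumed to have been obtained from your auth system beforehand. The final GET is sent without a token, so it succeeds only if the container read ACL really makes the object public.

```python
import urllib.request


def probe_steps(storage_url, token, container="monitor", obj="probe.txt"):
    """Build the (method, url, headers, body) steps for the public-object probe.

    All argument values are placeholders chosen for illustration.
    """
    base = storage_url.rstrip("/")
    container_url = f"{base}/{container}"
    object_url = f"{container_url}/{obj}"
    auth = {"X-Auth-Token": token}
    return [
        # 1. Create the container and make it world-readable.
        ("PUT", container_url,
         {**auth, "X-Container-Read": ".r:*,.rlistings"}, None),
        # 2. Upload a small object.
        ("PUT", object_url, dict(auth), b"swift monitoring probe\n"),
        # 3. Anonymous GET: no token, relies on the public read ACL.
        ("GET", object_url, {}, None),
    ]


def run_probe(steps, timeout=10.0):
    """Send each step in order; return False on the first failure."""
    for method, url, headers, body in steps:
        req = urllib.request.Request(
            url, data=body, headers=headers, method=method)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if not 200 <= resp.status < 300:
                    return False
        except OSError:
            # Includes HTTPError (4xx/5xx), timeouts, connection errors.
            return False
    return True


# Example (hypothetical account URL and token):
#   run_probe(probe_steps("https://swift.example.com/v1/AUTH_test", "TOKEN"))
```

Separating the step list from the sender keeps the request sequence easy to inspect and to reuse with whatever HTTP client your monitoring service provides.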

Reference information

Reference: Swift startup/shutdown

Use reload, not stop/start/restart.
Try to roll sets of servers (especially proxy servers) in groups that each contain less than 20% of your servers.
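
The 20% guideline can be turned into a batching rule. The sketch below only computes the groups; how each group is reloaded is left to your own tooling. The function name reload_batches and the host names are illustrative assumptions. Integer arithmetic is used so that each batch is strictly smaller than the given percentage of the fleet, except that very small fleets fall back to one server at a time.

```python
def reload_batches(servers, max_percent=20):
    """Yield successive groups, each strictly under max_percent of the fleet.

    With fewer than 6 servers (at the default 20%), no group can be under
    the limit, so the fallback is a batch size of 1.
    """
    servers = list(servers)
    n = len(servers)
    # Largest k such that k * 100 < n * max_percent, but never below 1.
    size = max(1, (n * max_percent - 1) // 100)
    for start in range(0, n, size):
        yield servers[start:start + size]


# Example with hypothetical host names:
#   for group in reload_batches(["proxy01", "proxy02", "proxy03"]):
#       ...  # reload every server in `group`, verify health, then continue
```

Checking /healthcheck on each reloaded server before moving to the next group keeps a bad configuration from spreading across the whole tier.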