=====================
Administrator's Guide
=====================

-------------------------
Defining Storage Policies
-------------------------

Defining your Storage Policies is very easy to do with Swift.  It is important
that the administrator understand the concepts behind Storage Policies
before actually creating and using them in order to get the most benefit out
of the feature and, more importantly, to avoid having to make unnecessary changes
once a set of policies has been deployed to a cluster.

It is highly recommended that the reader fully read and comprehend
:doc:`overview_policies` before proceeding with administration of
policies.  Plan carefully; it is suggested that you experiment first on a
non-production cluster to be certain that the desired configuration meets
the needs of your users.  See :ref:`upgrade-policy` before planning the
upgrade of your existing deployment.

Following is a high level view of the very few steps it takes to configure
policies once you have decided what you want to do:

#. Define your policies in ``/etc/swift/swift.conf``
#. Create the corresponding object rings
#. Communicate the names of the Storage Policies to cluster users

For a specific example that takes you through these steps, please see
:doc:`policies_saio`
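
For illustration only (the policy names and the choice of default below are
examples, not recommendations), a minimal pair of policies in
``/etc/swift/swift.conf`` might look like this::

    [storage-policy:0]
    name = gold
    default = yes

    [storage-policy:1]
    name = silver

Each policy index then needs a matching object ring (``object.ring.gz`` for
policy 0, ``object-1.ring.gz`` for policy 1), built with ``swift-ring-builder``
as described below.
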

------------------
Managing the Rings
------------------

You may build the storage rings on any server with the appropriate
version of Swift installed.  Once built or changed (rebalanced), you
must distribute the rings to all the servers in the cluster.  Storage
rings contain information about all the Swift storage partitions and
how they are distributed between the different nodes and disks.

Swift 1.6.0 is the last version to use a Python pickle format.
Subsequent versions use a different serialization format.  **Rings
generated by Swift versions 1.6.0 and earlier may be read by any
version, but rings generated after 1.6.0 may only be read by Swift
versions greater than 1.6.0.**  So when upgrading from version 1.6.0 or
earlier to a version greater than 1.6.0, either upgrade Swift on your
ring building server **last** after all Swift nodes have been successfully
upgraded, or refrain from generating rings until all Swift nodes have
been successfully upgraded.

If you need to downgrade from a version of Swift greater than 1.6.0 to
a version less than or equal to 1.6.0, first downgrade your ring-building
server, generate new rings, push them out, then continue with the rest
of the downgrade.

For more information see :doc:`overview_ring`.

.. highlight:: none

Removing a device from the ring::

    swift-ring-builder <builder-file> remove <ip_address>/<device_name>

Removing a server from the ring::

    swift-ring-builder <builder-file> remove <ip_address>

Adding devices to the ring:

See :ref:`ring-preparing`
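
The ``add`` command takes a device descriptor and a weight (the values in angle
brackets below are placeholders); as with any ring change, a ``rebalance`` is
required afterwards before the updated ring takes effect::

    swift-ring-builder <builder-file> add r<region>z<zone>-<ip_address>:<port>/<device_name> <weight>
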

See what devices for a server are in the ring::

    swift-ring-builder <builder-file> search <ip_address>

Once you are done with all changes to the ring, the changes need to be
"committed"::

    swift-ring-builder <builder-file> rebalance

Once the new rings are built, they should be pushed out to all the servers
in the cluster.

Optionally, if invoked as 'swift-ring-builder-safe', the directory containing
the specified builder file will be locked (via a .lock file in the parent
directory). This provides a basic safeguard that prevents multiple instances
of the swift-ring-builder (or other utilities that observe this lock) from
attempting to write to or read the builder/ring files while operations are in
progress. This can be useful in environments where ring management has been
automated but the operator still needs to interact with the rings manually.

If the ring builder is not producing the balances that you are
expecting, you can gain visibility into what it's doing with the
``--debug`` flag::

    swift-ring-builder <builder-file> rebalance --debug

This produces a great deal of output that is mostly useful if you are
either (a) attempting to fix the ring builder, or (b) filing a bug
against the ring builder.

You may notice in the rebalance output a 'dispersion' number. What this
number means is explained in :ref:`ring_dispersion` but in essence
it is the percentage of partitions in the ring that have too many replicas
within a particular failure domain. You can ask 'swift-ring-builder' what
the dispersion is with::

  swift-ring-builder <builder-file> dispersion

This will give you the percentage again; if you want a detailed view of
the dispersion, simply add ``--verbose``::

  swift-ring-builder <builder-file> dispersion --verbose

This will not only display the percentage but will also display a dispersion
table that lists partition dispersion by tier. You can use this table to figure
out where you need to add capacity or to help tune an :ref:`ring_overload` value.

Now let's take an example with 1 region, 3 zones and 4 devices. Each device has
the same weight, and the ``dispersion --verbose`` might show the following::

  Dispersion is 16.666667, Balance is 0.000000, Overload is 0.00%
  Required overload is 33.333333%
  Worst tier is 33.333333 (r1z3)
  --------------------------------------------------------------------------
  Tier                           Parts      %    Max     0     1     2     3
  --------------------------------------------------------------------------
  r1                               768   0.00      3     0     0     0   256
  r1z1                             192   0.00      1    64   192     0     0
  r1z1-127.0.0.1                   192   0.00      1    64   192     0     0
  r1z1-127.0.0.1/sda               192   0.00      1    64   192     0     0
  r1z2                             192   0.00      1    64   192     0     0
  r1z2-127.0.0.2                   192   0.00      1    64   192     0     0
  r1z2-127.0.0.2/sda               192   0.00      1    64   192     0     0
  r1z3                             384  33.33      1     0   128   128     0
  r1z3-127.0.0.3                   384  33.33      1     0   128   128     0
  r1z3-127.0.0.3/sda               192   0.00      1    64   192     0     0
  r1z3-127.0.0.3/sdb               192   0.00      1    64   192     0     0

The first line reports that there are 256 partitions with 3 copies in region 1;
and this is an expected output in this case (single region with 3 replicas) as
reported by the "Max" value.

However, there is some imbalance in the cluster, more precisely in zone 3. The
"Max" reports a maximum of 1 copy in this zone; however 50.00% of the partitions
are storing 2 replicas in this zone (which is somewhat expected, because there
are more disks in this zone).

You can now either add more capacity to the other zones, decrease the total
weight in zone 3 or set the overload to a value `greater than` 33.333333% -
only as much overload as needed will be used.
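
If you choose the overload route, the value can be set on the builder and then
committed with a rebalance. The exact figure below is purely illustrative, and
this assumes a ``swift-ring-builder`` version that provides the
``set_overload`` command::

  swift-ring-builder <builder-file> set_overload 34%
  swift-ring-builder <builder-file> rebalance
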

-----------------------
Scripting Ring Creation
-----------------------

You can create scripts to create the account and container rings and rebalance. Here's an example script for the Account ring. Use similar commands to create a make-container-ring.sh script on the proxy server node (a sketch of such a script is shown after the steps below).

1. Create a script file called make-account-ring.sh on the proxy
   server node with the following content::

    #!/bin/bash
    cd /etc/swift
    rm -f account.builder account.ring.gz backups/account.builder backups/account.ring.gz
    swift-ring-builder account.builder create 18 3 1
    swift-ring-builder account.builder add r1z1-<account-server-1>:6202/sdb1 1
    swift-ring-builder account.builder add r1z2-<account-server-2>:6202/sdb1 1
    swift-ring-builder account.builder rebalance

   You need to replace the values of <account-server-1>,
   <account-server-2>, etc. with the IP addresses of the account
   servers used in your setup. You can have as many account servers as
   you need. All account servers are assumed to be listening on port
   6202, and have a storage device called "sdb1" (this is a directory
   name created under /drives when we set up the account server). The
   "z1", "z2", etc. designate zones, and you can choose whether you
   put devices in the same or different zones. The "r1" designates
   the region, with different regions specified as "r1", "r2", etc.

2. Make the script file executable and run it to create the account ring file::

    chmod +x make-account-ring.sh
    sudo ./make-account-ring.sh

3. Copy the resulting ring file /etc/swift/account.ring.gz to all the
   account server nodes in your Swift environment, and put it in the
   /etc/swift directory on these nodes. Make sure that every time you
   change the account ring configuration, you copy the resulting ring
   file to all the account nodes.
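
A matching make-container-ring.sh script follows the same pattern. The sketch
below assumes the default container server port of 6201 and the same device
names as above; adjust both to your environment::

    #!/bin/bash
    cd /etc/swift
    rm -f container.builder container.ring.gz backups/container.builder backups/container.ring.gz
    swift-ring-builder container.builder create 18 3 1
    swift-ring-builder container.builder add r1z1-<container-server-1>:6201/sdb1 1
    swift-ring-builder container.builder add r1z2-<container-server-2>:6201/sdb1 1
    swift-ring-builder container.builder rebalance
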

-----------------------
Handling System Updates
-----------------------

It is recommended that system updates and reboots are done a zone at a time.
This allows the update to happen, and for the Swift cluster to stay available
and responsive to requests.  It is also advisable, when updating a zone, to let
it run for a while before updating the other zones to make sure the update
doesn't have any adverse effects.

----------------------
Handling Drive Failure
----------------------

In the event that a drive has failed, the first step is to make sure the drive
is unmounted.  This will make it easier for Swift to work around the failure
until it has been resolved.  If the drive is going to be replaced immediately,
then it is just best to replace the drive, format it, remount it, and let
replication fill it up.

After the drive is unmounted, make sure the mount point is owned by root
(root:root 755). This ensures that rsync will not try to replicate into the
root drive once the failed drive is unmounted.

If the drive can't be replaced immediately, then it is best to leave it
unmounted, and set the device weight to 0. This will allow all the
replicas that were on that drive to be replicated elsewhere until the drive
is replaced. Once the drive is replaced, the device weight can be increased
again. Setting the device weight to 0 instead of removing the drive from the
ring gives Swift the chance to replicate data from the failing disk too (in case
it is still possible to read some of the data).
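
For example, assuming the failed device is ``sdb1`` on ``<ip_address>`` in the
object ring (both are placeholders here), the weight can be zeroed and the
change committed with::

    swift-ring-builder object.builder set_weight <ip_address>/sdb1 0
    swift-ring-builder object.builder rebalance
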

Setting the device weight to 0 (or removing a failed drive from the ring) has
another benefit: all partitions that were stored on the failed drive are
distributed over the remaining disks in the cluster, and each disk only needs to
store a few new partitions. This is much faster compared to replicating all
partitions to a single, new disk. It decreases the time to recover from a
degraded number of replicas significantly, and becomes more and more important
with bigger disks.

-----------------------
Handling Server Failure
-----------------------

If a server is having hardware issues, it is a good idea to make sure the
Swift services are not running.  This will allow Swift to work around the
failure while you troubleshoot.

If the server just needs a reboot, or a small amount of work that should
only last a couple of hours, then it is probably best to let Swift work
around the failure and get the machine fixed and back online.  When the
machine comes back online, replication will make sure that anything that is
missing during the downtime will get updated.

If the server has more serious issues, then it is probably best to remove
all of the server's devices from the ring.  Once the server has been repaired
and is back online, the server's devices can be added back into the ring.
It is important that the devices are reformatted before putting them back
into the ring as they are likely to be responsible for a different set of
partitions than before.

-----------------------
Detecting Failed Drives
-----------------------

It has been our experience that when a drive is about to fail, error messages
will spew into `/var/log/kern.log`.  There is a script called
`swift-drive-audit` that can be run via cron to watch for bad drives.  If
errors are detected, it will unmount the bad drive, so that Swift can
work around it.  The script takes a configuration file with the following
settings:

``[drive-audit]``

==================  ==============  ===========================================
Option              Default         Description
------------------  --------------  -------------------------------------------
user                swift           Drop privileges to this user for non-root
                                    tasks
log_facility        LOG_LOCAL0      Syslog log facility
log_level           INFO            Log level
device_dir          /srv/node       Directory devices are mounted under
minutes             60              Number of minutes to look back in
                                    `/var/log/kern.log`
error_limit         1               Number of errors to find before a device
                                    is unmounted
log_file_pattern    /var/log/kern*  Location of the log file with globbing
                                    pattern to check against device errors
regex_pattern_X     (see below)     Regular expression patterns to be used to
                                    locate device blocks with errors in the
                                    log file
==================  ==============  ===========================================

The default regex patterns used to locate device blocks with errors are
`\berror\b.*\b(sd[a-z]{1,2}\d?)\b` and `\b(sd[a-z]{1,2}\d?)\b.*\berror\b`.
You can override these defaults by providing new expressions
using the format `regex_pattern_X = regex_expression`, where `X` is a number.

This script has been tested on Ubuntu 10.04 and Ubuntu 12.04, so if you are
using a different distro or OS, some care should be taken before using it in
production.
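
Putting this together, a minimal sketch of a configuration file and a cron
entry might look like the following. The path ``/etc/swift/drive-audit.conf``
and the 15-minute interval are examples, not requirements::

    # /etc/swift/drive-audit.conf
    [drive-audit]
    device_dir = /srv/node
    minutes = 60
    error_limit = 2

    # /etc/cron.d/swift-drive-audit
    */15 * * * * root /usr/bin/swift-drive-audit /etc/swift/drive-audit.conf
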

------------------------------
Preventing Disk Full Scenarios
------------------------------

.. highlight:: cfg

Prevent disk full scenarios by ensuring that the ``proxy-server`` blocks PUT
requests to nearly full disks and that rsync prevents replication to those
drives.

You can prevent `proxy-server` PUT requests to low-space disks by
ensuring ``fallocate_reserve`` is set in ``account-server.conf``,
``container-server.conf``, and ``object-server.conf``. By default,
``fallocate_reserve`` is set to 1%. In the object server, this blocks
PUT requests that would leave the free disk space below 1% of the
disk. In the account and container servers, this blocks operations
that would increase account or container database size once the free
disk space falls below 1%.

Setting ``fallocate_reserve`` is highly recommended to avoid filling
disks to 100%. When Swift's disks are completely full, all requests
involving those disks will fail, including DELETE requests that would
otherwise free up space. This is because object deletion includes the
creation of a zero-byte tombstone (.ts) to record the time of the
deletion for replication purposes; this happens prior to deletion of
the object's data. On a completely-full filesystem, that zero-byte .ts
file cannot be created, so the DELETE request will fail and the disk
will remain completely full. If ``fallocate_reserve`` is set, then the
filesystem will have enough space to create the zero-byte .ts file,
and thus the deletion of the object will succeed and free up some
space.
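
As an illustrative sketch, the reserve can be raised in each server's
configuration; the ``[DEFAULT]`` placement and the 2% value below are examples,
so check the sample configuration files shipped with your Swift version::

    [DEFAULT]
    # Accepts either a percentage (suffix %) or an absolute number of bytes.
    fallocate_reserve = 2%
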

In order to prevent rsync replication to specific drives, first
set up ``rsync_module`` per disk in your ``object-replicator``.
Set this in ``object-server.conf``:

.. code::

    [object-replicator]
    rsync_module = {replication_ip}::object_{device}

Set the individual drives in ``rsync.conf``. For example:

.. code::

    [object_sda]
    max connections = 4
    lock file = /var/lock/object_sda.lock

    [object_sdb]
    max connections = 4
    lock file = /var/lock/object_sdb.lock

Finally, monitor the disk space of each disk and adjust the rsync
``max connections`` per drive to ``-1``. We recommend utilising your existing
monitoring solution to achieve this. The following is an example script:

.. code-block:: python

    #!/usr/bin/env python
    import os
    import errno

    RESERVE = 500 * 2 ** 20  # 500 MiB

    DEVICES = '/srv/node1'

    path_template = '/etc/rsync.d/disable_%s.conf'
    config_template = '''
    [object_%s]
    max connections = -1
    '''

    def disable_rsync(device):
        with open(path_template % device, 'w') as f:
            f.write(config_template.lstrip() % device)


    def enable_rsync(device):
        try:
            os.unlink(path_template % device)
        except OSError as e:
            # ignore file does not exist
            if e.errno != errno.ENOENT:
                raise


    for device in os.listdir(DEVICES):
        path = os.path.join(DEVICES, device)
        st = os.statvfs(path)
        free = st.f_bavail * st.f_frsize
        if free < RESERVE:
            disable_rsync(device)
        else:
            enable_rsync(device)

For the above script to work, ensure ``/etc/rsync.d/`` conf files are
included, by specifying ``&include`` in your ``rsync.conf`` file:

.. code::

    &include /etc/rsync.d

Use this in conjunction with a cron job to periodically run the script, for example:

.. highlight:: none

.. code::

    # /etc/cron.d/devicecheck
    * * * * * root /some/path/to/disable_rsync.py

.. _dispersion_report:

-----------------
Dispersion Report
-----------------

There is a swift-dispersion-report tool for measuring overall cluster health.
This is accomplished by checking if a set of deliberately distributed
containers and objects are currently in their proper places within the cluster.

For instance, a common deployment has three replicas of each object. The health
of that object can be measured by checking if each replica is in its proper
place. If only 2 of the 3 are in place the object's health can be said to be at
66.66%, where 100% would be perfect.

A single object's health, especially an older object, usually reflects the
health of the entire partition the object is in. If we make enough objects on
a distinct percentage of the partitions in the cluster, we can get a pretty
valid estimate of the overall cluster health. In practice, about 1% partition
coverage seems to balance well between accuracy and the amount of time it takes
to gather results.

The first thing that needs to be done to provide this health value is to create
a new account solely for this usage. Next, we need to place the containers and
objects throughout the system so that they are on distinct partitions. The
swift-dispersion-populate tool does this by making up random container and
object names until they fall on distinct partitions. Last, and repeatedly for
the life of the cluster, we need to run the swift-dispersion-report tool to
check the health of each of these containers and objects.

.. highlight:: cfg

These tools need direct access to the entire cluster and to the ring files
(installing them on a proxy server will probably do). Both
swift-dispersion-populate and swift-dispersion-report use the same
configuration file, /etc/swift/dispersion.conf. Example conf file::

    [dispersion]
    auth_url = http://localhost:8080/auth/v1.0
    auth_user = test:tester
    auth_key = testing
    endpoint_type = internalURL

.. highlight:: none

There are also options for the conf file for specifying the dispersion coverage
(defaults to 1%), retries, concurrency, etc., though usually the defaults are
fine. If you want to use Keystone v3 for authentication there are options like
auth_version, user_domain_name, project_domain_name and project_name.

Once the configuration is in place, run `swift-dispersion-populate` to populate
the containers and objects throughout the cluster.
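
The populate run prints a short summary when it finishes. The counts below
depend on your ring and are shown purely as a sketch of the expected output::

    $ swift-dispersion-populate
    Created 2621 containers for dispersion reporting, 71s, 0 retries
    Created 2621 objects for dispersion reporting, 10s, 0 retries
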

Now that those containers and objects are in place, you can run
`swift-dispersion-report` to get a dispersion report, or the overall health of
the cluster. Here is an example of a cluster in perfect health::

    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 19s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space

    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space

Now I'll deliberately double the weight of a device in the object ring (with
replication turned off) and rerun the dispersion report to show what impact
that has::

    $ swift-ring-builder object.builder set_weight d0 200
    $ swift-ring-builder object.builder rebalance
    ...
    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 8s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space

    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    There were 1763 partitions missing one copy.
    77.56% of object copies found (6094 of 7857)
    Sample represents 1.00% of the object partition space

You can see the health of the objects in the cluster has gone down
significantly. Of course, I only have four devices in this test environment;
in a production environment with many, many devices the impact of one device
change is much less. Next, I'll run the replicators to get everything put back
into place and then rerun the dispersion report::

    ... start object replicators and monitor logs until they're caught up ...
    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 17s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space

    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space

You can also run the report for only containers or objects::

    $ swift-dispersion-report --container-only
    Queried 2621 containers for dispersion reporting, 17s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space

    $ swift-dispersion-report --object-only
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space

Alternatively, the dispersion report can also be output in JSON format. This
allows it to be more easily consumed by third party utilities::

    $ swift-dispersion-report -j
    {"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, "missing_one": 0, "copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "missing_all": 0}, "container": {"retries:": 0, "missing_two": 0, "copies_found": 12534, "missing_one": 0, "copies_expected": 12534, "pct_found": 100.0, "overlapping": 15, "missing_all": 0}}

Note that you may select which storage policy to use by setting the option
'--policy-name silver' or '-P silver' (silver is the example policy name here).
If no policy is specified, the default will be used per the swift.conf file.
When you specify a policy, the containers created also include the policy
index, so even when running a container-only report you will need to specify
the policy if it is not the default.
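
For example, to populate and report against a policy named ``silver`` (the
policy name is illustrative and must match one defined in ``swift.conf``, and
this assumes a version of the dispersion tools that accepts the policy option
on both commands)::

    $ swift-dispersion-populate --policy-name silver
    $ swift-dispersion-report -P silver
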

-----------------------------------------------
Geographically Distributed Swift Considerations
-----------------------------------------------

Swift provides two features that may be used to distribute replicas of objects
across multiple geographically distributed data-centers: with
:doc:`overview_global_cluster` object replicas may be dispersed across devices
from different data-centers by using `regions` in ring device descriptors; with
:doc:`overview_container_sync` objects may be copied between independent Swift
clusters in each data-center. The operation and configuration of each are
described in their respective documentation. The following points should be
considered when selecting the feature that is most appropriate for a particular
use case:

#. Global Clusters allows the distribution of object replicas across
   data-centers to be controlled by the cluster operator on a per-policy basis,
   since the distribution is determined by the assignment of devices from
   each data-center in each policy's ring file. With Container Sync the end
   user controls the distribution of objects across clusters on a
   per-container basis.

#. Global Clusters requires an operator to coordinate ring deployments across
   multiple data-centers. Container Sync allows for independent management of
   separate Swift clusters in each data-center, and for existing Swift
   clusters to be used as peers in Container Sync relationships without
   deploying new policies/rings.

#. Global Clusters seamlessly supports features that may rely on
   cross-container operations such as large objects and versioned writes.
   Container Sync requires the end user to ensure that all required
   containers are sync'd for these features to work in all data-centers.

#. Global Clusters makes objects available for GET or HEAD requests in both
   data-centers even if a replica of the object has not yet been
   asynchronously migrated between data-centers, by forwarding requests
   between data-centers. Container Sync is unable to serve requests for an
   object in a particular data-center until the asynchronous sync process has
   copied the object to that data-center.

#. Global Clusters may require less storage capacity than Container Sync to
   achieve equivalent durability of objects in each data-center. Global
   Clusters can restore replicas that are lost or corrupted in one
   data-center using replicas from other data-centers. Container Sync
   requires each data-center to independently manage the durability of
   objects, which may result in each data-center storing more replicas than
   with Global Clusters.

#. Global Clusters executes all account/container metadata updates
   synchronously to account/container replicas in all data-centers, which may
   incur delays when making updates across WANs. Container Sync only copies
   objects between data-centers and all Swift internal traffic is
   confined to each data-center.

#. Global Clusters does not yet guarantee the availability of objects stored
   in Erasure Coded policies when one data-center is offline. With Container
   Sync the availability of objects in each data-center is independent of the
   state of other data-centers once objects have been synced. Container Sync
   also allows objects to be stored using different policy types in different
   data-centers.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checking handoff partition distribution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can check if handoff partitions are piling up on a server by
comparing the expected number of partitions with the actual number on
your disks. First get the number of partitions that are currently
assigned to a server using the ``dispersion`` command from
``swift-ring-builder``::

    swift-ring-builder sample.builder dispersion --verbose
    Dispersion is 0.000000, Balance is 0.000000, Overload is 0.00%
    Required overload is 0.000000%
    --------------------------------------------------------------------------
    Tier                           Parts      %    Max     0     1     2     3
    --------------------------------------------------------------------------
    r1                              8192   0.00      2     0     0  8192     0
    r1z1                            4096   0.00      1  4096  4096     0     0
    r1z1-172.16.10.1                4096   0.00      1  4096  4096     0     0
    r1z1-172.16.10.1/sda1           4096   0.00      1  4096  4096     0     0
    r1z2                            4096   0.00      1  4096  4096     0     0
    r1z2-172.16.10.2                4096   0.00      1  4096  4096     0     0
    r1z2-172.16.10.2/sda1           4096   0.00      1  4096  4096     0     0
    r1z3                            4096   0.00      1  4096  4096     0     0
    r1z3-172.16.10.3                4096   0.00      1  4096  4096     0     0
    r1z3-172.16.10.3/sda1           4096   0.00      1  4096  4096     0     0
    r1z4                            4096   0.00      1  4096  4096     0     0
    r1z4-172.16.20.4                4096   0.00      1  4096  4096     0     0
    r1z4-172.16.20.4/sda1           4096   0.00      1  4096  4096     0     0
    r2                              8192   0.00      2     0  8192     0     0
    r2z1                            4096   0.00      1  4096  4096     0     0
    r2z1-172.16.20.1                4096   0.00      1  4096  4096     0     0
    r2z1-172.16.20.1/sda1           4096   0.00      1  4096  4096     0     0
    r2z2                            4096   0.00      1  4096  4096     0     0
    r2z2-172.16.20.2                4096   0.00      1  4096  4096     0     0
    r2z2-172.16.20.2/sda1           4096   0.00      1  4096  4096     0     0

As you can see from the output, each server should store 4096 partitions, and
each region should store 8192 partitions. This example used a partition power
of 13 and 3 replicas.

With write_affinity enabled you should expect a higher number of partitions
on disk compared to the value reported by the swift-ring-builder dispersion
command. The number of additional (handoff) partitions in region r1 depends
on your cluster size, the amount of incoming data as well as the replication
speed.

Let's use the example from above with 6 nodes in 2 regions, and write_affinity
configured to write to region r1 first. `swift-ring-builder` reported that
each node should store 4096 partitions::

 Expected partitions for region r2:                                      8192
 Handoffs stored across 4 nodes in region r1:                 8192 / 4 = 2048
 Maximum number of partitions on each server in region r1: 2048 + 4096 = 6144

The worst case is that handoff partitions in region 1 are populated with new
object replicas faster than replication is able to move them to region 2.
In that case you will see ~ 6144 partitions per
server in region r1. Your actual number should be lower and
between 4096 and 6144 partitions (preferably on the lower side).

Now count the number of object partitions on a given server in region 1,
for example on 172.16.10.1.  Note that the pathnames might be
different; `/srv/node/` is the default mount location, and `objects`
applies only to storage policy 0 (storage policy 1 would use
`objects-1` and so on)::

    find -L /srv/node/ -maxdepth 3 -type d -wholename "*objects/*" | wc -l

If this number is always on the upper end of the expected partition
number range (4096 to 6144), or increasing, you should check your
replication speed and maybe even disable write_affinity.
Please refer to the next section for how to collect metrics from Swift, and
especially to :ref:`swift-recon -r <recon-replication>` for how to check
replication stats.


.. _cluster_telemetry_and_monitoring:

--------------------------------
Cluster Telemetry and Monitoring
--------------------------------

Various metrics and telemetry can be obtained from the account, container, and
object servers using the recon server middleware and the swift-recon cli. To do
so, update your account, container, or object server pipelines to include recon
and add the associated filter config.

.. highlight:: cfg

object-server.conf sample::

    [pipeline:main]
    pipeline = recon object-server

    [filter:recon]
    use = egg:swift#recon
    recon_cache_path = /var/cache/swift

container-server.conf sample::

    [pipeline:main]
    pipeline = recon container-server

    [filter:recon]
    use = egg:swift#recon
    recon_cache_path = /var/cache/swift

account-server.conf sample::

    [pipeline:main]
    pipeline = recon account-server

    [filter:recon]
    use = egg:swift#recon
    recon_cache_path = /var/cache/swift

.. highlight:: none

The recon_cache_path simply sets the directory where stats for a few items will
be stored. Depending on the method of deployment you may need to create this
directory manually and ensure that Swift has read/write access.
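
For example, on many installations this is simply the following; the ``swift``
user and group are an assumption here, so substitute whatever user your Swift
services run as::

    mkdir -p /var/cache/swift
    chown swift:swift /var/cache/swift
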

Finally, if you also wish to track asynchronous pendings on your object
servers you will need to set up a cron job to run the swift-recon-cron script
periodically on your object servers::

    */5 * * * * swift /usr/bin/swift-recon-cron /etc/swift/object-server.conf

Once the recon middleware is enabled, a GET request for
"/recon/<metric>" to the backend object server will return a
JSON-formatted response::

    fhines@ubuntu:~$ curl -i http://localhost:6030/recon/async
    HTTP/1.1 200 OK
    Content-Type: application/json
    Content-Length: 20
    Date: Tue, 18 Oct 2011 21:03:01 GMT

    {"async_pending": 0}


Note that the default port for the object server is 6200, except on a
Swift All-In-One installation, which uses 6010, 6020, 6030, and 6040.

The following metrics and telemetry are currently exposed:

=========================   ========================================================================================
Request URI                 Description
-------------------------   ----------------------------------------------------------------------------------------
/recon/load                 returns 1, 5, and 15 minute load average
/recon/mem                  returns /proc/meminfo
/recon/mounted              returns *ALL* currently mounted filesystems
/recon/unmounted            returns all unmounted drives if mount_check = True
/recon/diskusage            returns disk utilization for storage devices
/recon/driveaudit           returns # of drive audit errors
/recon/ringmd5              returns object/container/account ring md5sums
/recon/swiftconfmd5         returns swift.conf md5sum
/recon/quarantined          returns # of quarantined objects/accounts/containers
/recon/sockstat             returns consumable info from /proc/net/sockstat|6
/recon/devices              returns list of devices and devices dir i.e. /srv/node
/recon/async                returns count of async pending
/recon/replication          returns object replication info (for backward compatibility)
/recon/replication/<type>   returns replication info for given type (account, container, object)
/recon/auditor/<type>       returns auditor stats on last reported scan for given type (account, container, object)
/recon/updater/<type>       returns last updater sweep times for given type (container, object)
/recon/expirer/object       returns time elapsed and number of objects deleted during last object expirer sweep
/recon/version              returns Swift version
/recon/time                 returns node time
=========================   ========================================================================================
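
For example, querying the disk utilization endpoint on the same backend object
server returns one entry per configured device. The values below are made up
for illustration, and the exact set of fields may vary between Swift versions::

    fhines@ubuntu:~$ curl http://localhost:6030/recon/diskusage
    [{"device": "sdb1", "mounted": true, "size": 11906920448, "used": 1678276, "avail": 11905242172}]
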

Note that 'object_replication_last' and 'object_replication_time' in object
replication info are considered to be transitional and will be removed in
subsequent releases. Use 'replication_last' and 'replication_time' instead.

This information can also be queried via the swift-recon command line utility::

    fhines@ubuntu:~$ swift-recon -h
    Usage:
            usage: swift-recon <server_type> [-v] [--suppress] [-a] [-r] [-u] [-d]
            [-l] [-T] [--md5] [--auditor] [--updater] [--expirer] [--sockstat]

            <server_type>   account|container|object
            Defaults to object server.

            ex: swift-recon container -l --auditor


    Options:
      -h, --help            show this help message and exit
      -v, --verbose         Print verbose info
      --suppress            Suppress most connection related errors
      -a, --async           Get async stats
      -r, --replication     Get replication stats
      --auditor             Get auditor stats
      --updater             Get updater stats
      --expirer             Get expirer stats
      -u, --unmounted       Check cluster for unmounted devices
      -d, --diskusage       Get disk usage stats
      -l, --loadstats       Get cluster load average stats
      -q, --quarantined     Get cluster quarantine stats
      --md5                 Get md5sum of servers ring and compare to local copy
      --sockstat            Get cluster socket usage stats
      -T, --time            Check time synchronization
      --all                 Perform all checks. Equal to
                            -arudlqT --md5 --sockstat --auditor --updater
                            --expirer --driveaudit --validate-servers
      -z ZONE, --zone=ZONE  Only query servers in specified zone
      -t SECONDS, --timeout=SECONDS
                            Time to wait for a response from a server
      --swiftdir=SWIFTDIR   Default = /etc/swift

.. _recon-replication:

For example, to obtain container replication info from all hosts in zone "3"::

    fhines@ubuntu:~$ swift-recon container -r --zone 3
    ===============================================================================
    --> Starting reconnaissance on 1 hosts
    ===============================================================================
    [2012-04-02 02:45:48] Checking on replication
    [failure] low: 0.000, high: 0.000, avg: 0.000, reported: 1
    [success] low: 486.000, high: 486.000, avg: 486.000, reported: 1
    [replication_time] low: 20.853, high: 20.853, avg: 20.853, reported: 1
    [attempted] low: 243.000, high: 243.000, avg: 243.000, reported: 1

---------------------------
Reporting Metrics to StatsD
---------------------------

.. highlight:: cfg

If you have a StatsD_ server running, Swift may be configured to send it
real-time operational metrics.  To enable this, set the following
configuration entries (see the sample configuration files)::

    log_statsd_host = localhost
    log_statsd_port = 8125
    log_statsd_default_sample_rate = 1.0
    log_statsd_sample_rate_factor = 1.0
    log_statsd_metric_prefix =                [empty-string]

If `log_statsd_host` is not set, this feature is disabled.  The default values
for the other settings are given above.  The `log_statsd_host` can be a
hostname, an IPv4 address, or an IPv6 address (not surrounded with brackets, as
this is unnecessary since the port is specified separately).  If a hostname
resolves to an IPv4 address, an IPv4 socket will be used to send StatsD UDP
packets, even if the hostname would also resolve to an IPv6 address.

.. _StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
.. _Graphite: http://graphiteapp.org/
.. _Ganglia: http://ganglia.sourceforge.net/

The sample rate is a real number between 0 and 1 which defines the
probability of sending a sample for any given event or timing measurement.
This sample rate is sent with each sample to StatsD and used to
multiply the value.  For example, with a sample rate of 0.5, StatsD will
multiply that counter's value by 2 when flushing the metric to an upstream
monitoring system (Graphite_, Ganglia_, etc.).

Some relatively high-frequency metrics have a default sample rate less than
one.  If you want to override the default sample rate for all metrics whose
default sample rate is not specified in the Swift source, you may set
`log_statsd_default_sample_rate` to a value less than one.  This is NOT
recommended (see next paragraph).  A better way to reduce StatsD load is to
adjust `log_statsd_sample_rate_factor` to a value less than one.  The
`log_statsd_sample_rate_factor` is multiplied by any sample rate (either the
global default or one specified by the actual metric logging call in the Swift
source) prior to handling.  In other words, this one tunable can lower the
frequency of all StatsD logging by a proportional amount.
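
As a purely illustrative calculation (the 0.1 default below is hypothetical and
not a documented rate for any particular metric)::

    effective_sample_rate = metric_default_sample_rate * log_statsd_sample_rate_factor
                          = 0.1 * 0.5
                          = 0.05   # roughly 5% of such events are sent to StatsD
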
 | ||
| 
 | ||
| To get the best data, start with the default `log_statsd_default_sample_rate`
 | ||
| and `log_statsd_sample_rate_factor` values of 1 and only lower
 | ||
| `log_statsd_sample_rate_factor` if needed.  The
 | ||
| `log_statsd_default_sample_rate` should not be used and remains for backward
 | ||
| compatibility only.
 | ||
| 
 | ||
| The metric prefix will be prepended to every metric sent to the StatsD server
 | ||
| For example, with::
 | ||
| 
 | ||
|     log_statsd_metric_prefix = proxy01
 | ||
| 
 | ||
| the metric `proxy-server.errors` would be sent to StatsD as
 | ||
| `proxy01.proxy-server.errors`.  This is useful for differentiating different
 | ||
| servers when sending statistics to a central StatsD server.  If you run a local
 | ||
| StatsD server per node, you could configure a per-node metrics prefix there and
 | ||
| leave `log_statsd_metric_prefix` blank.
 | ||
| 
 | ||
| Note that metrics reported to StatsD are counters or timing data (which are
 | ||
| sent in units of milliseconds).  StatsD usually expands timing data out to min,
 | ||
| max, avg, count, and 90th percentile per timing metric, but the details of
 | ||
| this behavior will depend on the configuration of your StatsD server.  Some
 | ||
| important "gauge" metrics may still need to be collected using another method.
 | ||
| For example, the `object-server.async_pendings` StatsD metric counts the generation
 | ||
| of async_pendings in real-time, but will not tell you the current number of
 | ||
| async_pending container updates on disk at any point in time.
 | ||
| 
 | ||
| Note also that the set of metrics collected, their names, and their semantics
 | ||
| are not locked down and will change over time.
 | ||
| 
 | ||
| Metrics for `account-auditor`:
 | ||
| 
 | ||
| ==========================  =========================================================
 | ||
| Metric Name                 Description
 | ||
| --------------------------  ---------------------------------------------------------
 | ||
| `account-auditor.errors`    Count of audit runs (across all account databases) which
 | ||
|                             caught an Exception.
 | ||
| `account-auditor.passes`    Count of individual account databases which passed audit.
 | ||
| `account-auditor.failures`  Count of individual account databases which failed audit.
 | ||
| `account-auditor.timing`    Timing data for individual account database audits.
 | ||
| ==========================  =========================================================
 | ||
| 
 | ||
| Metrics for `account-reaper`:
 | ||
| 
 | ||
| ==============================================  ====================================================
 | ||
| Metric Name                                     Description
 | ||
| ----------------------------------------------  ----------------------------------------------------
 | ||
| `account-reaper.errors`                         Count of devices failing the mount check.
 | ||
| `account-reaper.timing`                         Timing data for each reap_account() call.
 | ||
| `account-reaper.return_codes.X`                 Count of HTTP return codes from various operations
 | ||
|                                                 (e.g. object listing, container deletion, etc.). The
 | ||
|                                                 value for X is the first digit of the return code
 | ||
|                                                 (2 for 201, 4 for 404, etc.).
 | ||
| `account-reaper.containers_failures`            Count of failures to delete a container.
 | ||
| `account-reaper.containers_deleted`             Count of containers successfully deleted.
 | ||
| `account-reaper.containers_remaining`           Count of containers which failed to delete with
 | ||
|                                                 zero successes.
 | ||
| `account-reaper.containers_possibly_remaining`  Count of containers which failed to delete with
 | ||
|                                                 at least one success.
 | ||
| `account-reaper.objects_failures`               Count of failures to delete an object.
 | ||
| `account-reaper.objects_deleted`                Count of objects successfully deleted.
 | ||
| `account-reaper.objects_remaining`              Count of objects which failed to delete with zero
 | ||
|                                                 successes.
 | ||
| `account-reaper.objects_possibly_remaining`     Count of objects which failed to delete with at
 | ||
|                                                 least one success.
 | ||
| ==============================================  ====================================================
 | ||
| 
 | ||
| Metrics for `account-server` ("Not Found" is not considered an error and requests
 | ||
| which increment `errors` are not included in the timing data):
 | ||
| 
 | ||
| ========================================  =======================================================
 | ||
| Metric Name                               Description
 | ||
| ----------------------------------------  -------------------------------------------------------
 | ||
| `account-server.DELETE.errors.timing`     Timing data for each DELETE request resulting in an
 | ||
|                                           error: bad request, not mounted, missing timestamp.
 | ||
| `account-server.DELETE.timing`            Timing data for each DELETE request not resulting in
 | ||
|                                           an error.
 | ||
| `account-server.PUT.errors.timing`        Timing data for each PUT request resulting in an error:
 | ||
|                                           bad request, not mounted, conflict, recently-deleted.
 | ||
| `account-server.PUT.timing`               Timing data for each PUT request not resulting in an
 | ||
|                                           error.
 | ||
| `account-server.HEAD.errors.timing`       Timing data for each HEAD request resulting in an
 | ||
|                                           error: bad request, not mounted.
 | ||
| `account-server.HEAD.timing`              Timing data for each HEAD request not resulting in
 | ||
|                                           an error.
 | ||
| `account-server.GET.errors.timing`        Timing data for each GET request resulting in an
 | ||
|                                           error: bad request, not mounted, bad delimiter,
 | ||
|                                           account listing limit too high, bad accept header.
 | ||
| `account-server.GET.timing`               Timing data for each GET request not resulting in
 | ||
|                                           an error.
 | ||
| `account-server.REPLICATE.errors.timing`  Timing data for each REPLICATE request resulting in an
 | ||
|                                           error: bad request, not mounted.
 | ||
| `account-server.REPLICATE.timing`         Timing data for each REPLICATE request not resulting
 | ||
|                                           in an error.
 | ||
| `account-server.POST.errors.timing`       Timing data for each POST request resulting in an
 | ||
|                                           error: bad request, bad or missing timestamp, not
 | ||
|                                           mounted.
 | ||
| `account-server.POST.timing`              Timing data for each POST request not resulting in
 | ||
|                                           an error.
 | ||
| ========================================  =======================================================
 | ||
| 
 | ||
Metrics for `account-replicator`:

=====================================  ====================================================
Metric Name                            Description
-------------------------------------  ----------------------------------------------------
`account-replicator.diffs`             Count of syncs handled by sending differing rows.
`account-replicator.diff_caps`         Count of "diffs" operations which failed because
                                       "max_diffs" was hit.
`account-replicator.no_changes`        Count of accounts found to be in sync.
`account-replicator.hashmatches`       Count of accounts found to be in sync via hash
                                       comparison (`broker.merge_syncs` was called).
`account-replicator.rsyncs`            Count of completely missing accounts which were sent
                                       via rsync.
`account-replicator.remote_merges`     Count of syncs handled by sending entire database
                                       via rsync.
`account-replicator.attempts`          Count of database replication attempts.
`account-replicator.failures`          Count of database replication attempts which failed
                                       due to corruption (quarantined) or inability to read
                                       as well as attempts to individual nodes which
                                       failed.
`account-replicator.removes.<device>`  Count of databases on <device> deleted because the
                                       delete_timestamp was greater than the put_timestamp
                                       and the database had no rows or because it was
                                       successfully sync'ed to other locations and doesn't
                                       belong here anymore.
`account-replicator.successes`         Count of replication attempts to an individual node
                                       which were successful.
`account-replicator.timing`            Timing data for each database replication attempt
                                       not resulting in a failure.
=====================================  ====================================================

Metrics for `container-auditor`:

============================  ====================================================
Metric Name                   Description
----------------------------  ----------------------------------------------------
`container-auditor.errors`    Incremented when an Exception is caught in an audit
                              pass (only once per pass, max).
`container-auditor.passes`    Count of individual containers passing an audit.
`container-auditor.failures`  Count of individual containers failing an audit.
`container-auditor.timing`    Timing data for each container audit.
============================  ====================================================

Metrics for `container-replicator`:

=======================================  ====================================================
Metric Name                              Description
---------------------------------------  ----------------------------------------------------
`container-replicator.diffs`             Count of syncs handled by sending differing rows.
`container-replicator.diff_caps`         Count of "diffs" operations which failed because
                                         "max_diffs" was hit.
`container-replicator.no_changes`        Count of containers found to be in sync.
`container-replicator.hashmatches`       Count of containers found to be in sync via hash
                                         comparison (`broker.merge_syncs` was called).
`container-replicator.rsyncs`            Count of completely missing containers which were
                                         sent via rsync.
`container-replicator.remote_merges`     Count of syncs handled by sending entire database
                                         via rsync.
`container-replicator.attempts`          Count of database replication attempts.
`container-replicator.failures`          Count of database replication attempts which failed
                                         due to corruption (quarantined) or inability to read
                                         as well as attempts to individual nodes which
                                         failed.
`container-replicator.removes.<device>`  Count of databases deleted on <device> because the
                                         delete_timestamp was greater than the put_timestamp
                                         and the database had no rows or because it was
                                         successfully sync'ed to other locations and doesn't
                                         belong here anymore.
`container-replicator.successes`         Count of replication attempts to an individual node
                                         which were successful.
`container-replicator.timing`            Timing data for each database replication attempt
                                         not resulting in a failure.
=======================================  ====================================================

Metrics for `container-server` ("Not Found" is not considered an error and requests
which increment `errors` are not included in the timing data):

==========================================  ====================================================
Metric Name                                 Description
------------------------------------------  ----------------------------------------------------
`container-server.DELETE.errors.timing`     Timing data for DELETE request errors: bad request,
                                            not mounted, missing timestamp, conflict.
`container-server.DELETE.timing`            Timing data for each DELETE request not resulting in
                                            an error.
`container-server.PUT.errors.timing`        Timing data for PUT request errors: bad request,
                                            missing timestamp, not mounted, conflict.
`container-server.PUT.timing`               Timing data for each PUT request not resulting in an
                                            error.
`container-server.HEAD.errors.timing`       Timing data for HEAD request errors: bad request,
                                            not mounted.
`container-server.HEAD.timing`              Timing data for each HEAD request not resulting in
                                            an error.
`container-server.GET.errors.timing`        Timing data for GET request errors: bad request,
                                            not mounted, parameters not utf8, bad accept header.
`container-server.GET.timing`               Timing data for each GET request not resulting in
                                            an error.
`container-server.REPLICATE.errors.timing`  Timing data for REPLICATE request errors: bad
                                            request, not mounted.
`container-server.REPLICATE.timing`         Timing data for each REPLICATE request not resulting
                                            in an error.
`container-server.POST.errors.timing`       Timing data for POST request errors: bad request,
                                            bad x-container-sync-to, not mounted.
`container-server.POST.timing`              Timing data for each POST request not resulting in
                                            an error.
==========================================  ====================================================

Metrics for `container-sync`:

===============================  ====================================================
Metric Name                      Description
-------------------------------  ----------------------------------------------------
`container-sync.skips`           Count of containers skipped because they don't have
                                 sync'ing enabled.
`container-sync.failures`        Count of failures sync'ing individual containers.
`container-sync.syncs`           Count of individual containers sync'ed successfully.
`container-sync.deletes`         Count of container database rows sync'ed by
                                 deletion.
`container-sync.deletes.timing`  Timing data for each container database row
                                 synchronization via deletion.
`container-sync.puts`            Count of container database rows sync'ed by Putting.
`container-sync.puts.timing`     Timing data for each container database row
                                 synchronization via Putting.
===============================  ====================================================

Metrics for `container-updater`:

==============================  ====================================================
Metric Name                     Description
------------------------------  ----------------------------------------------------
`container-updater.successes`   Count of containers which successfully updated their
                                account.
`container-updater.failures`    Count of containers which failed to update their
                                account.
`container-updater.no_changes`  Count of containers which didn't need to update
                                their account.
`container-updater.timing`      Timing data for processing a container; only
                                includes timing for containers which needed to
                                update their accounts (i.e. "successes" and
                                "failures" but not "no_changes").
==============================  ====================================================

Metrics for `object-auditor`:

============================  ====================================================
Metric Name                   Description
----------------------------  ----------------------------------------------------
`object-auditor.quarantines`  Count of objects failing audit and quarantined.
`object-auditor.errors`       Count of errors encountered while auditing objects.
`object-auditor.timing`       Timing data for each object audit (does not include
                              any rate-limiting sleep time for
                              max_files_per_second, but does include rate-limiting
                              sleep time for max_bytes_per_second).
============================  ====================================================

Metrics for `object-expirer`:

========================  ====================================================
Metric Name               Description
------------------------  ----------------------------------------------------
`object-expirer.objects`  Count of objects expired.
`object-expirer.errors`   Count of errors encountered while attempting to
                          expire an object.
`object-expirer.timing`   Timing data for each object expiration attempt,
                          including ones resulting in an error.
========================  ====================================================

Metrics for `object-reconstructor`:

======================================================  ======================================================
Metric Name                                             Description
------------------------------------------------------  ------------------------------------------------------
`object-reconstructor.partition.delete.count.<device>`  A count of partitions on <device> which were
                                                        reconstructed and synced to another node because they
                                                        didn't belong on this node. This metric is tracked
                                                        per-device to allow for "quiescence detection" for
                                                        object reconstruction activity on each device.
`object-reconstructor.partition.delete.timing`          Timing data for partitions reconstructed and synced to
                                                        another node because they didn't belong on this node.
                                                        This metric is not tracked per device.
`object-reconstructor.partition.update.count.<device>`  A count of partitions on <device> which were
                                                        reconstructed and synced to another node, but also
                                                        belong on this node. As with delete.count, this metric
                                                        is tracked per-device.
`object-reconstructor.partition.update.timing`          Timing data for partitions reconstructed which also
                                                        belong on this node. This metric is not tracked
                                                        per-device.
`object-reconstructor.suffix.hashes`                    Count of suffix directories whose hash (of filenames)
                                                        was recalculated.
`object-reconstructor.suffix.syncs`                     Count of suffix directories reconstructed with ssync.
======================================================  ======================================================

Metrics for `object-replicator`:

===================================================  ====================================================
Metric Name                                          Description
---------------------------------------------------  ----------------------------------------------------
`object-replicator.partition.delete.count.<device>`  A count of partitions on <device> which were
                                                     replicated to another node because they didn't
                                                     belong on this node.  This metric is tracked
                                                     per-device to allow for "quiescence detection" for
                                                     object replication activity on each device.
`object-replicator.partition.delete.timing`          Timing data for partitions replicated to another
                                                     node because they didn't belong on this node.  This
                                                     metric is not tracked per device.
`object-replicator.partition.update.count.<device>`  A count of partitions on <device> which were
                                                     replicated to another node, but also belong on this
                                                     node.  As with delete.count, this metric is tracked
                                                     per-device.
`object-replicator.partition.update.timing`          Timing data for partitions replicated which also
                                                     belong on this node.  This metric is not tracked
                                                     per-device.
`object-replicator.suffix.hashes`                    Count of suffix directories whose hash (of filenames)
                                                     was recalculated.
`object-replicator.suffix.syncs`                     Count of suffix directories replicated with rsync.
===================================================  ====================================================

Metrics for `object-server`:

=======================================  ====================================================
Metric Name                              Description
---------------------------------------  ----------------------------------------------------
`object-server.quarantines`              Count of objects (files) found bad and moved to
                                         quarantine.
`object-server.async_pendings`           Count of container updates saved as async_pendings
                                         (may result from PUT or DELETE requests).
`object-server.POST.errors.timing`       Timing data for POST request errors: bad request,
                                         missing timestamp, delete-at in past, not mounted.
`object-server.POST.timing`              Timing data for each POST request not resulting in
                                         an error.
`object-server.PUT.errors.timing`        Timing data for PUT request errors: bad request,
                                         not mounted, missing timestamp, object creation
                                         constraint violation, delete-at in past.
`object-server.PUT.timeouts`             Count of object PUTs which exceeded max_upload_time.
`object-server.PUT.timing`               Timing data for each PUT request not resulting in an
                                         error.
`object-server.PUT.<device>.timing`      Timing data per kB transferred (ms/kB) for each
                                         non-zero-byte PUT request on each device.
                                         Useful for monitoring problematic devices;
                                         higher is bad.
`object-server.GET.errors.timing`        Timing data for GET request errors: bad request,
                                         not mounted, header timestamps before the epoch,
                                         precondition failed.
                                         File errors resulting in a quarantine are not
                                         counted here.
`object-server.GET.timing`               Timing data for each GET request not resulting in an
                                         error.  Includes requests which couldn't find the
                                         object (including disk errors resulting in file
                                         quarantine).
`object-server.HEAD.errors.timing`       Timing data for HEAD request errors: bad request,
                                         not mounted.
`object-server.HEAD.timing`              Timing data for each HEAD request not resulting in
                                         an error.  Includes requests which couldn't find the
                                         object (including disk errors resulting in file
                                         quarantine).
`object-server.DELETE.errors.timing`     Timing data for DELETE request errors: bad request,
                                         missing timestamp, not mounted, precondition
                                         failed.  Includes requests which couldn't find or
                                         match the object.
`object-server.DELETE.timing`            Timing data for each DELETE request not resulting
                                         in an error.
`object-server.REPLICATE.errors.timing`  Timing data for REPLICATE request errors: bad
                                         request, not mounted.
`object-server.REPLICATE.timing`         Timing data for each REPLICATE request not resulting
                                         in an error.
=======================================  ====================================================

Metrics for `object-updater`:

============================  ====================================================
Metric Name                   Description
----------------------------  ----------------------------------------------------
`object-updater.errors`       Count of drives not mounted or async_pending files
                              with an unexpected name.
`object-updater.timing`       Timing data for object sweeps to flush async_pending
                              container updates.  Does not include object sweeps
                              which did not find an existing async_pending storage
                              directory.
`object-updater.quarantines`  Count of async_pending container updates which were
                              corrupted and moved to quarantine.
`object-updater.successes`    Count of successful container updates.
`object-updater.failures`     Count of failed container updates.
`object-updater.unlinks`      Count of async_pending files unlinked. An
                              async_pending file is unlinked either when it is
                              successfully processed or when the replicator sees
                              that there is a newer async_pending file for the
                              same object.
============================  ====================================================

Metrics for `proxy-server` (in the table, `<type>` is the proxy-server
controller responsible for the request and will be one of "account",
"container", or "object"):

========================================  ====================================================
Metric Name                               Description
----------------------------------------  ----------------------------------------------------
`proxy-server.errors`                     Count of errors encountered while serving requests
                                          before the controller type is determined.  Includes
                                          invalid Content-Length, errors finding the internal
                                          controller to handle the request, invalid utf8, and
                                          bad URLs.
`proxy-server.<type>.handoff_count`       Count of node hand-offs; only tracked if log_handoffs
                                          is set in the proxy-server config.
`proxy-server.<type>.handoff_all_count`   Count of times *only* hand-off locations were
                                          utilized; only tracked if log_handoffs is set in the
                                          proxy-server config.
`proxy-server.<type>.client_timeouts`     Count of client timeouts (client did not read within
                                          `client_timeout` seconds during a GET or did not
                                          supply data within `client_timeout` seconds during
                                          a PUT).
`proxy-server.<type>.client_disconnects`  Count of detected client disconnects during PUT
                                          operations (does NOT include caught Exceptions in
                                          the proxy-server which caused a client disconnect).
========================================  ====================================================

Metrics for `proxy-logging` middleware (in the table, `<type>` is either the
proxy-server controller responsible for the request ("account", "container",
or "object") or the string "SOS" if the request came from the
`Swift Origin Server`_ middleware.  The `<verb>` portion will be one of "GET",
"HEAD", "POST", "PUT", "DELETE", "COPY", "OPTIONS", or "BAD_METHOD".  The list
of valid HTTP methods is configurable via the `log_statsd_valid_http_methods`
config variable and the default setting yields the above behavior; a sample
configuration appears after the table):

.. _Swift Origin Server: https://github.com/dpgoetz/sos

====================================================  ============================================
Metric Name                                           Description
----------------------------------------------------  --------------------------------------------
`proxy-server.<type>.<verb>.<status>.timing`          Timing data for requests, start to finish.
                                                      The <status> portion is the numeric HTTP
                                                      status code for the request (e.g.  "200" or
                                                      "404").
`proxy-server.<type>.GET.<status>.first-byte.timing`  Timing data up to completion of sending the
                                                      response headers (only for GET requests).
                                                      <status> and <type> are as for the main
                                                      timing metric.
`proxy-server.<type>.<verb>.<status>.xfer`            This counter metric is the sum of bytes
                                                      transferred in (from clients) and out (to
                                                      clients) for requests.  The <type>, <verb>,
                                                      and <status> portions of the metric are just
                                                      like the main timing metric.
====================================================  ============================================

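These metrics are only emitted when StatsD logging is enabled in the
configuration of each service.  The snippet below is a minimal sketch of the
relevant ``[DEFAULT]`` options in ``proxy-server.conf``; the host, port, and
prefix values are placeholders to adapt to your deployment::

    [DEFAULT]
    # Where StatsD UDP datagrams are sent; metrics are disabled when
    # log_statsd_host is not set.
    log_statsd_host = 127.0.0.1
    log_statsd_port = 8125
    # Optional prefix prepended to every metric name emitted by this server.
    log_statsd_metric_prefix = proxy01
    # HTTP verbs reported individually; anything else is counted as BAD_METHOD.
    log_statsd_valid_http_methods = GET,HEAD,POST,PUT,DELETE,COPY,OPTIONS
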
The `proxy-logging` middleware also groups these metrics by policy (the
`<policy-index>` portion represents a policy index):

==========================================================================  =====================================
Metric Name                                                                 Description
--------------------------------------------------------------------------  -------------------------------------
`proxy-server.object.policy.<policy-index>.<verb>.<status>.timing`          Timing data for requests, aggregated
                                                                            by policy index.
`proxy-server.object.policy.<policy-index>.GET.<status>.first-byte.timing`  Timing data up to completion of
                                                                            sending the response headers,
                                                                            aggregated by policy index.
`proxy-server.object.policy.<policy-index>.<verb>.<status>.xfer`            Sum of bytes transferred in and out,
                                                                            aggregated by policy index.
==========================================================================  =====================================

Metrics for `tempauth` middleware (in the table, `<reseller_prefix>` represents
the actual configured reseller_prefix or "`NONE`" if the reseller_prefix is the
empty string):

=========================================  ====================================================
Metric Name                                Description
-----------------------------------------  ----------------------------------------------------
`tempauth.<reseller_prefix>.unauthorized`  Count of regular requests which were denied with
                                           HTTPUnauthorized.
`tempauth.<reseller_prefix>.forbidden`     Count of regular requests which were denied with
                                           HTTPForbidden.
`tempauth.<reseller_prefix>.token_denied`  Count of token requests which were denied.
`tempauth.<reseller_prefix>.errors`        Count of errors.
=========================================  ====================================================

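All of the metrics above are emitted as StatsD-style UDP datagrams.  If you
just want to confirm that a server is sending metrics before wiring up a full
StatsD/Graphite pipeline, a small listener is enough.  The following is an
illustrative sketch only; it assumes the listener runs on the host and port
configured via ``log_statsd_host``/``log_statsd_port`` (here 127.0.0.1:8125).

.. code-block:: python

    import socket

    # Bind to the address/port that log_statsd_host/log_statsd_port point at.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('127.0.0.1', 8125))

    while True:
        datagram, _addr = sock.recvfrom(4096)
        # Each datagram is a StatsD line such as
        # "proxy-server.object.GET.200.timing:42.7|ms" or
        # "object-server.quarantines:1|c".
        print(datagram.decode('utf-8', 'replace').strip())
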
------------------------
Debugging Tips and Tools
------------------------

When a request is made to Swift, it is given a unique transaction id.  This
id should be in every log line that has to do with that request.  This can
be useful when looking at all the services that are hit by a single request.

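Because the transaction id appears in the logs of every service that handled
the request, grepping for it is often the quickest way to reconstruct a
request's path through the cluster.  The id and log location below are
hypothetical::

   grep tx1f2e3d4c5b6a79881234567890abcdef /var/log/swift/*.log
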
If you need to know where a specific account, container or object is in the
cluster, `swift-get-nodes` will show the location where each replica should be.

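For example, to see which primary and handoff nodes should hold a given
object (the account, container, and object names here are placeholders)::

   swift-get-nodes /etc/swift/object.ring.gz AUTH_test mycontainer myobject
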
If you are looking at an object on the server and need more info,
`swift-object-info` will display the account, container, replica locations
and metadata of the object.

If you are looking at a container on the server and need more info,
`swift-container-info` will display information such as the account,
container, replica locations and metadata of the container.

If you are looking at an account on the server and need more info,
`swift-account-info` will display the account, replica locations
and metadata of the account.

If you want to audit the data for an account, `swift-account-audit` can be
used to crawl the account, checking that all containers and objects can be
found.

-----------------
Managing Services
-----------------

Swift services are generally managed with ``swift-init``.  The general usage
is ``swift-init <service> <command>``, where ``<service>`` is the Swift
service to manage (for example object, container, account, proxy) and
``<command>`` is one of:

==========  ===============================================
Command     Description
----------  -----------------------------------------------
start       Start the service
stop        Stop the service
restart     Restart the service
shutdown    Attempt to gracefully shut down the service
reload      Attempt to gracefully restart the service
==========  ===============================================

A graceful shutdown or reload will finish any current requests before
completely stopping the old service.  There is also a special case of
``swift-init all <command>``, which will run the command for all Swift
services.

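For example, the following are all valid invocations (the exact service
names available depend on your installation)::

   swift-init object-server restart
   swift-init proxy-server reload
   swift-init all start
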
In cases where there are multiple configs for a service, a specific config
can be managed with ``swift-init <service>.<config> <command>``.
For example, when a separate replication network is used, there might be
``/etc/swift/object-server/public.conf`` for the object server and
``/etc/swift/object-server/replication.conf`` for the replication services.
In this case, the replication services could be restarted with
``swift-init object-server.replication restart``.

--------------
Object Auditor
--------------

On system failures, the XFS file system can sometimes truncate files it's
trying to write and produce zero-byte files. The object-auditor will catch
these problems but in the case of a system crash it would be advisable to run
an extra, less rate-limited sweep to check for these specific files. You can
run this command as follows::

   swift-object-auditor /path/to/object-server/config/file.conf once -z 1000

``-z`` means to only check for zero-byte files, at the specified rate (1000
files per second in this example).

At times it is useful to be able to run the object auditor on a specific
device or set of devices.  You can run the object-auditor as follows::

   swift-object-auditor /path/to/object-server/config/file.conf once --devices=sda,sdb

This will run the object auditor on only the sda and sdb devices. This
parameter accepts a comma-separated list of values.

-----------------
Object Replicator
-----------------

At times it is useful to be able to run the object replicator on a specific
device or partition.  You can run the object-replicator as follows::

   swift-object-replicator /path/to/object-server/config/file.conf once --devices=sda,sdb

This will run the object replicator on only the sda and sdb devices.  You can
likewise run that command with ``--partitions``.  Both parameters accept a
comma-separated list of values. If both are specified they will be ANDed
together. These can only be run in "once" mode.

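For example, to limit a single replication pass to one device and two
specific partitions (the device and partition values here are placeholders)::

   swift-object-replicator /path/to/object-server/config/file.conf once --devices=sda --partitions=123,456
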
-------------
Swift Orphans
-------------

Swift Orphans are processes left over after a reload of a Swift server.

For example, when upgrading a proxy server you would probably finish
with a ``swift-init proxy-server reload`` or ``/etc/init.d/swift-proxy
reload``. This kills the parent proxy server process and leaves the
child processes running to finish processing whatever requests they
might be handling at the time. It then starts up a new parent proxy
server process and its children to handle new incoming requests. This
allows zero-downtime upgrades with no impact on existing requests.

The orphaned child processes may take a while to exit, depending on
the length of the requests they were handling. However, sometimes an
old process can be hung up due to some bug or hardware issue. In these
cases, the orphaned processes will hang around
forever. ``swift-orphans`` can be used to find and kill these orphans.

``swift-orphans`` with no arguments will just list the orphans it finds
that were started more than 24 hours ago. You shouldn't really check
for orphans until 24 hours after you perform a reload, as some
requests can take a long time to process. ``swift-orphans -k TERM`` will
send the SIGTERM signal to the orphaned processes, or you can ``kill
-TERM`` the PIDs yourself if you prefer.

You can run ``swift-orphans --help`` for more options.

------------
Swift Oldies
------------

Swift Oldies are processes that have just been around for a long
time. There's nothing necessarily wrong with this, but it might
indicate a hung process if you regularly upgrade and reload/restart
services. You might have so many servers that you don't notice when a
reload/restart fails; ``swift-oldies`` can help with this.

For example, if you upgraded and reloaded/restarted everything 2 days
ago, and you've already cleaned up any orphans with ``swift-orphans``,
you can run ``swift-oldies -a 48`` to find any Swift processes still
around that were started more than 2 days ago and then investigate
them accordingly.

-------------------
Custom Log Handlers
-------------------

Swift supports setting up custom log handlers for services by specifying a
comma-separated list of functions to invoke when logging is set up. It does
so via the ``log_custom_handlers`` configuration option. Each logger hook is
passed the same arguments as Swift's get_logger function, as well as the
logger and adapted_logger objects described below:

==============  ===============================================
Name            Description
--------------  -----------------------------------------------
conf            Configuration dict to read settings from
name            Name of the logger received
log_to_console  (optional) Write log messages to console on stderr
log_route       Route for the logging received
fmt             Override log format received
logger          The logging.getLogger object
adapted_logger  The LogAdapter object
==============  ===============================================

A basic example that sets up a custom logger might look like the
following:

.. code-block:: python

    def my_logger(conf, name, log_to_console, log_route, fmt, logger,
                  adapted_logger):
        my_conf_opt = conf.get('some_custom_setting')
        my_handler = third_party_logstore_handler(my_conf_opt)
        logger.addHandler(my_handler)

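A hook such as the one above is then activated by referencing it from the
``log_custom_handlers`` option of the service that should use it, for example
in the server's ``[DEFAULT]`` section; the dotted module path below is
hypothetical::

    [DEFAULT]
    log_custom_handlers = my_package.my_module.my_logger
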
See :ref:`custom-logger-hooks-label` for sample use cases.

------------------------
Securing OpenStack Swift
------------------------

Please refer to the security guide at https://docs.openstack.org/security-guide
and in particular the `Object Storage
<https://docs.openstack.org/security-guide/object-storage.html>`__ section.