230 Commits

Author SHA1 Message Date
Kun Huang
5c8785aaee Add max_header_size to swift.conf-sample and relative UT
1. Add explanation of MAX_HEADER_SIZE into swift.conf-sample as same as
other settings in swift.conf. Especially point out the default size of
header line in eventlet is 8192 which is the main reason why we set 8192
for MAX_HEADER_SIZE in swift.

2. Add some unit tests to check valid settings in swift.conf. Test cases
in test_constraints use /etc/swift/swift.conf if exists, and if any
wrong settings are in it (MAX_META_VALE > MAX_META_OVERALL_SIZE), swift's
unit test must fail. These new unit tests is used in this case.

Change-Id: I7bb21951d46050163c1b7bceac8d49302b9209f7
2013-06-19 23:45:38 +08:00
Jenkins
5bfd2d798d Merge "Add parallelism to object expirer daemon." 2013-06-11 22:48:24 +00:00
Jenkins
b63b5d590a Merge "Use threadpools in the object server for performance." 2013-06-11 22:47:07 +00:00
Samuel Merritt
f559c50acb Local read affinity for GET/HEAD requests.
Now you can configure the proxy server to read from "local" primary
nodes first, where "local" is governed by the newly-introduced
"read_affinity" setting in the proxy config. This is desirable when
the network links between regions/zones are of varying capacities; in
such a case, it's a good idea to prefer fetching data from closer
backends.

The new setting looks like rN[zM]=P, where N is the region number, M
is the optional zone number, and P is the priority. Multiple values
can be specified by separating them with commas. The priority for
nodes that don't match anything is a very large number, so they'll
sort last.

This only affects the ordering of the primary nodes; it doesn't affect
handoffs at all. Further, while the primary nodes are reordered for
all requests, it only matters for GET/HEAD requests since handling the
other verbs ends up making concurrent requests to *all* the primary
nodes, so ordering is irrelevant.

Note that the default proxy config does not have this setting turned
on, so the default configuration's behavior is unaffected.

blueprint multi-region

Change-Id: Iea4cd367ed37fe5ee69b63234541d358d29963a4
2013-06-10 16:51:47 -07:00
Jenkins
4077252f23 Merge "Make sample configs more readable." 2013-06-10 14:24:49 +00:00
Greg Lange
209c5ec418 Add parallelism to object expirer daemon.
Two types of parallelism are added:

- concurrency to speed up what a single process does
- a way to run multiple daemons to work on different parts of the work

DocImpact

Change-Id: I48997f68eb2fd8de19a5ee8b9fcdf76dde2ba0ab
2013-06-07 20:49:47 +00:00
Samuel Merritt
b491549ac2 Use threadpools in the object server for performance.
Without a (per-disk) threadpool, requests to a slow disk would affect
all clients by blocking the entire eventlet reactor on
read/write/etc. The slower the disk, the worse the performance. On an
object server, you frequently have at least one slow disk due to
auditing and replication activity sucking up all the available IO. By
kicking those blocking calls out to a separate OS thread, we let the
eventlet reactor make progress in other greenthreads, and by having a
per-disk pool, we ensure that one slow disk can't suck up all the
resources of an entire object server.

There were a few blocking calls that were done with eventlet.tpool,
but that's a fixed-size global threadpool, so I moved them to the
per-disk threadpools. If the object server is configured not to use
per-disk threadpools, (i.e. threads_per_disk = 0, which is the
default), those call sites will still ultimately end up using
eventlet.tpool.execute. You won't end up blocking a whole object
server while waiting for a huge fsync.

If you decide not to use threadpools, the only extra overhead should
be a few extra Python function calls here and there. This is
accomplished by setting threads_per_disk = 0 in the config.

blueprint concurrent-disk-io
Change-Id: I490f8753d926fdcee3a0c65c5aaf715bc2b7c290
2013-06-07 13:06:04 -07:00
Pete Zaitcev
4b5db1dd0a Improve config samples
- Add proxy-logging to multinode. We had it since Folsom and people
  still forget it, resulting in missing logs.
- Use correct name, for ease hit with '*' in vi at least.

Admittedly trivial changes, which I meant to hold until Leah's major
doc improvement lands, but I'm tired of keeping stuff like this in
my working repo.

Change-Id: I44f80c51d6d7329a9b696e67fcb8a895db63e497
2013-06-06 19:41:13 -06:00
Samuel Merritt
efdb0e3681 Make sample configs more readable.
Inject some empty lines to avoid the wall-of-text effect and to make
it a little clearer which descriptions go with which options.

Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83
2013-06-06 15:35:19 -07:00
Dieter Plaetinck
442fd83a8b implement an rsync_bwlimit setting for object replicator
Change-Id: I8789d6e4d22de83db9a2760d51a94eb56a48c3b5
2013-05-31 15:57:19 -04:00
Donagh McCabe
34e2ab3f31 account-reaper warns if not making progress
DocImpact
If account reaper has not managed to clean out an account after a long
period, it prints a message to the log (you can search your system looking
for such messages). Introduce reap_warn_after config variable to determine
when to emit the message (defaults to 30 days).

Also fix bug 1181995 (edge case where object name is an empty string)

Change-Id: Ic0dfee04742d06b6a51b59f302d7a272d7c1de92
2013-05-22 15:07:17 +01:00
Jenkins
959f5e7ea8 Merge "Implementation of replication servers" 2013-05-16 02:43:49 +00:00
Jenkins
50157243dd Merge "Refactor Bulk middleware to handle long running requests" 2013-05-15 23:14:15 +00:00
David Goetz
af2607c457 Refactor Bulk middleware to handle long running requests
Change-Id: I8ea0ff86518d453597faae44ec3918298e2d5147
2013-05-08 10:00:21 -07:00
Clay Gerrard
34f5085c3e conf.d support
Allow Swift daemons and servers to optionally accept a directory as the
configuration parameter.  Directory based configuration leverages
ConfigParser's native multi-file support.  Files ending in '.conf' in the
given directory are parsed in lexicographical order.  Filenames starting with
'.' are ignored.  A mixture of file and directory configuration paths is not
supported - if the configuration path is a file behavior is unchanged.

 * update swift-init to search for conf.d paths when building servers
    (e.g. /etc/swift/proxy-server.conf.d/)
 * new script swift-config can be used to inspect the cumulative configuration
 * pull a little bit of code out of run_wsgi and test separately
 * fix example config bug for the proxy servers client_disconnect option
 * added section on directory based configuration to deployment guide

DocImpact

Implements: blueprint confd
Change-Id: I89b0f48e538117f28590cf6698401f74ef58003b
2013-04-30 00:17:46 -07:00
Peter Portante
2d42b37303 Add the max_clients parameter to bound clients
The new max_clients parameter allows one full control over the maximum
number of client requests that will be handled by a given worker for
any of the proxy, account, container or object servers.

Lowering the number of clients handled per worker, and raising the
number of workers can lessen the impact that a CPU intensive, or
blocking, request can have on other requests served by the same
worker.

If the maximum number of clients is set to one, then a given worker
will not perform another accept(2) call while processing, allowing
other workers a chance to process it.

DocImpact
Signed-off-by: Peter Portante <peter.portante@redhat.com>

Change-Id: Ic01430f7a6c5ff48d7aa349dc86a5f8ac463a420
2013-04-26 10:29:57 -04:00
David Hadas
04a3ba43ae Fixing /etc/swift.conf-sample to include
swift_hash_path_prefix

Change-Id: I60f5f3d4083937a03ecb7ed531185c617ea08016
2013-04-22 19:32:45 +03:00
Sergey Kraynev
ea7858176b Implementation of replication servers
Support separate replication ip address:
- Added new function in utils. This function provides ability
  to select separate IP address for replication service.
- Db_replicator and object replicators were changed.
  Replication process uses new function now.

Replication network parameters:
- Replication network fields (replication_ip, replication_port)
  support was added to device dictionary in swift-ring-builder script.
- Changes were made to support new fields in search, show and set_info
  functions.

Implementation of replication servers:
- Separate replication servers use the same code as normal replication
  servers, but with replication_server parameter = True.  When using a
  separate replication network, the non-replication servers set
  replication_server = False.  When there is no separate replication
  network (the default case), replication_server is not included in the config.

DocImpact
Change-Id: Ie9af5bdcdf9241c355e36053ca4adfe49dc35bd0
Implements: blueprint dedicated-replication-network
2013-04-21 18:14:42 -04:00
Jenkins
c87576bb94 Merge "Refactored lists of nodes to contact for requests" 2013-04-15 23:05:47 +00:00
Jenkins
5140c0d5da Merge "Adding a new optional variable called trans_id_suffix" 2013-04-10 20:05:03 +00:00
Marcelo Martins
1126e59c12 Adding a new optional variable called trans_id_suffix
The trans_id_suffix (default is empty) would be appended to the swift transaction
id allowing one to easily figure out from which cluster that X-Trans-Id
belongs to. This is very useful when one is managing more than one swift
cluster. Also updated sample and manpage to reflect the changes.

Change-Id: Icdf63643e9c1bde36a9ef5e3f41ee9fb20e55f5d
2013-04-10 06:37:32 -05:00
gholt
d79a67ebf6 Refactored lists of nodes to contact for requests
Extensive refactor here to consolidate what nodes are contacted for
any request. This consolidation means reads will contact the same set
of nodes that writes would, giving a very good chance that
read-your-write behavior will succeed. This also means that writes
will not necessarily try all nodes in the cluster as it would
previously, which really wasn't desirable anyway. (If you really want
that, you can set request_node_count to a really big number, but
understand that also means reads will contact every node looking for
something that might not exist.)

* Added a request_node_count proxy-server conf value that allows
  control of how many nodes are contacted for a normal request.

In proxy.controllers.base.Controller:

* Got rid of error_increment since it was only used in one spot by
  another method and just served to confuse.

* Made error_occurred also log the device name.

* Made error_limit require an error message and also documented a bit
  better.

* Changed iter_nodes to just take a ring and a partition and yield
  all the nodes itself so it could control the number of nodes used
  in a given request. Also happens to consolidate where sort_nodes is
  called.

* Updated account_info and container_info to use all nodes from
  iter_nodes and to call error_occurred appropriately.

* Updated GETorHEAD_base to not track attempts on its own and just
  stop when iter_nodes tells it to stop. Also, it doesn't take the
  nodes to contact anymore; instead it takes the ring and gets the
  nodes from iter_nodes itself.

Elsewhere:

* Ring now has a get_part method.

* Made changes to reflect all of the above.

Change-Id: I37f76c99286b6456311abf25167cd0485bfcafac
2013-04-08 20:48:32 +00:00
gholt
1cb952a958 Allow a configurable set of TempURL methods
Folks have actually been asking for this. I think they're sending a
DELETE TempURL to someone way ahead of time and the someone issues it
when they're ready. Honestly, I'm not entirely sure of the use case,
but having the set of methods configurable wouldn't hurt.

Change-Id: Ibdb48f8a72077b045eeedddfae4c0a1f56098d7a
2013-04-04 20:37:23 +00:00
Christian Schwede
28c75db0e7 Account quotas
Add a new middleware implementing account quotas.

This middleware blocks write requests (PUT, POST) if a given quota (in bytes)
is exceeded while DELETE requests are still allowed.

Quotas are stored in the x-account-meta-quota-bytes metadata entry.
Write requests to this metadata setting are only allowed for resellers.

Change-Id: I57fd7c6209f34cc79d4bab72d500d43ba2a62083
2013-03-08 14:31:35 +01:00
Sergey Lukjanov
7d5095c122 Support listing endpoints for an object.
Implements blueprint list-endpoints.

DocImpact: new middleware list_endpoints.

Change-Id: I0c4911ff726abd4cb8ce2b6245c99786ad46b410
2013-03-07 01:38:21 +04:00
David Goetz
5d73da158b Static Large Object Support
DocImpact

Change-Id: I7edaa5e44208ab451f7f7566b64bb571b8eea1f9
2013-03-01 16:46:10 -08:00
Jenkins
a06c71c624 Merge "Add cache=swift.cache for authtoken example." 2013-02-27 16:10:57 +00:00
Jenkins
1dc38b4672 Merge "timing-based affinity sorting for primary replicas" 2013-02-27 01:30:15 +00:00
Jenkins
249a65461e Merge "Adding speed limit options for DB auditor" 2013-02-26 06:22:25 +00:00
Samuel Merritt
a4a047c4ec Fix descriptions in sample configs.
Change-Id: I7aca3c6cafd9391031f7a10cc233f99e81ee0393
2013-02-25 14:48:06 -08:00
Chmouel Boudjnah
e69f3bef8f Add cache=swift.cache for authtoken example.
- Things swill go badly with swift if we leave the default to authtoken
  to use its own memcache cache connection based python-memcache c based
  binding.

Change-Id: I293b875acdcb06e5a7a0cfa9a9bb5d7678675da0
2013-02-21 22:58:27 +01:00
yuan-zhou
09370862ca Adding speed limit options for DB auditor
Fix bug 1129760

Without speed limit, DB auditor will likely consume high CPU% on
storage node. That will highly impact the cluster's performance.

This patch adds two options for account/container auditor:
 - containers_per_second: Maximum containers audited per second
 - accounts_per_second: Maximum accounts audited per second

DocImpact

Change-Id: I9faa506438185a83ca77db4906969328624d015f
2013-02-20 13:54:59 +08:00
John Dickinson
69917347cf timing-based affinity sorting for primary replicas
This changes the way primary replicas can be sorted on GET requests.
Previously, replicas were shuffled. Now, if configured, the replicas
are sorted based on the most recent connection time data to that node.
This patch adds a config value that changes the sorting method.

get_more_nodes() (ie handoffs) is unaffected by this patch because
sorting by affinity would break the durability provided by the current
as-unique-as-possible handoff selection.

Timing data is collected for each node each time the proxy makes a
connection to that node (IP address). If timing data for a node doesn't
exist, then it is assumed at -1 (ie will sort earlier) so that timing
data can be collected for that node.

Change-Id: I837fa21c3a566b10cce33eb75788665e1d01cd8a
2013-02-19 10:25:25 -08:00
Jenkins
23f33b2069 Merge "Make statsd sample rate behave better." 2013-02-13 08:19:46 +00:00
Jenkins
3df9229dae Merge "Use a doubled proxy-logging instead of each middleware handling it differently (if at all)" 2013-02-12 03:47:13 +00:00
David Goetz
a622349eda Use a doubled proxy-logging instead of each middleware handling it
differently (if at all)

Adding a swift.source to wsgi pre_auth funcs and all middleware that makes
subrequests to proxy server.

NOTE: This change will result in a change in the number of proxy logs made for
staticweb, formpost, tempurl, and any other middleware that performs sub
requests (including swauth and SOS).

Please see docs for details.

DocImpact

Change-Id: I80cf2806add1c3d34054147e2515944be340455b
2013-02-11 09:22:25 -08:00
Jenkins
c0e8ad609b Merge "Allow change the endpoint_type when use swift-dispersion tools" 2013-02-08 23:55:03 +00:00
Michael Barton
24ef12027c Basic container quotas
Add a new middleware implementing some basic container quotas.

Quotas are subject to several limitations: eventual consistency, the timeliness
of the cached container_info (60 second ttl by default), and it’s unable to
reject chunked transfer uploads that exceed the quota (though once the quota
is exceeded, new chunked transfers will be refused).

However, they get most of the way to container quotas fairly inexpensively.

Quotas are set by adding meta values to the container, and are validated when
set:

  X-Container-Meta-Quota-Bytes: Maximum size of the container, in bytes.
  X-Container-Meta-Quota-Count: Maximum object count of the container.

DocImpact

Change-Id: I77cfbf6dc231a2e522bd67328e4c082424a93eee
2013-02-05 06:03:38 -08:00
gholt
85529531d6 Remove tempauth allowed_sync_hosts conf option
Seems we missed these references when committing
357b12dc2ba7b19c66196a573ccb2489d2104b93

DocImpact

Change-Id: Ia226ce1d63e52769bc067d50ec4704cea4e11c5c
2013-01-31 18:30:10 +00:00
Mehdi Abaakouk
a1395ec672 Allow change the endpoint_type when use swift-dispersion tools
Fixes bug 1102319
DocImpact

Change-Id: I8fb0417ab9468e97ed01a6cb1e262630905e7f29
2013-01-31 16:10:37 +01:00
gholt
87a42ab9ca Added fallocate_reserve option
Some systems behave badly when they completely run out of space. To
alleviate this problem, you can set the fallocate_reserve conf value
to a number of bytes to "reserve" on each disk. When the disk free
space falls at or below this amount, fallocate calls will fail, even
if the underlying OS fallocate call would succeed. For example, a
fallocate_reserve of 5368709120 (5G) would make all fallocate calls
fail, even for zero-byte files, when the disk free space falls under
5G.

The default fallocate_reserve is 0, meaning "no reserve", and so the
software behaves exactly as it always has unless you set this conf
value to something non-zero.

Also fixed ring builder's search_devs doc bugs.

Related: To get rsync to do the same, see
https://github.com/rackspace/cloudfiles-rsync
Specifically, see this patch:
https://github.com/rackspace/cloudfiles-rsync/blob/master/debian/patches/limit-fs-fullness.diff

DocImpact

Change-Id: I8db176ae0ca5b41c9bcfeb7cb8abb31c2e614527
2013-01-29 20:07:26 +00:00
David Goetz
2f663ff9a0 Bulk Requests: auto extract archive and bulk delete middleware.
Fix small problem in ratelimiting middleware.

DocImpact

Change-Id: Ide3e0b9f4887626c30cae0b97eb7e2237b1df3ed
2013-01-24 12:34:56 -08:00
Darrell Bishop
8801b74090 Make statsd sample rate behave better.
As Dieter pointed out in bug 1090495
(https://bugs.launchpad.net/swift/+bug/1090495), the volume of metrics
can vary wildly between StatsD metrics.

This patch implements a partial solution by reducing the sample_rate
used for known high-volume metrics (operational experience will need to
inform this over time) and introducing a new tunable,
log_statsd_sample_rate_factor which is multiplied by the sample_rate for
every statsd stat.  This tunable can be used to reduce StatsD traffic
proportionally for all metrics and is intended to replace
log_statsd_default_sample_rate, which is left alone for
backward-compatibility, should anyone be using it.

This patch also includes a drive-by fix for log_udp_port which wasn't
being converted to an int (I didn't verify that actually causes trouble
in SysLogHandler(), but it's definitely an improvement regardles).

Change-Id: Id404636e3629f6431cf1c4e64a143959750a3c23
2013-01-19 15:25:27 -08:00
Jenkins
8b770aa55e Merge "Add config option to turn eventlet debug on/off" 2012-12-10 20:37:31 +00:00
Jenkins
1619cee011 Merge "Add config of server start timeouts for probetests" 2012-12-10 17:14:49 +00:00
Jenkins
067335a6e7 Merge "Add dispersion report flags to limit reports" 2012-12-10 16:49:33 +00:00
Chuck Thier
4c6a354483 Add config option to turn eventlet debug on/off
By default, this will be turned off.  This will cause eventlet to not
print stack traces to stderr which can be very annoying on production
systems.  It is still recommended to turn it on for development or
debuging purposes.

DocImpact
Change-Id: I5e5b902d3d9ed85f784549e53f2ee2fc87cbe2e5
2012-12-10 10:22:09 -06:00
Florian Hines
e474dfb720 Add dispersion report flags to limit reports
- Add two optional flags that let you limit swift-dispersion-report to only
reporting on containers OR objects.
- Also make dispersion.conf and swift-dispersion-report manpages
  current.

DocImpact

Change-Id: Iad56133cad261241db27d0e2103098e3c2f3c245
2012-12-09 18:20:08 -06:00
clayg
3a70112d03 Add config of server start timeouts for probetests
Currently the timeout for a wsgi server successfully binding to a port
and for a probetest background service to finish starting are hard coded
to 30 seconds.  While a reasonable default for most configurations, a
small virtualized environment may need a little more time in order for
probe tests to complete successfully.

This patch adds a 'bind_timeout' option to the DEFAULT section of the
main wsgi servers' config.  Also a new [probe_test] section and
'check_server_timeout' option to test.conf

DocImpact

Change-Id: Ibcaff153c7633bbf32e460fd9dbf04932eddb56f
2012-12-07 14:47:08 -08:00
Darrell Bishop
b8e3e9e1c2 Allow optional, temporary healthcheck failure.
A deployer may want to remove a Swift node from a load balancer for
maintenance or upgrade.  This patch provides an optional mechanism for
this.  The healthcheck filter config can specify "disable_path" which is
a filesystem path.  If a file is present at that location, the
healthcheck middleware returns a 503 with a body of "DISABLED BY FILE".

So a deployer can configure "disable_path" and then touch that
filesystem path, wait for the proxy to be removed from the load balancer
pool, perform maintenance/upgrade, and then remove the "disable_path"
file.

Also cleaned up the conf file man pages a bit.

Change-Id: I1759c78c74910a54c720f298d4d8e6fa57a4dab4
2012-12-04 09:14:27 -08:00