Added region prefix to example commands for adding devices to the ring.
Also updated the description to include the region prefix.
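For example, adding a device in region 1, zone 1 now looks like this
(IP, port, device, and weight values are illustrative):
$ swift-ring-builder account.builder add r1z1-127.0.0.1:6012/sdb1 100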
Change-Id: Ie6d6485b497cea973e37909b5b19b44946c8aa89
Currently, statistics are organized by command. However, it would also be
useful to display statistics organized by policy. Different policies may be
based on different storage properties (e.g., faster disks).
With this change, all the statistics for object timers will be sent per policy
as well.
Policy statistics reporting uses the policy index, and the metric names in
Graphite show up as proxy-server.object.policy.<policy-index>.<verb>, etc.
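For example, GETs against a policy with index 1 would report under
proxy-server.object.policy.1.GET..., alongside the existing aggregate
proxy-server.object.GET... metrics (the exact suffixes follow the same
conventions as the existing object timers).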
Updated unit tests for per-policy stat reporting and added new unit tests for
invalid cases.
Updated documentation in the Administrator's Guide to reflect this new
aggregation.
Change-Id: Id70491e4833791a3fb8ff385953d69018514cd9c
This patch makes the count of object replication failures available in
recon. Also, "failure_nodes" is added to the Account Replicator and
Container Replicator.
Recon shows the count of object replication failures as follows:
$ curl http://<ip>:<port>/recon/replication/object
{
    "replication_last": 1416334368.60865,
    "replication_stats": {
        "attempted": 13346,
        "failure": 870,
        "failure_nodes": {
            "192.168.0.1": {"sdb1": 3},
            "192.168.0.2": {"sdb1": 851, "sdc1": 1, "sdd1": 8},
            "192.168.0.3": {"sdb1": 3, "sdc1": 4}
        },
        "hashmatch": 0,
        "remove": 0,
        "rsync": 0,
        "start": 1416354240.9761429,
        "success": 1908
    },
    "replication_time": 2316.5563162644703,
    "object_replication_last": 1416334368.60865,
    "object_replication_time": 2316.5563162644703
}
Note that 'object_replication_last' and 'object_replication_time' are
considered transitional and will be removed in subsequent releases.
Use 'replication_last' and 'replication_time' instead.
Additionally, this patch adds the counts to swift-recon, and they will
be shown as follows:
$ swift-recon object -r
===============================================================================
--> Starting reconnaissance on 4 hosts
===============================================================================
[2014-11-27 16:14:09] Checking on replication
[replication_failure] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%,
no_result: 0, reported: 4
[replication_success] low: 3, high: 3, avg: 3.0, total: 12,
Failed: 0.0%, no_result: 0, reported: 4
[replication_time] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%,
no_result: 0, reported: 4
[replication_attempted] low: 1, high: 1, avg: 1.0, total: 4,
Failed: 0.0%, no_result: 0, reported: 4
Oldest completion was 2014-11-27 16:09:45 (4 minutes ago) by
192.168.0.4:6002.
Most recent completion was 2014-11-27 16:14:19 (-10 seconds ago) by
192.168.0.1:6002.
===============================================================================
In a cluster where one server runs with this patch and the other
servers run without it, executing swift-recon against the patched
server produces entries in the output, such as [failure], [success]
and [attempted], that carry no useful data, because the unpatched
servers cannot send responses with the information this patch needs.
Therefore, once you apply this patch to one server, apply it to the
other servers as well before you execute swift-recon.
DocImpact
Change-Id: Iecd33655ae2568482833131f422679996c374d78
Co-Authored-By: Kenichiro Matsuda <matsuda_kenichi@jp.fujitsu.com>
Co-Authored-By: Brian Cline <bcline@softlayer.com>
Implements: blueprint enable-object-replication-failure-in-recon
When rsync pushes to a remote node with an unmounted drive, and
certain precautions have not been taken, rsync may write files to
the local drive at the location where the drive was mounted.
There are two suggested solutions for this issue:
1) Set the permissions for all mount points in /srv/node/
to root:root 755
2) Mount the drives elsewhere and symlink the drives to /srv/.../
The first method ensures that only root, and not the swift user,
can write to the /srv/.../ directories.
With the second method, rsync encounters a broken link if it
attempts to write to an unmounted drive.
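A sketch of the first method, assuming drives are mounted under
/srv/node/ and are unmounted when the commands run:
$ sudo chown root:root /srv/node/*
$ sudo chmod 755 /srv/node/*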
Change-Id: I60ce4ed9ef8401768d5f78b6806cbb2e2a65303e
Closes-Bug: #1470576
This provides the capability to specify a project_name,
project_domain_name and user_domain_name in /etc/swift/dispersion.conf.
If these values are set in dispersion.conf, they are passed on to the
Swift client. With this it is possible to use a specific dispersion
project which is not in the Keystone default domain. Changes
were applied to swift-dispersion-populate and swift-dispersion-report.
Relevant man pages, the example dispersion.conf and the admin guide were
updated accordingly.
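A hedged sketch of dispersion.conf with the new options (auth values
are illustrative):
[dispersion]
auth_url = http://localhost:5000/v3/
auth_user = dispersion
auth_key = dispersion_password
auth_version = 3
project_name = dispersion
project_domain_name = default
user_domain_name = default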
DocImpact
Closes-Bug: #1468374
Change-Id: I0e716f8d281b4d0f510bc568bcee4a13fc480ff7
This change adds the current call time to the recon middleware and a
--time option to the recon CLI. This is useful for checking whether
time is synchronized across the cluster.
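For example, to check clock synchronization across the cluster:
$ swift-recon --time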
Change-Id: I62373e681f64d0bd71f4aeb287953dd3b2ea5662
swift-dispersion-populate and swift-dispersion-report don't work for
anything other than policy 0. Updated to allow the user to specify a
policy name on the command line (as with object-info), which makes
populate/report work with 3x, 2x, or EC-style policies.
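For example, assuming a policy named "silver" is defined in swift.conf
(option name assumed to match object-info's --policy-name):
$ swift-dispersion-populate --policy-name silver
$ swift-dispersion-report --policy-name silver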
Change-Id: Ib7c298f0f6d666b1ecca25315b88539f45cf9f95
Closes-Bug: 1458688
Sometimes, I get handed a builder file in a support ticket and a
question of the form "why is the balance [not] doing $thing?". When
that happens, I add a bunch of print statements to my local
swift/common/ring/builder.py, figure things out, and then delete the
print statements. This time, instead of deleting the print statements,
I turned them into debug() calls and added a "--debug" flag to the
rebalance command in hopes that someone else will find it useful.
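For example:
$ swift-ring-builder object.builder rebalance --debug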
Change-Id: I697af90984fa5b314ddf570280b4585ba0ba363c
Simply replacing a failed disk requires a very long time if the ring is not
changed, because all data will be replicated to a single new disk. This extends
the time to recover from missing replicas, and becomes even more important with
bigger disks.
This patch updates the doc to include a faster alternative by setting the weight
of a failed disk to 0. In this case the partitions from the failed disk are
distributed and replicated to the remaining disks in the cluster, and because
each disk gets only a fraction of the partitions it's also much faster.
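A sketch of the documented alternative, assuming the failed disk has
device id 42 in the object ring:
$ swift-ring-builder object.builder set_weight d42 0
$ swift-ring-builder object.builder rebalance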
Change-Id: I16617756359771ad89ca5d4690b58a014f481d9b
Add overview and example information for using Storage Policies.
DocImpact
Implements: blueprint storage-policies
Change-Id: I6f11f7a1bdaa6f3defb3baa56a820050e5f727f1
This allows an easier and more explicit way to tell swift-init to run on
specific servers. For example with an SAIO, this allows you to do
something like:
swift-init object-server.1 reload
to reload just the 1st object server. A more real world example is when
you are running separate servers for replication. In this example you
might have an object-server/public.conf and
object-server/replication.conf. With this change you can do something
like:
swift-init object-server.replication reload
to just reload the replication server.
DocImpact
Change-Id: I5c6046b5ee28e17dadfc5fc53d1d872d9bb8fe48
This is a very simple Swift tool to retrieve information
about an account that is located on a storage node.
One can call the tool with a given account db file
as it is stored on the storage node system.
It will then return various details about that account.
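A hedged usage sketch with a hypothetical db path:
$ swift-account-info /srv/node/sdb1/accounts/<partition>/<suffix>/<hash>/<hash>.db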
Change-Id: Ibfeee790adc000fc177b4b3c03d22ff785fda325
This is a very simple Swift tool to retrieve information
about a container that is located on a storage node.
One can call the tool with a given container db file
as it is stored on the storage node system.
It will then return various details about that container.
Change-Id: Ifebaed6c51a9ed5fbc0e7572bb43ef05d7dd254b
In object audit "once" mode we are allowing the user to specify
a sub-set of devices to audit using the "--devices" command-line
option. The sub-set is specified as a comma-separated list. This
patch is taken from a larger patch to enable parallel processing
in the object auditor.
We've had to modify recon so that it will work properly with this
change to "once" mode. We've modified dump_recon_cache()
so that it will store nested dictionaries, in other words it will
store a recon cache entry such as {'key1': {'key2': {...}}}. When
the object auditor is run in "once" mode with "--devices" set the
object_auditor_stats_ALL and ZBF entries look like:
{'object_auditor_stats_ALL': {'disk1disk2..diskn': {...}}}. When
swift-recon is run, it hunts through the nested dicts to find the
appropriate entries. The object auditor recon cache entries are set
to {} at the beginning of each audit cycle, and individual disk
entries are cleared from cache at the end of each disk's audit cycle.
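A hedged invocation sketch (device names illustrative):
$ swift-object-auditor /etc/swift/object-server.conf once --devices=sda,sdb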
DocImpact
Change-Id: Icc53dac0a8136f1b2f61d5e08baf7b4fd87c8123
Make it possible to override the default set of regexes
used to search for device block errors in the log file. Also make
the log file naming pattern configurable by setting it in the
drive-audit.conf file.
Update the "Detecting Failed Drives" section of the admin guide as well.
Change-Id: I7bd3acffed196da3e09db4c9dcbb48a20bdd1cf0
The proxy can now be configured to prefer local object servers for PUT
requests, where "local" is governed by the "write_affinity" setting. The
"write_affinity_node_count" setting controls how many local object
servers to try before giving up and going on to remote ones.
I chose to simply re-order the object servers instead of filtering out
nonlocal ones so that, if all of the local ones are down, clients can
still get successful responses (just slower).
The goal is to trade availability for throughput. By writing to local
object servers across fast LAN links, clients get better throughput
than if the object servers were far away over slow WAN links. The
downside, of course, is that data availability (not durability) may
suffer when drives fail.
The default configuration has no write affinity in it, so the default
behavior is unchanged.
Added some words about these settings to the admin guide.
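A hedged proxy-server.conf sketch (values illustrative):
[app:proxy-server]
write_affinity = r1
write_affinity_node_count = 2 * replicas
This prefers object servers in region 1 for PUTs and tries twice the
replica count of local nodes before falling back to remote ones.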
DocImpact
Change-Id: I09a0bd00524544ff627a3bccdcdc48f40720a86e
Fixes bug 1104708
There could be a severe performance drop for Swift if one disk of one
storage node is problematic, due to the tragic state of async disk I/O.
This patch provides PUT timing per kB transferred (ms/kB) monitoring
support for each non-zero-byte request of each disk and reports to
StatsD for alerting.
- adds "object-server.PUT.<device>.timing" metrics for the object-server.
DocImpact
Change-Id: Ie94bddad28e8be52e71683bf6c9db988664abe47
This was an outstanding TODO for StatsD Swift metrics. A new timing
metric is tracked for (only) GET requests for accounts, containers,
and objects:
proxy-server.<req_type>.GET.<status_int>.first-byte.timing
Also updated StatsD documentation in the Admin Guide to clarify that
timing metrics are sent in units of milliseconds.
Change-Id: I5bb781c06cefcb5280f4fb1112a526c029fe0c20
If invoked as 'swift-ring-builder-safe' the directory containing the builder
file provided will be locked (via lock_parent_directory()). This provides a
small safeguard against multiple instances of the swift-ring-builder (or
other utilities that observe this lock) from attempting to write to or read
the builder/ring files while operations are in progress.
This is particularly useful in environments where ring management has been
automated (via Chef or custom solutions) but the operator still occasionally
needs to manually interact with the ring.
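Invocation is identical to swift-ring-builder; for example:
$ swift-ring-builder-safe object.builder rebalance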
DocImpact
Change-Id: Ia362744a8151a91bfb586d01da582906726852e6
As Dieter pointed out in bug 1090495
(https://bugs.launchpad.net/swift/+bug/1090495), the volume of data
points can vary wildly from one StatsD metric to another.
This patch implements a partial solution by reducing the sample_rate
used for known high-volume metrics (operational experience will need to
inform this over time) and introducing a new tunable,
log_statsd_sample_rate_factor which is multiplied by the sample_rate for
every statsd stat. This tunable can be used to reduce StatsD traffic
proportionally for all metrics and is intended to replace
log_statsd_default_sample_rate, which is left alone for
backward-compatibility, should anyone be using it.
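A hedged configuration sketch (values illustrative):
[DEFAULT]
log_statsd_host = localhost
log_statsd_port = 8125
log_statsd_sample_rate_factor = 0.5
With a factor of 0.5, a metric whose sample_rate is 1 is sent to StatsD
roughly half the time.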
This patch also includes a drive-by fix for log_udp_port which wasn't
being converted to an int (I didn't verify that actually causes trouble
in SysLogHandler(), but it's definitely an improvement regardless).
Change-Id: Id404636e3629f6431cf1c4e64a143959750a3c23
- Add two optional flags that let you limit swift-dispersion-report to only
reporting on containers OR objects.
- Also make dispersion.conf and swift-dispersion-report manpages
current.
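For example, to report only on containers or only on objects (flag
names assumed to be --container-only and --object-only):
$ swift-dispersion-report --container-only
$ swift-dispersion-report --object-only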
DocImpact
Change-Id: Iad56133cad261241db27d0e2103098e3c2f3c245
It's not sufficient to just look at swift.object-updater.successes to
see the async_pending unlink rate. There are two different spots where
unlinks happen: one when an async_pending has been successfully
processed, and another when the updater notices multiple
async_pendings for the same object. Both events are now tracked under
the same name: swift.object-updater.unlinks.
FakeLogger has now sprouted a couple of convenience methods for
testing logged metrics.
Fixed pep8 1.3.3's complaints in the files this diff touches.
Also: bonus spelling and grammar fixes in the admin guide.
Change-Id: I8c1493784adbe24ba2b5512615e87669b3d94505
Removed many StatsD logging calls in proxy-server and added
swift-informant-style catch-all logging in the proxy-logger middleware.
Many errors previously rolled into the "proxy-server.<type>.errors"
counter will now appear broken down by response code and with timing
data at: "proxy-server.<type>.<verb>.<status>.timing". Also, bytes
transferred (sum of in + out) will be at:
"proxy-server.<type>.<verb>.<status>.xfer". The proxy-logging
middleware can get its StatsD config from standard vars in [DEFAULT] or
from access_log_statsd_* config vars in its config section.
Similarly to Swift Informant, request methods ("verbs") are filtered
using the new proxy-logging config var, "log_statsd_valid_http_methods"
which defaults to GET, HEAD, POST, PUT, DELETE, and COPY. Requests with
methods not in this list use "BAD_METHOD" for <verb> in the metric name.
To avoid user error, access_log_statsd_valid_http_methods is also
accepted.
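A hedged proxy-server.conf sketch for the middleware's StatsD config
(values illustrative):
[filter:proxy-logging]
use = egg:swift#proxy_logging
access_log_statsd_host = localhost
access_log_statsd_port = 8125
access_log_statsd_valid_http_methods = GET,HEAD,PUT,DELETE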
Previously, proxy-server metrics used "Account", "Container", and
"Object" for the <type>, but these are now all lowercase.
Updated the admin guide's StatsD docs to reflect the above changes and
also include the "proxy-server.<type>.handoff_count" and
"proxy-server.<type>.handoff_all_count" metrics.
The proxy server now saves off the original req.method and proxy_logging
will use this if it can (both for request logging and as the "<verb>" in
the statsd timing metric). This fixes bug 1025433.
Removed some stale access_log_* related code in proxy/server.py. Also
removed the BaseApplication/Application distinction as it's no longer
necessary.
Fixed up the sample config files a bit (logging lines, mostly).
Fixed typo in SAIO development guide.
Got proxy_logging.py test coverage to 100%.
Fixed proxy_logging.py for PEP8 v1.3.2.
Enhanced test.unit.FakeLogger to track more calls to enable testing
StatsD metric calls.
Change-Id: I45d94cb76450be96d66fcfab56359bdfdc3a2576
To tell when replication for a device has finished, it's important to
know when the replicator is removing objects. This was previously
handled for the object-replicator
(object-replicator.partition.delete.count.<device> and
object-replicator.partition.update.count.<device> metrics) but not the
account and container replicators.
This patch extends the existing DB removal count metrics to make them
per-device. The new metrics are:
account-replicator.removes.<device>
container-replicator.removes.<device>
There's also a bonus refactoring and increased test coverage of the DB
replicator code.
Change-Id: I2067317d4a5f8ad2a496834147954bdcdfc541c1
The Admin Guide now contains information about the ring serialization
change (and importantly, how to downgrade, if necessary).
Also added container-server conf var, "allow_versions" to the Deployment
Guide.
Also changed description of proxy-server conf var,
"max_containers_whitelist" to say it contains "account names" not
"account hashes".
Change-Id: Ib23c6118cc5195cc04765afd28e442e4c735f0d4
We're still using saio:11000 in a few spots, so a few things
don't work out of the box on the SAIO. Fixes bug #1024561.
Change-Id: I226de54c2785b0d0b681c8d0cc24260adbd3d663
Expand recon middleware to include support for account and container
servers in addition to the existing object servers. Also add support
for retrieving recent information from auditors, replicators, and
updaters. In the case of certain checks (such as container auditors)
the stats returned are only for the most recent path processed.
The middleware has also been refactored and should now also handle
errors better in cases where stats are unavailable.
While new checks have been added, the output from pre-existing
checks has not changed. This should allow existing 3rd party
utilities such as the Swift ZenPack to continue to function.
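Assuming the new checks follow the /recon/<check>/<server-type> pattern
shown earlier for object replication, queries would look like:
$ curl http://<ip>:<port>/recon/replication/account
$ curl http://<ip>:<port>/recon/auditor/container
$ curl http://<ip>:<port>/recon/updater/object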
Change-Id: Ib9893a77b9b8a2f03179f2a73639bc4a6e264df7
Documentation, including a list of metrics reported and their semantics,
is in the Admin Guide in a new section, "Reporting Metrics to StatsD".
An optional "metric prefix" may be configured which will be prepended to
every metric name sent to StatsD.
Here is the rationale for doing a deep integration like this versus
only sending metrics to StatsD from middleware: it's the only way to
report some internal activities of Swift in a real-time manner, and it
gives one way of reporting to StatsD and one place/style of
configuration. Even things (like, say, timing of PUT requests into the
proxy-server) which could be logged via middleware are thus
consistently logged the same way (deep integration via the logger
delegate methods).
When log_statsd_host is configured, get_logger() injects a
swift.common.utils.StatsdClient object into the logger as
logger.statsd_client. Then a set of delegate methods on LogAdapter
either pass through to the StatsdClient object or become no-ops. This
allows StatsD logging to look like:
self.logger.increment('some.metric.here')
and do the right thing in all cases and with no messy conditional logic.
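A minimal Python sketch of this pattern (conf values illustrative):
import time
from swift.common.utils import get_logger

conf = {'log_statsd_host': 'localhost',
        'log_statsd_port': '8125',
        'log_statsd_metric_prefix': 'saio'}
logger = get_logger(conf, log_route='demo')
# These delegate to logger.statsd_client when log_statsd_host is set;
# they are harmless no-ops otherwise.
logger.increment('some.metric.here')
logger.timing_since('some.timer', time.time())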
I wanted to use the pystatsd module for the StatsD client, but the
version on PyPi is lagging the git repo (and is missing both the prefix
functionality and the timing_since() method). So I wrote my own:
swift.common.utils.StatsdClient. The interface is the same as
pystatsd.Client, but the code was written from scratch. It's pretty
simple, and the tests I added cover it. This also frees Swift from an
optional dependency on the pystatsd module, making this feature easier
to enable.
There's test coverage for the new code and all existing tests continue
to pass.
Refactored out _one_audit_pass() method in swift/account/auditor.py and
swift/container/auditor.py.
Fixed some misc. PEP8 violations.
Misc test cleanups and refactorings (particularly the way "fake logging"
is handled).
Change-Id: Ie968a9ae8771f59ee7591e2ae11999c44bfe33b2
Corrected its/it's mistakes, harmonized line wrapping within some docs
and clarified doc wording in several places.
Change-Id: Ib9ac6d5e859f770a702e1fad6de8d4abe0390b47
Adds the configuration file option "dump_json" and command line
options [-j|--dump-json] to have swift-dispersion-report output
the report in JSON format. This allows the dispersion report to
be more easily consumed elsewhere.
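For example:
$ swift-dispersion-report -j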
There are also a few pep8 fixes and removal of unused imports.
Change-Id: I2374311ccbef43e6bbae24665c9584e60f3da173