This handles the case where a cluster gets a problem user who has
distributed writes across many containers but is still consuming too
much of the cluster's resources.
Change-Id: Ibd2ffd0e911463a432117b478585b9f8bc4a2495
* Introduce a new privileged account header: X-Account-Access-Control
* Introduce JSON-based version 2 ACL syntax -- see below for discussion
* Implement account ACL authorization in TempAuth
X-Account-Access-Control Header
-------------------------------
Accounts now have a new privileged header to represent ACLs or any other
form of account-level access control. The value of the header is an opaque
string to be interpreted by the auth system, but it must be a JSON-encoded
dictionary. A reference implementation is given in TempAuth, with the
knowledge that historically other auth systems often use TempAuth as a
starting point.
The reference implementation describes three levels of account access:
"admin", "read-write", and "read-only". Adding new access control
features in a future patch (e.g. "write-only" account access) will
automatically be forward- and backward-compatible, due to the JSON
dictionary header format.
The privileged X-Account-Access-Control header may only be read or written
by a user with "swift_owner" status, traditionally the account owner but
now also any user on the "admin" ACL.
Access Levels:
Read-only access is intended to indicate to the auth system that this
list of identities can read everything (except privileged headers) in
the account. Specifically, a user with read-only account access can get
a list of containers in the account, list the contents of any container,
retrieve any object, and see the (non-privileged) headers of the
account, any container, or any object.
Read-write access is intended to indicate to the auth system that this
list of identities can read or write (or create) any container. A user
with read-write account access can create new containers, set any
unprivileged container headers, overwrite objects, delete containers,
etc. A read-write user can NOT set account headers (or perform any
PUT/POST/DELETE requests on the account).
Admin access is intended to indicate to the auth system that this list of
identities has "swift_owner" privileges. A user with admin account access
can do anything the account owner can, including setting account headers
and any privileged headers -- and thus changing the value of
X-Account-Access-Control and thereby granting read-only, read-write, or
admin access to other users.
The auth system is responsible for making decisions based on this header,
if it chooses to support its use. Therefore the above access level
descriptions are necessarily advisory only for other auth systems.
When setting the value of the header, callers are urged to use the new
format_acl() method, described below.
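As an illustration only (the raw json.dumps() and swiftclient calls below
are assumptions for the example, not part of this patch), the header value
for the three access levels described above could be built like this:

    import json

    # Access levels interpreted by the auth system (TempAuth in the
    # reference implementation); core Swift treats the value as opaque.
    acl = {
        "admin": ["alice"],
        "read-write": ["bob"],
        "read-only": ["carol"],
    }

    # The header value must be a JSON-encoded dictionary.
    headers = {"X-Account-Access-Control": json.dumps(acl)}

    # A swift_owner would then POST these headers to the account, e.g.
    # swiftclient.client.post_account(url, token, headers=headers)

Using format_acl() instead of raw json.dumps() is preferred, since it keeps
the encoding details in one place.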
New ACL Format
--------------
The account ACLs introduce a new format for ACLs, rather than reusing the
existing format from X-Container-Read/X-Container-Write. There are several
reasons for this:
* Container ACL format does not support Unicode
* Container ACLs have a different structure than account ACLs
+ account ACLs have no concept of referrers or rlistings
+ accounts have additional "admin" access level
+ account access levels are structured as admin > rw > ro, which seems more
appropriate for how people access accounts, rather than reusing
container ACLs' orthogonal read and write access
In addition, the container ACL syntax is a bit arbitrary and highly custom,
so instead of parsing additional custom syntax, I'd rather propose a next
version and introduce a means for migration. The V2 ACL syntax has the
following benefits:
* JSON is a well-known standard syntax with parsers in all languages
* no artificial value restrictions (you can grant access to a user named
".rlistings" if you want)
* forward and backward compatibility: you may have extraneous keys, but
your attempt to parse the header won't raise an exception
I've introduced hooks in parse_acl and format_acl which currently default
to the old V1 syntax but tolerate the V2 syntax and can easily be flipped
to default to V2. I'm not changing the default or adding code to rewrite
V1 ACLs to V2, because this patch has suffered a lot of scope creep already,
but this seems like a sensible milestone in the migration.
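A rough sketch of how those hooks are meant to be driven (the exact keyword
arguments are an assumption; swift/common/middleware/acl.py is the
authority):

    from swift.common.middleware import acl

    # Build a V2 (JSON) account ACL explicitly.
    header_value = acl.format_acl(version=2,
                                  acl_dict={"read-only": ["carol"]})

    # Parsing can be asked for the V2 behavior the same way; the
    # default remains V1 for now.
    acl_dict = acl.parse_acl(version=2, data=header_value)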
TempAuth Account ACL Implementation
-----------------------------------
As stated above, core Swift is responsible for privileging the
X-Account-Access-Control header (making it only accessible to swift_owners),
for translating it to -sysmeta-* headers to trigger persistence by the
account server, and for including the header in the responses to requests
by privileged users. Core Swift puts no expectation on the *content* of
this header. Auth systems (including TempAuth) are responsible for
defining the content of the header and taking action based on it.
In addition to the changes described above, this patch defines a format
to be used by TempAuth for these headers in the common.middleware.acl
module, in the methods format_v2_acl() and parse_v2_acl(). This patch
also teaches TempAuth to take action based on the header contents. TempAuth
now sets swift_owner=True if the user is on the Admin ACL, authorizes
GET/HEAD/OPTIONS requests if the user is on any ACL, authorizes
PUT/POST/DELETE requests if the user is on the admin or read-write ACL, etc.
Note that the action of setting swift_owner=True triggers core Swift to
add or strip the privileged headers from the responses. Core Swift (not
the auth system) is responsible for that.
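A condensed, purely illustrative sketch of the decision TempAuth now makes
(not the actual middleware code; names and arguments are simplified):

    def account_acl_allows(method, user, acl, is_account_request):
        # 'acl' is the parsed JSON dict with the keys described above.
        if user in acl.get('admin', []):
            # Admins also get swift_owner=True set in the WSGI
            # environment, so core Swift exposes privileged headers.
            return True
        if method in ('GET', 'HEAD', 'OPTIONS'):
            # Any listed identity may read non-privileged data.
            return user in (acl.get('read-write', []) +
                            acl.get('read-only', []))
        # Writes: read-write users may modify containers and objects,
        # but not the account itself.
        return (user in acl.get('read-write', []) and
                not is_account_request)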
DocImpact: Documentation for the new ACL usage and format appears in
summary form in doc/source/overview_auth.rst, and in more detail in
swift/common/middleware/tempauth.py in the TempAuth class docstring.
I leave it to the Swift doc team to determine whether more is needed.
Change-Id: I836a99eaaa6bb0e92dc03e1ca46a474522e6e826
This is needed for SOS (along with patch
https://github.com/dpgoetz/sos/pull/37)
to work with swift 1.12. By spec you should always use an absolute
Location, but this causes a problem with staticweb over a CDN using a
CNAME. Basically you want to be able to forward the browser to a
relative location instead of whatever full URL the proxy server
thinks you are using.
Change-Id: I3fa1d415bf9b566be069458b838f7e65db0c4f39
The purpose of GateKeeper mostly relates to the development of new swift code,
so I threw together a development_middleware guide that covers some basics
with an eye towards metadata handling in particular.
I also fixed up some missing autodocs, split out the middleware autodoc and
added some refs here and there so I could link to them from the
development_middleware guide.
DocImpact
Change-Id: I20dd942ea8df9e33c3e794cb49669ffa1332c63e
Fix Error 400 Header Line Too Long when using Identity v3 PKI Tokens
Uses swift.conf max_header_size option to set wsgi.MAX_HEADER_LINE,
allowing the operator to customize this parameter.
The default value has been left at 8192 to avoid an unexpected
configuration change on deployed platforms. The max_header_size option
has to be increased (for example to 16384) to accommodate large
Identity v3 PKI tokens that include more than 7 catalog entries.
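For instance, an operator expecting large tokens might raise the limit in
swift.conf (the [swift-constraints] section is where max_header_size
normally lives; adjust to your deployment):

    [swift-constraints]
    # Default is 8192; 16384 leaves room for Keystone v3 PKI tokens
    # that embed a catalog with many services.
    max_header_size = 16384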
The default max header line size of 8192 is exceeded in the following
scenario:
- Auth tokens generated by Keystone v3 API include the catalog.
- Keystone's catalog contains more than 7 services.
Similar fixes have been merged in other projects.
Change-Id: Ia838b18331f57dfd02b9f71d4523d4059f38e600
Closes-Bug: 1190149
Summary of the new configuration option:
The cluster operators add the container_sync middleware to their
proxy pipeline and create a container-sync-realms.conf for their
cluster and copy this out to all their proxy and container servers.
This file specifies the available container sync "realms".
A container sync realm is a group of clusters with a shared key that
have agreed to provide container syncing to one another.
The end user can then set the X-Container-Sync-To value on a
container to //realm/cluster/account/container instead of the
previously required URL.
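Purely as an illustration (realm, cluster, and key names here are made up,
and the exact file layout should be checked against the container sync
documentation), a container-sync-realms.conf might look like:

    [realm1]
    key = realm1key
    key2 = realm1key2
    cluster_clustera = https://clustera.example.com/v1/
    cluster_clusterb = https://clusterb.example.com/v1/

and an end user on clustera could then set on a container:

    X-Container-Sync-To: //realm1/clusterb/AUTH_account/container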
The allowed hosts list is not used with this configuration and
instead every container sync request sent is signed using the realm
key and user key.
This offers better security, as source hosts can be faked much more
easily than per-request signatures. Replaying signed requests,
assuming it could easily be done, shouldn't be an issue, as the
X-Timestamp is part of the signature, so a replayed request would just
short-circuit as already current or as superseded.
This also makes configuration easier for the end user, especially
with difficult networking situations where a different host might
need to be used for the container sync daemon since it's connecting
from within a cluster. With this new configuration option, the end
user just specifies the realm and cluster names and that is resolved
to the proper endpoint configured by the operator. If the operator
changes their configuration (key or endpoint), the end user does not
need to change theirs.
DocImpact
Change-Id: Ie1704990b66d0434e4991e26ed1da8b08cb05a37
Many of the large files are included in the tree and the script now
leverages a checked out swift tree to provide those files so that
users don't have to cut/paste text from the document. The contents of
those files are still included in the document for reference.
Updated to add sudo in appropriate places so that the entire script
can be run as the user instead of as root.
We also simplify the steps needed to get the resetswift script working
(no need to edit the user name).
Change-Id: Ie5b5a815870edcc205d273e35e0bbd2426d3b002
Signed-off-by: Peter Portante <peter.portante@redhat.com>
Set include_service_catalog=False in Keystone's auth_token
example configuration. Swift does not use X-Service-Catalog
so there is no need to suffer its overhead. In addition,
service catalogs can be larger than max_header_size so this
change avoids a failure mode.
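In the proxy's auth_token filter section this amounts to something like the
following (the filter_factory path matches the python-keystoneclient
middleware of this era; verify against your version):

    [filter:authtoken]
    paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
    # Swift never reads X-Service-Catalog, so don't fetch or forward it.
    include_service_catalog = False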
DocImpact
Relates to bug 1228317
Change-Id: If94531ee070e4a47cbd9b848d28e2313730bd3c0
Swift can now optionally be configured to allow requests to '/info',
providing information about the swift cluster. Additionally,
HMAC-signed requests to
'/info?swiftinfo_sig=<sign>&swiftinfo_expires=<expires>' can be
enabled, allowing privileged access to more sensitive information
not meant to be public.
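A rough sketch of how such a signed request could be produced, assuming the
signature is an HMAC-SHA1 over the method, expiry, and path (in the style of
Swift's get_hmac() helper -- confirm the exact message layout against the
implementation):

    import hmac
    import time
    from hashlib import sha1

    key = 'cluster-admin-key'          # configured by the operator
    method, path = 'GET', '/info'
    expires = int(time.time() + 60)

    # Assumed message layout: "<method>\n<expires>\n<path>"
    sig = hmac.new(key, '%s\n%s\n%s' % (method, expires, path),
                   sha1).hexdigest()
    url = '/info?swiftinfo_sig=%s&swiftinfo_expires=%s' % (sig, expires)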
DocImpact
Change-Id: I2379360fbfe3d9e9e8b25f1dc34517d199574495
Implements: blueprint capabilities
Closes-Bug: #1245694
New replication_one_per_device option (True by default)
that restricts incoming REPLICATION requests to
one per device, replication_concurrency allowing.
Also adds replication_lock_timeout (15 by default)
to control how long a request will wait to obtain
a replication device lock before giving up.
This should be very useful in that you can be
assured any concurrent REPLICATION requests are
each writing to distinct devices. If you have 100
devices on a server, you can set
replication_concurrency to 100 and be confident
that, even if 100 replication requests were
executing concurrently, they'd each be writing to
separate devices. Before, all 100 could end up
writing to the same device, bringing it to a
horrible crawl.
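For reference, these knobs would be set in the object server's
configuration, roughly as follows (section placement is an assumption; see
object-server.conf-sample):

    [object-server]
    replication_concurrency = 100
    replication_one_per_device = True
    replication_lock_timeout = 15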
NOTE: This is only for ssync replication. The
current default rsync replication still has the
potentially horrible behavior.
Change-Id: I36e99a3d7e100699c76db6d3a4846514537ff685
For this commit, ssync is just a direct replacement for how
we use rsync. Assuming we switch over to ssync completely
someday and drop rsync, we will then be able to improve the
algorithms even further (removing local objects as we
successfully transfer each one rather than waiting for whole
partitions, using an index.db with hash-trees, etc., etc.)
For easier review, this commit can be thought of in distinct
parts:
1) New global_conf_callback functionality for allowing
services to perform setup code before workers, etc. are
launched. (This is then used by ssync in the object
server to create a cross-worker semaphore to restrict
concurrent incoming replication; a rough sketch of the
hook appears after this list.)
2) A bit of shifting of items up from object server and
replicator to diskfile or DEFAULT conf sections for
better sharing of the same settings. conn_timeout,
node_timeout, client_timeout, network_chunk_size,
disk_chunk_size.
3) Modifications to the object server and replicator to
optionally use ssync in place of rsync. This is done in
a generic enough way that switching to FutureSync should
be easy someday.
4) The biggest part, and (at least for now) a completely
optional part, is the new ssync_sender and
ssync_receiver files. Nice and isolated for easier
testing and visibility into test coverage, etc.
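A loose sketch of the shape of the global_conf_callback hook from part 1
(the signature and semaphore plumbing below are assumptions for
illustration; the object server's actual hook is the authority):

    import multiprocessing

    def global_conf_callback(preloaded_app_conf, global_conf):
        # Runs once in the WSGI runner before workers are forked, so
        # anything created here is shared by all workers.
        concurrency = int(
            preloaded_app_conf.get('replication_concurrency') or 4)
        if concurrency:
            # Stored in a list so the conf plumbing can carry the
            # object alongside ordinary string values.
            global_conf['replication_semaphore'] = [
                multiprocessing.BoundedSemaphore(concurrency)]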
All the usual logging, statsd, recon, etc. instrumentation
is still there when using ssync, just as it is when using
rsync.
Beyond the essential error and exceptional condition
logging, I have not added any additional instrumentation at
this time. Unless there is something someone finds super
pressing to have added to the logging, I think such
additions would be better as separate change reviews.
FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION
CLUSTERS. Some of us will be using it in a limited fashion to look
for any subtle issues, tuning, etc., but generally ssync is
an experimental feature. In its current implementation it is
probably going to be a bit slower than rsync, but if all
goes according to plan it will end up much faster.
There are no comparisons yet between ssync and rsync other
than some raw virtual machine testing I've done to show it
should compete well enough once we can put it in use in the
real world.
If you Tweet, Google+, or whatever, be sure to indicate it's
experimental. It'd be best to keep it out of deployment
guides, howtos, etc. until we all figure out if we like it,
find it to be stable, etc.
Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6
If you're setting one of these up, you're probably going to use it for
development, in which case you want everything but the kitchen sink
turned on so you can just start hacking away.
Change-Id: I98d178ff545cbf8d853c102e9fce76fb9f6773ac
Refactor on-disk knowledge out of the object server by pushing the
async update pickle creation to the new DiskFileManager class (name is
not the best, so suggestions welcome), along with the REPLICATOR
method logic. We also move the mount checking and thread pool storage
to the new ondisk.Devices object, which then also becomes the new home
of the audit_location_generator method.
For the object server, a new setup() method is now called at the end
of the controller's construction, and the _diskfile() method has been
renamed to get_diskfile(), to allow implementation specific behavior.
We then hide the need for the REST API layer to know how and where
quarantining needs to be performed. There are now two places it is
checked internally: on open(), where we verify the content-length,
name, and x-timestamp metadata; and in the reader on close(), where
the etag metadata is checked if the entire file was read.
We add a reader class to allow implementations to isolate the WSGI
handling code for that specific environment (it is used nowhere else
in the REST APIs). This simplifies the caller's code to just use a
"with" statement once opened, avoiding multiple points where close
needs to be called.
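To make the intended calling pattern concrete, a hypothetical sketch of a
GET-style path using the new interfaces (abbreviated, and not the literal
object server code):

    # DiskFileManager is constructed once per server.
    diskfile_mgr = DiskFileManager(conf, logger)

    disk_file = diskfile_mgr.get_diskfile(device, partition,
                                          account, container, obj)
    with disk_file.open():
        # open() verifies content-length, name and x-timestamp.
        app_iter = disk_file.reader()
    # app_iter is handed to the WSGI framework; its close() performs
    # the etag check (and quarantine) once the whole file is read.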
For a full historical comparison, including the usage patterns see:
https://gist.github.com/portante/5488238
(as of master, 2b639f5, Merge   |
 "Fix 500 from account-quota    |  This Commit
 middleware")                   |
--------------------------------+------------------------------------
                                |  DiskFileManager(conf)
                                |
                                |  Methods:
                                |    .pickle_async_update()
                                |    .get_diskfile()
                                |    .get_hashes()
                                |
                                |  Attributes:
                                |    .devices
                                |    .logger
                                |    .disk_chunk_size
                                |    .keep_cache_size
                                |    .bytes_per_sync
                                |
DiskFile(a,c,o,keep_data_fp=)   |  DiskFile(a,c,o)
                                |
Methods:                        |  Methods:
 *.__iter__()                   |
  .close(verify_file=)          |
  .is_deleted()                 |
  .is_expired()                 |
  .quarantine()                 |
  .get_data_file_size()         |
                                |    .open()
                                |    .read_metadata()
  .create()                     |    .create()
                                |    .write_metadata()
  .delete()                     |    .delete()
                                |
Attributes:                     |  Attributes:
  .quarantined_dir              |
  .keep_cache                   |
  .metadata                     |
                                |
                                |  DiskFileReader()
                                |
                                |  Methods:
                                |    .__iter__()
                                |    .close()
                                |
                                |  Attributes:
                                |   +.was_quarantined
                                |
DiskWriter()                    |  DiskFileWriter()
                                |
Methods:                        |  Methods:
  .write()                      |    .write()
  .put()                        |    .put()

* Note that on the left the DiskFile class implements all the methods
  necessary for a WSGI app iterator, while on the right the
  DiskReader() object returned by the DiskFileOpened.reader() method
  implements all the methods necessary for a WSGI app iterator.

+ Note that if the auditor is refactored to not use the DiskFile
  class (see https://review.openstack.org/44787), then we don't need
  the was_quarantined attribute.
A reference "in-memory" object server implementation of a backend
DiskFile class in swift/obj/mem_server.py and
swift/obj/mem_diskfile.py.
One can also reference
https://github.com/portante/gluster-swift/commits/diskfile for the
proposed integration with the gluster-swift code based on these
changes.
Change-Id: I44e153fdb405a5743e9c05349008f94136764916
Signed-off-by: Peter Portante <peter.portante@redhat.com>
This reverts commit 7760f41c3ce436cb23b4b8425db3749a3da33d32
Change-Id: I95e57a2563784a8cd5e995cc826afeac0eadbe62
Signed-off-by: Peter Portante <peter.portante@redhat.com>
The SAIO is purposely cut into two parts, so that you don't have to switch
back and forth between root and your unprivileged user. Add some "note" box
callouts to highlight this changeover.
Change-Id: I8b1a8f0539eac60d4121bdd4dab01df75ecca207
This creates a pool for each memcache server so that connections will not
grow without bound. This also adds a proxy config
"max_memcache_connections" which can control how many connections are
available in the pool.
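In proxy-server.conf this would look roughly like the following (the
placement in the cache filter section is an assumption; check
proxy-server.conf-sample for the authoritative spelling):

    [filter:cache]
    use = egg:swift#memcache
    memcache_servers = 127.0.0.1:11211
    # Upper bound on pooled connections per memcache server.
    max_memcache_connections = 2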
A side effect of the change is that we had to change the memcache calls
that used noreply, and instead wait for the result of the request.
Leaving with noreply could cause a race condition (specifically in
account auto create), due to one request calling `memcache.del(key)` and
then `memcache.get(key)` with a different pooled connection. If the
delete didn't complete fast enough, the get would return the old value
before it was deleted, leading the caller to believe that the account
was not autocreated.
ClaysMindExploded
DocImpact
Change-Id: I350720b7bba29e1453894d3d4105ac1ea232595b
If you don't, then newer versions of xattr won't install, and since
our xattr requirement is simply ">= 0.4" in requirements.txt, this
affects anyone setting up a new SAIO.
This happened with xattr 0.7, which was released on 2013-07-19.
Change-Id: Iaf335fa25a2908953d1fd218158ebedf5d01cc27
Place all the methods related to on-disk layout and / or configuration
into a new common module that can be shared by the various modules
using the same on-disk layout.
Change-Id: I27ffd4665d5115ffdde649c48a4d18e12017e6a9
Signed-off-by: Peter Portante <peter.portante@redhat.com>
If handoffs_first is True, then the object replicator will give
priority to partitions that are not supposed to be on the node.
If handoff_delete is set to a number (n), then it will delete a handoff
partition if at least n replicas were successfully replicated.
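Both options live in the object replicator's configuration, along the
lines of (values are illustrative):

    [object-replicator]
    # Work on handoff partitions (ones not assigned to this node) first.
    handoffs_first = True
    # Remove a handoff partition once at least this many replicas have
    # been pushed successfully.
    handoff_delete = 2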
Also fixed a couple of things in the object replicator unit tests and
added some more.
DocImpact
Change-Id: Icb9968953cf467be2a52046fb16f4b84eb5604e4
The main purpose of this patch is to lay the groundwork for allowing
the container and account servers to optionally use pluggable backend
implementations. The backend.py files will eventually be the module
where the backend APIs are defined via docstrings of this reference
implementation. The swift/common/db.py module will remain an internal
module used by the reference implementation.
We have a raft of changes to docstrings staged for later, but this
patch takes care to relocate ContainerBroker and AccountBroker into
their new home intact.
Change-Id: Ibab5c7605860ab768c8aa5a3161a705705689b04
These are headers that will be stripped unless the WSGI environment
contains a true value for 'swift_owner'. The exact definition of a
swift_owner is up to the auth system in use, but usually indicates
administrative responsibilities.
DocImpact
Change-Id: I972772fbbd235414e00130ca663428e8750cabca
Making it possible to override the default set of regexes
used to search for device block errors in the log file. Also making
the log file naming pattern configurable. Both are set in the
drive-audit.conf file.
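Something along these lines would go into drive-audit.conf (option names
below are illustrative placeholders; check the shipped
drive-audit.conf-sample for the real keys):

    [drive-audit]
    log_file_pattern = /var/log/kern*
    regex_pattern_1 = \berror\b.*\b(sd[a-z]{1,2}\d?)\b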
Updating "Detecting Failed Drives" section on the admin guide as well.
Change-Id: I7bd3acffed196da3e09db4c9dcbb48a20bdd1cf0
Change the default value of wsgi workers from 1 to auto. The new default
value for workers in the proxy, container, account & object wsgi servers will
spawn as many worker processes as you have cpu cores.
This will not be ideal for some configurations, but it's much more likely to
produce a successful out of the box deployment.
Inspect the number of cpu_cores using python's multiprocessing when available.
Multiprocessing was added in python 2.6, but I know I've compiled python
without it before by accident. The cpu_count method seems to be pretty system
agnostic, but it says it can raise NotImplementedError or sometimes return 0.
Add a new utility method 'config_auto_int_value' to pull an integer out of the
config which has a dynamic default.
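In the server configs this simply looks like (illustrative; an explicit
integer still works and overrides the detection):

    [DEFAULT]
    # 'auto' spawns one worker per detected cpu core; set an explicit
    # integer (e.g. workers = 1 for an SAIO) to override.
    workers = auto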
* drive by s/container/proxy/ in proxy-server.conf.5
* fix misplaced max_clients in *-server.conf-sample
* update doc/development_saio to force workers = 1
DocImpact
Change-Id: Ifa563d22952c902ab8cbe1d339ba385413c54e95