If auth is set up in the env then it needs to be copied over with the
make_request WSGI helper. Also renamed make_request to
make_subrequest: when I grepped for make_request I got > 250 results,
so this will make it easier to find references to this function in the
future.
Updated docs and sample confs to show that tempurl needs to come before
dlo and slo, as well as before auth, in the proxy pipeline.
Change-Id: I9750555727f520a7c9fedd5f4fd31ff0f63d8088
Currently, all headers are logged when access_log_headers is enabled in
the proxy_logging middleware. The new access_log_headers_only option
makes it possible to limit logging to a given list of header names.
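As a hedged illustration of the intent (not the actual proxy_logging
code), limiting logged headers to a configured subset might look like
this, assuming access_log_headers_only holds a comma-separated list of
header names:

    def filter_logged_headers(headers, access_log_headers_only):
        # Keep only the configured header names (case-insensitive match).
        wanted = {h.strip().title()
                  for h in access_log_headers_only.split(',') if h.strip()}
        return {name: value for name, value in headers.items()
                if name.title() in wanted}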
DocImpact
Change-Id: I0a1e40567c9eddc9bb00dd00373dc6eeb33d347c
The account which tracks objects scheduled for deletion had its name
hard-coded to '.expiring_objects'. This is now made configurable via the
expiring_objects_account_name option. Backend file-system
integration efforts may want to treat these "special" accounts in a
different way.
This remains undocumented, hence 'pseudo-hidden'.
UpgradeImpact: None, as the default value stays the same,
which is '.expiring_objects'.
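A minimal sketch of how a consumer of this option might read it,
assuming a standard paste-deploy style conf dict; the helper name is
illustrative only:

    def get_expiring_account(conf):
        # Falls back to the historical default when the option is absent.
        return conf.get('expiring_objects_account_name', '.expiring_objects')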
Change-Id: I1a093b0d0e2bdd0c3d723090af03fc0adf2ad7e3
Signed-off-by: Prashanth Pai <ppai@redhat.com>
This is for the same reason that SLO got pulled into middleware: it
picks up features such as automatic retry of GETs on broken connections
and the multi-ring storage policy work.
The proxy will automatically insert the dlo middleware at an
appropriate place in the pipeline the same way it does with the
gatekeeper middleware. Clusters will still support DLOs after upgrade
even with an old config file that doesn't mention dlo at all.
Includes support for reading config values from the proxy server's
config section so that upgraded clusters continue to work as before.
Bonus fix: resolved the 'after' vs. 'after_fn' ambiguity in the proxy's
required-filters list. Having both was confusing, so I kept the more
general one.
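For illustration only (not the exact Swift source), a required-filters
list using 'after_fn' and the auto-insertion it drives might look
roughly like this; the filter anchor lists shown are assumptions:

    required_filters = [
        {'name': 'gatekeeper', 'after_fn': lambda pipe: ['catch_errors']},
        {'name': 'dlo', 'after_fn': lambda pipe: ['staticweb', 'tempauth',
                                                  'keystoneauth', 'gatekeeper']},
    ]

    def ensure_required_filters(pipeline):
        # Insert each missing filter after the last of its anchors that is
        # already present; prepend it if none of the anchors are there.
        for spec in required_filters:
            if spec['name'] in pipeline:
                continue
            anchors = [pipeline.index(n) for n in spec['after_fn'](pipeline)
                       if n in pipeline]
            pipeline.insert(max(anchors) + 1 if anchors else 0, spec['name'])
        return pipeline

So an old pipeline that never mentions dlo still ends up with it
inserted in a sensible spot after upgrade.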
DocImpact
blueprint multi-ring-large-objects
Change-Id: Ib3b3830c246816dd549fc74be98b4bc651e7bace
* Introduce a new privileged account header: X-Account-Access-Control
* Introduce JSON-based version 2 ACL syntax -- see below for discussion
* Implement account ACL authorization in TempAuth
X-Account-Access-Control Header
-------------------------------
Accounts now have a new privileged header to represent ACLs or any other
form of account-level access control. The value of the header is an opaque
string to be interpreted by the auth system, but it must be a JSON-encoded
dictionary. A reference implementation is given in TempAuth, with the
knowledge that historically other auth systems often use TempAuth as a
starting point.
The reference implementation describes three levels of account access:
"admin", "read-write", and "read-only". Adding new access control
features in a future patch (e.g. "write-only" account access) will
automatically be forward- and backward-compatible, due to the JSON
dictionary header format.
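For example (illustrative user names only), a header value granting all
three levels to different users could be the JSON-encoded dictionary:

    import json

    # Keys follow the three access levels of the reference implementation.
    acl = {"admin": ["alice"], "read-write": ["bob"], "read-only": ["carol"]}
    header_value = json.dumps(acl)
    # X-Account-Access-Control: {"admin": ["alice"], "read-write": ["bob"], ...}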
The privileged X-Account-Access-Control header may only be read or written
by a user with "swift_owner" status, traditionally the account owner but
now also any user on the "admin" ACL.
Access Levels:
Read-only access is intended to indicate to the auth system that this
list of identities can read everything (except privileged headers) in
the account. Specifically, a user with read-only account access can get
a list of containers in the account, list the contents of any container,
retrieve any object, and see the (non-privileged) headers of the
account, any container, or any object.
Read-write access is intended to indicate to the auth system that this
list of identities can read or write (or create) any container. A user
with read-write account access can create new containers, set any
unprivileged container headers, overwrite objects, delete containers,
etc. A read-write user can NOT set account headers (or perform any
PUT/POST/DELETE requests on the account).
Admin access is intended to indicate to the auth system that this list of
identities has "swift_owner" privileges. A user with admin account access
can do anything the account owner can, including setting account headers
and any privileged headers -- and thus changing the value of
X-Account-Access-Control and thereby granting read-only, read-write, or
admin access to other users.
The auth system is responsible for making decisions based on this header,
if it chooses to support its use. Therefore the above access level
descriptions are necessarily advisory only for other auth systems.
When setting the value of the header, callers are urged to use the new
format_acl() method, described below.
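A hedged sketch of that usage, assuming format_acl() lives in
swift.common.middleware.acl and accepts a version number plus an
acl_dict keyword as this patch suggests:

    from swift.common.middleware.acl import format_acl

    acl_dict = {'admin': ['alice'], 'read-only': ['carol']}
    headers = {}
    headers['X-Account-Access-Control'] = format_acl(version=2, acl_dict=acl_dict)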
New ACL Format
--------------
The account ACLs introduce a new format for ACLs, rather than reusing the
existing format from X-Container-Read/X-Container-Write. There are several
reasons for this:
* Container ACL format does not support Unicode
* Container ACLs have a different structure than account ACLs
+ account ACLs have no concept of referrers or rlistings
+ accounts have additional "admin" access level
+ account access levels are structured as admin > rw > ro, which seems more
appropriate for how people access accounts, rather than reusing
container ACLs' orthogonal read and write access
In addition, the container ACL syntax is a bit arbitrary and highly custom,
so instead of parsing additional custom syntax, I'd rather propose a next
version and introduce a means for migration. The V2 ACL syntax has the
following benefits:
* JSON is a well-known standard syntax with parsers in all languages
* no artificial value restrictions (you can grant access to a user named
".rlistings" if you want)
* forward and backward compatibility: you may have extraneous keys, but
your attempt to parse the header won't raise an exception
I've introduced hooks in parse_acl and format_acl which currently default
to the old V1 syntax but tolerate the V2 syntax and can easily be flipped
to default to V2. I'm not changing the default or adding code to rewrite
V1 ACLs to V2, because this patch has suffered a lot of scope creep already,
but this seems like a sensible milestone in the migration.
TempAuth Account ACL Implementation
-----------------------------------
As stated above, core Swift is responsible for privileging the
X-Account-Access-Control header (making it only accessible to swift_owners),
for translating it to -sysmeta-* headers to trigger persistence by the
account server, and for including the header in the responses to requests
by privileged users. Core Swift puts no expectation on the *content* of
this header. Auth systems (including TempAuth) are responsible for
defining the content of the header and taking action based on it.
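As a rough sketch of the translation step described above (the exact
sysmeta header name is an assumption here, not taken from this patch):

    def privileged_acl_to_sysmeta(headers):
        # Move the client-facing privileged header into the sysmeta
        # namespace so the account server persists it with other state.
        value = headers.pop('X-Account-Access-Control', None)
        if value is not None:
            headers['X-Account-Sysmeta-Core-Access-Control'] = value
        return headers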
In addition to the changes described above, this patch defines a format
to be used by TempAuth for these headers in the common.middleware.acl
module, in the methods format_v2_acl() and parse_v2_acl(). This patch
also teaches TempAuth to take action based on the header contents. TempAuth
now sets swift_owner=True if the user is on the Admin ACL, authorizes
GET/HEAD/OPTIONS requests if the user is on any ACL, authorizes
PUT/POST/DELETE requests if the user is on the admin or read-write ACL, etc.
Note that the action of setting swift_owner=True triggers core Swift to
add or strip the privileged headers from the responses. Core Swift (not
the auth system) is responsible for that.
DocImpact: Documentation for the new ACL usage and format appears in
summary form in doc/source/overview_auth.rst, and in more detail in
swift/common/middleware/tempauth.py in the TempAuth class docstring.
I leave it to the Swift doc team to determine whether more is needed.
Change-Id: I836a99eaaa6bb0e92dc03e1ca46a474522e6e826
Fix Error 400 Header Line Too Long when using Identity v3 PKI Tokens
Uses the swift.conf max_header_size option to set wsgi.MAX_HEADER_LINE,
allowing the operator to customize this parameter.
The default value has been left at 8192 to avoid unexpected
configuration changes on deployed platforms. The max_header_size option
has to be increased (for example to 16384) to accommodate large
Identity v3 PKI tokens that include more than 7 catalog entries.
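A minimal sketch of applying the option at WSGI-server start, assuming
an already-parsed conf dict; eventlet's wsgi module is what exposes
MAX_HEADER_LINE:

    from eventlet import wsgi

    def apply_max_header_size(conf):
        # 8192 stays the default so deployed platforms see no change.
        wsgi.MAX_HEADER_LINE = int(conf.get('max_header_size', 8192))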
The default max header line size of 8192 is exceeded in the following
scenario:
- Auth tokens generated by Keystone v3 API include the catalog.
- Keystone's catalog contains more than 7 services.
Similar fixes have been merged in other projects.
Change-Id: Ia838b18331f57dfd02b9f71d4523d4059f38e600
Closes-Bug: 1190149
This way, with zero additional effort, SLO will support enhancements
to object storage and retrieval, such as:
* automatic resume of GETs on broken connection (today)
* storage policies (in the near future)
* erasure-coded object segments (in the far future)
This also lets SLOs work with other sorts of hypothetical third-party
middleware, for example object compression or encryption.
Getting COPY to work here is sort of a hack; the proxy's object
controller now checks for "swift.copy_response_hook" in the request's
environment and feeds the GET response (the source of the new object's
data) through it. This lets a COPY of a SLO manifest actually combine
the segments instead of merely copying the manifest document.
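A hedged sketch of the hook pattern described above; the environ key
comes from this patch, but the hook signature and names here are
illustrative:

    def slo_copy_response_hook(source_resp):
        # For a manifest, SLO would substitute an iterator over the combined
        # segment data; anything else passes through unchanged.
        return source_resp

    def install_copy_hook(env):
        env['swift.copy_response_hook'] = slo_copy_response_hook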
Updated ObjectController to expect a response's app_iter to be an
iterable, not just an iterator. (PEP 333 says "When called by the
server, the application object must return an iterable yielding zero
or more strings." ObjectController was just being too strict.) This
way, SLO can re-use the same response-generation logic for GET and
COPY requests.
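A tiny illustration of the distinction: a plain list is a valid PEP 333
iterable even though it is not itself an iterator, so consumers should
call iter() rather than expecting next():

    app_iter = ['first chunk', 'second chunk']  # iterable, but not an iterator
    for chunk in iter(app_iter):                # works in either case
        pass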
Added a (sort of hokey) mechanism to allow middlewares to close
incompletely-consumed app iterators without triggering a warning. SLO
does this when it realizes it's performed a ranged GET on a manifest;
it closes the iterable, removes the range, and retries the
request. Without this change, the proxy logs would get 'Client
disconnected on read' in them.
DocImpact
blueprint multi-ring-large-objects
Change-Id: Ic11662eb5c7176fbf422a6fc87a569928d6f85a1
Summary of the new configuration option:
Cluster operators add the container_sync middleware to their
proxy pipeline, create a container-sync-realms.conf for their
cluster, and copy it out to all their proxy and container servers.
This file specifies the available container sync "realms".
A container sync realm is a group of clusters with a shared key that
have agreed to provide container syncing to one another.
The end user can then set the X-Container-Sync-To value on a
container to //realm/cluster/account/container instead of the
previously required URL.
The allowed-hosts list is not used with this configuration; instead,
every container sync request sent is signed using the realm key and the
user key.
This offers better security, since source hosts can be spoofed far more
easily than per-request signatures can be forged. Replaying signed
requests, assuming it could easily be done, shouldn't be an issue, as the
X-Timestamp is part of the signature and so a replay would just
short-circuit as already current or as superseded.
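A hedged sketch of the kind of per-request signature described; the
exact fields and their order are assumptions here, not the real
container-sync wire format:

    import hashlib
    import hmac

    def sign_sync_request(method, path, x_timestamp, nonce, realm_key, user_key):
        # X-Timestamp is part of the signed material, so a replayed request
        # is either already current or superseded.
        to_sign = '\n'.join([method, path, x_timestamp, nonce, user_key])
        return hmac.new(realm_key.encode(), to_sign.encode(),
                        hashlib.sha1).hexdigest()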
This also makes configuration easier for the end user, especially
with difficult networking situations where a different host might
need to be used for the container sync daemon since it's connecting
from within a cluster. With this new configuration option, the end
user just specifies the realm and cluster names and that is resolved
to the proper endpoint configured by the operator. If the operator
changes their configuration (key or endpoint), the end user does not
need to change theirs.
DocImpact
Change-Id: Ie1704990b66d0434e4991e26ed1da8b08cb05a37
Middleware or core features may need to store metadata
against accounts or containers. This patch adds a
generic mechanism for system metadata to be persisted
in backend databases, without polluting the user
metadata namespace, by using the reserved header
namespace x-<server_type>-sysmeta-*.
Firstly, backend servers now persist system metadata
headers alongside user metadata and other system state.
For accounts and containers, system metadata in PUT
and POST requests is treated in a similar way to user
metadata. System metadata is not yet supported for
object requests.
Secondly, changes in the proxy controllers ensure that
headers in the system metadata namespace will pass through
in requests to backend servers.
Thirdly, system metadata returned from backend servers
in GET or HEAD responses is added to the cached info
dict, which middleware can access.
Finally, a gatekeeper middleware module is provided
which filters all system metadata headers from requests
and responses by removing headers with names starting
x-account-sysmeta-, x-container-sysmeta-. The gatekeeper
also removes headers starting x-object-sysmeta- in
anticipation of future support for system metadata being
set for objects. This prevents clients from writing or
reading system metadata.
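A minimal sketch of the gatekeeper's filtering as described; the regex
and helper name are illustrative:

    import re

    SYSMETA_PREFIX = re.compile(r'^x-(account|container|object)-sysmeta-',
                                re.IGNORECASE)

    def strip_sysmeta(headers):
        # Drop any inbound or outbound header in the reserved sysmeta namespace.
        return {name: value for name, value in headers.items()
                if not SYSMETA_PREFIX.match(name)}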
The required_filters list in swift/proxy/server.py is
modified to include the gatekeeper middleware so that
if the gatekeeper has not been configured in the
pipeline then it will be automatically inserted close
to the start of the pipeline.
blueprint cluster-federation
Change-Id: I80b8b14243cc59505f8c584920f8f527646b5f45
Bulk middleware now has a mechanism to retry a delete when the
HTTP response code is 409, which happens when the container still
has objects. This is useful in a bulk delete where you delete
all the objects in a container and then try to delete the container
itself: the final container delete is likely to fail because the object
replicas may not all have been removed by the time the middleware
received a successful response for the object deletes.
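A hedged sketch of the retry idea; the retry count and pause shown are
illustrative, not the middleware's actual settings:

    import time

    def delete_container_with_retry(delete_fn, attempts=3, pause=1.0):
        # Retry while the backend answers 409 Conflict, i.e. the container
        # still appears to hold objects.
        for attempt in range(attempts):
            status = delete_fn()
            if status != 409:
                return status
            time.sleep(pause)
        return status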
Change-Id: I1614fcb5cc511be26a9dda90753dd08ec9546a3c
Closes-Bug: #1253478
Set include_service_catalog=False in Keystone's auth_token
example configuration. Swift does not use X-Service-Catalog,
so there is no need to suffer its overhead. In addition,
service catalogs can be larger than max_header_size, so this
change avoids a failure mode.
DocImpact
Relates to bug 1228317
Change-Id: If94531ee070e4a47cbd9b848d28e2313730bd3c0
These will allow clients to perform the minimal number of requests
required to accomplish some bulk tasks. For example, a client with
many objects to delete can learn that the cluster's limit on
deletes-per-request is, say, 128, and then batch up their deletes in
groups of 128. Without this, the client has to either discover the
limit out-of-band somehow (and get notified if it changes), or do some
sort of binary search to figure out the limit.
Similar reasoning applies to the containers-per-request value.
The errors-per-request values are included so that clients may size
their requests such that everything is attempted regardless of
failure.
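A hedged example of the client-side use this enables; the /info JSON
keys shown ('bulk_delete', 'max_deletes_per_request') are assumptions
about the exposed schema:

    import json
    import urllib.request

    info = json.load(urllib.request.urlopen('http://proxy.example.com/info'))
    limit = info.get('bulk_delete', {}).get('max_deletes_per_request', 128)

    object_names = ['obj-%d' % i for i in range(1000)]
    batches = [object_names[i:i + limit]
               for i in range(0, len(object_names), limit)]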
I split the 'bulk' entry into 'bulk_delete' and 'bulk_upload' because,
from a client's standpoint, they're separate operations. It so happens
that Swift implements both in one piece of middleware, but clients
don't care.
Bonus fix: documented a missing config setting for the bulk middleware.
Change-Id: Ic3549aef79682fd5b798145c3545c1609aa1592b
The documentation rightly said to use "memcache_max_connections", but
the code was looking for "max_connections", and only looking for it in
proxy-server.conf, not in memcache.conf as a fallback.
This commit brings the code coverage for the memcache middleware to
100%.
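A short sketch of the intended lookup order; the default shown is an
assumption:

    def max_connections(filter_conf, memcache_conf, default=2):
        # Prefer the proxy-server.conf filter section, then memcache.conf,
        # then the built-in default.
        return int(filter_conf.get('memcache_max_connections')
                   or memcache_conf.get('memcache_max_connections')
                   or default)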
Closes-Bug: 1252893
Change-Id: I6ea64baa2f961a09d60b977b40d5baf842449ece
Signed-off-by: Peter Portante <peter.portante@redhat.com>
The early quorum change has maybe added a little bit too much 'eventual'
to the consistency of requests in Swift, and users can sometimes get
unexpected results.
This change gives us a knob to turn in finding the right balance,
by adding a timeout where pending requests can finish after quorum
is achieved.
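A hedged sketch of the knob's effect, using concurrent.futures stand-ins
for the proxy's real concurrency machinery; the option name
post_quorum_timeout is an assumption here:

    import time

    def responses_after_quorum(pending_futures, post_quorum_timeout=0.5):
        # Once quorum has been reached, give still-pending backend requests
        # only a bounded grace period to contribute their responses.
        deadline = time.time() + post_quorum_timeout
        extra = []
        for future in pending_futures:
            remaining = deadline - time.time()
            if remaining <= 0:
                break
            try:
                extra.append(future.result(timeout=remaining))
            except Exception:
                pass
        return extra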
Change-Id: Ife91aaa8653e75b01313bbcf19072181739e932c
Swift can now optionally be configured to allow requests to '/info',
providing information about the Swift cluster. Additionally,
HMAC-signed requests to
'/info?swiftinfo_sig=<sign>&swiftinfo_expires=<expires>' can be
enabled, allowing privileged access to more sensitive information
not meant to be public.
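A hedged example of building such a signed request; the exact HMAC
recipe (the order of method, expires, and path) is an assumption here:

    import hashlib
    import hmac
    import time

    key = b'secret-admin-key'
    expires = int(time.time()) + 60
    path = '/info'
    sig = hmac.new(key, ('GET\n%d\n%s' % (expires, path)).encode(),
                   hashlib.sha1).hexdigest()
    url = ('https://proxy.example.com%s?swiftinfo_sig=%s&swiftinfo_expires=%d'
           % (path, sig, expires))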
DocImpact
Change-Id: I2379360fbfe3d9e9e8b25f1dc34517d199574495
Implements: blueprint capabilities
Closes-Bug: #1245694
New replication_one_per_device option (True by default)
that restricts incoming REPLICATION requests to
one per device, replication_concurrency permitting.
Also adds replication_lock_timeout (15 by default)
to control how long a request will wait to obtain
a replication device lock before giving up.
This should be very useful in that you can be
assured any concurrent REPLICATION requests are
each writing to distinct devices. If you have 100
devices on a server, you can set
replication_concurrency to 100 and be confident
that, even if 100 replication requests were
executing concurrently, they'd each be writing to
separate devices. Before, all 100 could end up
writing to the same device, bringing it to a
horrible crawl.
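A hedged sketch of per-device locking with a bounded wait, along the
lines described; the lock-file layout and helper are illustrative, not
Swift's actual implementation:

    import errno
    import fcntl
    import os
    import time

    def lock_device(device_path, replication_lock_timeout=15):
        # One REPLICATION request per device: take an exclusive lock file
        # on the device, waiting at most replication_lock_timeout seconds.
        fd = os.open(os.path.join(device_path, '.replication_lock'),
                     os.O_CREAT | os.O_WRONLY)
        deadline = time.time() + replication_lock_timeout
        while True:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return fd
            except IOError as err:
                if (err.errno not in (errno.EAGAIN, errno.EACCES)
                        or time.time() >= deadline):
                    os.close(fd)
                    raise
                time.sleep(0.1)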
NOTE: This is only for ssync replication. The
current default rsync replication still has the
potentially horrible behavior.
Change-Id: I36e99a3d7e100699c76db6d3a4846514537ff685
For this commit, ssync is just a direct replacement for how
we use rsync. Assuming we switch over to ssync completely
someday and drop rsync, we will then be able to improve the
algorithms even further (removing local objects as we
successfully transfer each one rather than waiting for whole
partitions, using an index.db with hash-trees, etc., etc.)
For easier review, this commit can be thought of in distinct
parts:
1) New global_conf_callback functionality for allowing
   services to perform setup code before workers, etc. are
   launched. (This is then used by ssync in the object
   server to create a cross-worker semaphore to restrict
   concurrent incoming replication; see the sketch after
   this list.)
2) A bit of shifting of items up from object server and
replicator to diskfile or DEFAULT conf sections for
better sharing of the same settings. conn_timeout,
node_timeout, client_timeout, network_chunk_size,
disk_chunk_size.
3) Modifications to the object server and replicator to
optionally use ssync in place of rsync. This is done in
a generic enough way that switching to FutureSync should
be easy someday.
4) The biggest part, and (at least for now) completely
optional part, are the new ssync_sender and
ssync_receiver files. Nice and isolated for easier
testing and visibility into test coverage, etc.
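A hedged sketch of part 1, the global_conf_callback idea: a callable
the WSGI runner invokes once before forking workers, which the object
server can use to stash a cross-worker semaphore. The names follow the
description above, but the exact signature is assumed:

    from multiprocessing import BoundedSemaphore

    def global_conf_callback(preloaded_app_conf, global_conf):
        # Created once in the parent process and inherited by every worker,
        # so the concurrency limit applies across all of them.
        limit = int(preloaded_app_conf.get('replication_concurrency', 4))
        global_conf['replication_semaphore'] = [BoundedSemaphore(limit)]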
All the usual logging, statsd, recon, etc. instrumentation
is still there when using ssync, just as it is when using
rsync.
Beyond the essential error and exceptional condition
logging, I have not added any additional instrumentation at
this time. Unless there is something someone finds super
pressing to have added to the logging, I think such
additions would be better as separate change reviews.
FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION
CLUSTERS. Some of us will use it in a limited fashion to look
for any subtle issues, tuning, etc., but generally ssync is
an experimental feature. In its current implementation it is
probably going to be a bit slower than rsync, but if all
goes according to plan it will end up much faster.
There are no comparisons yet between ssync and rsync other
than some raw virtual machine testing I've done to show it
should compete well enough once we can put it in use in the
real world.
If you Tweet, Google+, or whatever, be sure to indicate it's
experimental. It'd be best to keep it out of deployment
guides, howtos, etc. until we all figure out if we like it,
find it to be stable, etc.
Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6
1. Add a comment suggesting the use of a new, dedicated account when
   running the dispersion tools.
2. Change the sample URL for Keystone from 'saio' to 'localhost'.
Change-Id: I4683f5eb0af534b39112f1b7420f67d569c29b3a
- Makes swift-dispersion-populate a bit faster when using a larger
dispersion_coverage with a larger part_power.
- Adds an option to only run population for containers OR objects.
- Adds an option to let you resume population at a given point (useful
if you need to resume population after a previous run errored out or
the like) by specifying which suffix to start at.
The original populate just randomly used uuid4().hex as a suffix on the
container/object names until all the required partitions were covered.
This isn't a big deal if you're only doing 1% coverage on a ring with a
small part power, but takes ages if you're doing 100% on a larger ring.
Change-Id: I52f890a774412c1d6179f12db9081aedc58b6bc2
Failed calls to rsync can result in very long log lines. These lines
are mostly made up of file paths and are not always useful. This change
allows reducing the length of these logged lines if desired.
Change-Id: I9a28f19eadc07757da9d42b0d7be1ed82170d732
These are headers that will be stripped unless the WSGI environment
contains a true value for 'swift_owner'. The exact definition of a
swift_owner is up to the auth system in use, but usually indicates
administrative responsibilities.
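A minimal sketch of the behaviour, with an illustrative (assumed) set
of protected header names:

    def strip_owner_headers(headers, environ,
                            protected=('x-container-sync-key',
                                       'x-account-meta-temp-url-key')):
        # Only requests marked swift_owner=True in the WSGI environ get to
        # see these headers.
        if environ.get('swift_owner'):
            return headers
        return {name: value for name, value in headers.items()
                if name.lower() not in protected}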
DocImpact
Change-Id: I972772fbbd235414e00130ca663428e8750cabca
The swift-dispersion-populate and swift-dispersion-report tools now
accept an --insecure option.
Also, dispersion.conf now has a keystone_api_insecure option.
The default, in both cases, is to use the secure path.
DocImpact
Change-Id: I4000352e547d9ce5b08ade54e0c886281caff891