Swift doesn't check whether the API version used in a request is valid.
Currently there is only one valid REST API version, but that might
change in the future.
This patch enforces "v1" or "v1.0" as the version string when accessing
accounts, containers, and objects.
The list of accepted version strings can be manually overridden using a
comma-separated list in swift.conf to make this backward-compatible.
The constraint loader has been modified slightly to accept strings as
well as integers.
Any request to an account, container, or object which does not
provide a valid version string will get a 400 Bad Request response.
The allowed API versions are excluded from /info by default.
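For example, the override mentioned above could look like this in
swift.conf (the option name valid_api_versions is an assumption here):
[swift-constraints]
# comma-separated list of accepted version strings
valid_api_versions = v1,v1.0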
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: John Dickinson <me@not.mn>
Closes Bug #1437442
Change-Id: I5ab6e236544378abf2eab562ab759513d09bc256
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
- There is no notion of update() or update_deleted().
- There is a single job processor.
- Jobs are processed partition by partition.
- At the end of processing a rebalanced or handoff partition, the
  reconstructor removes any successfully reverted objects.
It also includes various ssync changes, such as the addition of the
reconstruct_fa() function, which is called from ssync_sender and
performs the actual reconstruction while sending the object to the
receiver.
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
This patch changes container sync to use Internal Client instead
of Direct Client.
In the current design, container sync uses direct_get_object (which
talks to the storage nodes directly) to get the newest source object.
This works fine for replication storage policies; however, with
erasure coding policies, direct_get_object would only return part of
the object (it is encoded as several fragments). Using Internal Client
retrieves the original, reassembled object in the EC case.
Note that the container sync put/delete path already works with EC
since it uses Simple Client.
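As a rough sketch of the idea (the conf path and constructor arguments
are assumptions, not taken verbatim from this change):
from swift.common.internal_client import InternalClient
# The internal proxy pipeline reassembles the EC fragments, so the
# sync daemon receives the original object body.
swift = InternalClient('/etc/swift/internal-client.conf',
                       'Swift Container Sync', request_tries=3)
status, headers, body_iter = swift.get_object(
    'AUTH_test', 'synced-container', 'obj1', headers={})
for chunk in body_iter:
    pass  # stream the reassembled object to the remote cluster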
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
DocImpact
Change-Id: I91952bc9337f354ce6024bf8392046a1ecf6ecc9
This patch extends the StoragePolicy class for non-replication storage
policies, the first one being "erasure coding".
Changes:
- Add 'policy_type' support to BaseStoragePolicy class
- Disallow direct instantiation of BaseStoragePolicy class
- Subclass BaseStoragePolicy
- "StoragePolicy":
. Replication policy, default
. policy_type = 'replication'
- "ECStoragePolicy":
. Erasure Coding policy
. policy_type = 'erasure_coding'
. Private member variables
ec_type (EC backend),
ec_num_data_fragments (number of fragments original
data split into after erasure coding operation),
ec_num_parity_fragments (number of parity fragments
generated during erasure coding)
. Private methods
EC specific attributes and ring validator methods.
- Swift will use PyECLib, a Python Erasure Coding library, for
erasure coding operations. PyECLib is already an approved
OpenStack core requirement.
(https://bitbucket.org/kmgreen2/pyeclib/)
- Add test cases for:
  - 'policy_type' StoragePolicy member
  - policy_type == 'erasure_coding'
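A sketch of what such a policy stanza could look like in swift.conf
(values are illustrative; ec_type depends on the PyECLib backends
available):
[storage-policy:1]
name = ec104
policy_type = erasure_coding
ec_type = jerasure_rs_vand
ec_num_data_fragments = 10
ec_num_parity_fragments = 4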
DocImpact
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: Ie0e09796e3ec45d3e656fb7540d0e5a5709b8386
Implements: blueprint ec-proxy-work
Container sync might get stuck without a connection timeout if the remote proxy
is not responding.
This patch sets a default timeout of 5.0 seconds for the connection
attempt. The value is much higher than other connection timeouts inside
Swift (0.5 seconds); however, the latency to the remote peer may be much
higher, so the larger value plays it safe. There is also a retry if the
attempt times out.
Note that this setting only applies to the connection attempt itself; it
does not apply when the remote proxy goes away in the middle of a
request.
Also added a short test to ensure urlopen is called with the timeout value.
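The pattern under test is roughly the following (a hedged sketch, not
the actual container sync code):
import socket
import urllib2
def fetch_with_timeout(url, timeout=5.0, retries=1):
    # open the url with a connect timeout, retrying once on timeout
    for attempt in range(retries + 1):
        try:
            return urllib2.urlopen(url, timeout=timeout).read()
        except (socket.timeout, urllib2.URLError):
            if attempt == retries:
                raise
The test then patches urlopen and asserts it was invoked with
timeout=5.0.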
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Change-Id: Ic08a55157fa91fe1316653781adf4d66eead61bc
Partial-Bug: 1419916
From rsync's man page:
-z, --compress
With this option, rsync compresses the file data as it is sent to the
destination machine, which reduces the amount of data being transmitted --
something that is useful over a slow connection.
A configurable option has been added to allow rsync to compress, but only
if the remote node is in a different region than the local one.
NOTE: Objects that are already compressed (for example: .tar.gz, .mp3)
might slow down the syncing process.
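As a sketch, the new replicator option might look like this in
object-server.conf (the option name rsync_compress is assumed here):
[object-replicator]
# compress rsync traffic, but only to nodes in a different region
rsync_compress = yes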
On-wire compression can also be extended to ssync later in a separate
change if required. In the ssync case, we could explore faster
compression libraries like lz4; rsync uses zlib, which is slower but
offers a higher compression ratio.
Change-Id: Ic9b9cbff9b5e68bef8257b522cc352fc3544db3c
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Updated proxy-server.conf-sample with the correct default. Also
updated the note on the overview-auth doc page.
Change-Id: I5cd62a7a118a28f7b58f47b8d8d4d963f6bc7347
I1f8f5064ea8028af60f167df9b97e215cdadba44 deprecated auth_host and its
related options, but the default config still used them.
Ieac26806bd420aa08fc79bbc6a11eb6a1c15c7df then switched devstack to
using the new variables, but because the old variables still existed in
the default config, some installations were broken (e.g. XenServer CI).
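For reference, the non-deprecated variables look roughly like this (a
hedged sketch; the exact option set depends on the middleware version):
[filter:authtoken]
# replaces the deprecated auth_host / auth_port / auth_protocol options
identity_uri = http://keystonehost:35357/
auth_uri = http://keystonehost:5000/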
Partial-bug: 1415795
Change-Id: I7076fa03ab531cbb1114918f75113620b65590dc
More memcache options can be set in memcache.conf or proxy-server.conf:
* connect_timeout
* pool_timeout
* tries
* io_timeout
Options set in proxy-server.conf are considered more specific to the
memcache middleware and take precedence.
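For example, in the proxy's cache filter section (the values shown are
believed to be the defaults, but treat them as illustrative):
[filter:cache]
use = egg:swift#memcache
connect_timeout = 0.3
pool_timeout = 1.0
tries = 3
io_timeout = 2.0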
DocImpact
Change-Id: I194d0f4d88c6cb8c797a37dcab48f2d8473e7a4e
The way we do this now involves a config change and a proxy
reload, which is a pain. You can now just set one of these:
X-Account-Sysmeta-Global-Write-Ratelimit: WHITELIST
or
X-Account-Sysmeta-Global-Write-Ratelimit: BLACKLIST
NOTE:
The existing proxy config settings account_whitelist
and account_blacklist will continue to work.
Change-Id: I532663f1d2c75d03170c5fdb9b330416822fbc88
Adds is_admin and allow_overrides to the keystoneauth section
of proxy-server.conf.sample and also adds related comments to
the keystoneauth docstring.
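The added sample lines look roughly like this (defaults shown here are
an assumption):
[filter:keystoneauth]
use = egg:swift#keystoneauth
# is_admin = false
# allow_overrides = true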
DocImpact
Change-Id: I7c751880cb6742db7347f31c4d32b523e33da75b
This patch adds console logging ability to swift-drive-audit.
There are cases where logging to the console is needed when drive-audit
runs; the output can be consumed to flag errors in monitoring tools
such as Icinga.
DocImpact
Change-Id: Ia1e1effcbd89bd2cf6d5b8c64019f1647c736a3a
This patch was first motivated by noticing that the proxy
server pipeline used for in-process functional tests was
out of date with respect to the pipeline in
/etc/proxy-server.conf.sample. Rather than cut and paste
the current pipeline into the in-process setup, a better
idea is to have the in-process tests always use the sample
config.
A further benefit is that in-process functional tests will
pick up changes to the sample config introduced by patches;
previously, test/functional/__init__.py would need to be
manually modified to run in-process functional tests
on new middleware, for example.
Note: because the pipeline is now loaded using entry points,
'python setup.py [develop|install]' will now be needed
before running the tests.
Obvious next steps would be to do the same for the backend
servers, and to allow alternative config files and dirs
to be specified, but this patch is the first step.
Also drive-by fixes some typos in proxy-server.conf.sample.
Change-Id: If442bd7c2b1721ec92839c4490924ba33e1545d8
If you try an unauthorized upload into a container that is over quota
you get a 403 instead of a 413, but if you try an unauthorized upload
when an *account* is over quota you can see the 413 even though the
upload would have been rejected by the authorize callback. By wrapping
the authorize callback associated with the incoming request we can make
sure to only return our 413 when the request would otherwise have been
authorized.
Drive by doc fixes thanks to acoles:
* State that container_quotas should be after auth middleware in
the class doc string.
* Add note to proxy-server.conf.sample that account_quotas should
be after auth middleware.
The equivalent statements are already in place for each quota
middleware.
DocImpact
Closes-Bug: #1387415
Change-Id: I2a88b3ec79d35bfdd73ea6ad64e376b7c7af4ea6
This patch adds two new features to swift-drive-audit. The first
is an option in the drive-audit.conf file that allows the operator
to prevent the drives from ever being unmounted automatically,
regardless of the number of errors present. This could be of
benefit in very small systems consisting of only one or two drives
where the operator would like to manually unmount/fix the
particular drive(s) and minimise any potential downtime.
The second is another option in drive-audit.conf that allows the
operator to select a recon directory. This directory will then
have a drive.recon file which will keep an up-to-date record of
the swift drives and any errors associated with them. An example
of the output would be as follows:
{"/srv/node/disk2": "0", "/srv/node/disk3": "25", "/srv/node/disk0": "0",
"/srv/node/disk1": "0", "/srv/node/disk10": "0", "/srv/node/disk7": "0",
"/srv/node/disk4": "137", "/srv/node/disk5": "0", "/srv/node/disk8": "0",
"/srv/node/disk9": "0", "/srv/node/disk6": "0", "/srv/node/disk11": "60"}
This would allow the operator to monitor the errors on the swift
drives without having to spend time searching through logs. Also, if
this is accepted, it should be possible to add an option to
swift-recon that would keep track of this at a system level.
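A hedged sketch of the corresponding drive-audit.conf options (option
names assumed from the description above):
[drive-audit]
# never unmount drives automatically, regardless of the error count
unmount_failed_device = false
# directory in which the drive.recon file is written
recon_cache_path = /var/cache/swift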
Change-Id: Ib5dacf8622b7363e070c274c7c30c8ead448a055
This commit lets the object server use splice() and tee() to move data
from disk to the network without ever copying it into user space.
Requires Linux. Sorry, FreeBSD folks. You still have the old
mechanism, as does anyone who doesn't want to use splice. This
requires a relatively recent kernel (2.6.38+) to work, which includes
the two most recent Ubuntu LTS releases (Precise and Trusty) as well
as RHEL 7. However, it excludes Lucid and RHEL 6. On those systems,
setting "splice = on" will result in warnings in the logs but no
actual use of splice.
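Enabling it is then just a config switch in the object server's app
section (the default leaves it off):
[app:object-server]
use = egg:swift#object
# use splice()/tee() for zero-copy GET responses (Linux 2.6.38+ only)
splice = on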
Note that this only applies to GET responses without Range headers. It
can easily be extended to single-range GET requests, but this commit
leaves that for future work. Same goes for PUT requests, or at least
non-chunked ones.
On some real hardware I had lying around (not a VM), this produced a
37% reduction in CPU usage for GETs made directly to the object
server. Measurements were done by looking at /proc/<pid>/stat,
specifically the utime and stime fields (user and kernel CPU jiffies,
respectively).
Note: There is a Python module called "splicetee" available on PyPI,
but it's licensed under the GPL, so it cannot easily be added to
OpenStack's requirements. That's why this patch uses ctypes instead.
Also fixed a long-standing annoyance in FakeLogger:
>>> fake_logger.warn('stuff')
>>> fake_logger.get_lines_for_level('warn')
[]
>>>
This, of course, is because the correct log level is 'warning'. Now
you get a KeyError if you call get_lines_for_level with a bogus log
level.
Change-Id: Ic6d6b833a5b04ca2019be94b1b90d941929d21c8
In a long-term effort to change the recommended ports for Swift,
the first step is to require the bind_port in config files. Later,
we can change the recommended setting.
Anyone currently explicitly setting the ports will not be affected.
Anyone not setting the ports will need to specify them to match their
rings.
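For example, each server config must now state its port explicitly;
the ports shown here are the traditional defaults and are illustrative
only:
[DEFAULT]
# traditional defaults: 8080 proxy, 6002 account, 6001 container,
# 6000 object
bind_port = 6000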
DocImpact
Change-Id: Icca83a263acdd0afc9016424a3e9f8c15e944789
The keystoneauth middleware supports cross-tenant access
control using the syntax <tenant>:<user> in container ACLs,
where <tenant> and <user> may currently be either a unique
id or a name. As a result of the keystone v3 API introducing
domains, names are no longer globally unique and are only
unique within a domain. The use of unqualified tenant and
user names in this ACL syntax is therefore not 'safe' in a
keystone v3 environment.
This patch modifies keystoneauth to restrict cross-tenant
ACL matching to use only ids for accounts that are not in
the default domain. For backwards compatibility,
names will still be matched in ACLs when both the requesting
user and tenant are known to be in the default domain AND the
account's tenant is also in the default domain (the default
domain being the domain to which existing tenants are
migrated).
Accounts existing prior to this patch are assumed to be for
tenants in the default domain. New accounts created using a
v2 token scoped on the tenant are also assumed to be in the
default domain. New accounts created using a v3 token scoped
on the tenant will learn their domain membership from the
token info. New accounts created using any unscoped token
(i.e. with a reselleradmin role) will have unknown domain
membership and therefore be assumed to NOT be in the default
domain.
Despite this provision for backwards compatibility, names
must no longer be used when setting new ACLs in any account,
including new accounts in the default domain.
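For example, a cross-tenant read ACL should now be given in terms of
ids (the values here are illustrative):
X-Container-Read: 2f87cb3ffcbe4f1e84ba163ad21c54e0:8a4b9e40c53c4c2c9e0f6c75d2e1b9aa
rather than the name-based tenant_name:user_name form.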
This change obviously impacts users accustomed to specifying
cross-tenant ACLs in terms of names, and further work will be
necessary to restore those use cases. Some ideas are
discussed under the bug report. With that caveat, this patch
removes the reported vulnerability when using
swift/keystoneauth with a keystone v3 API.
Note: to observe the new 'restricted' behaviour you will need
to set up keystone user(s) and tenant(s) in a non-default domain
and set auth_version = v3.0 in the auth_token middleware config
section of proxy-server.conf. You may also benefit from the
keystone v3 enabled swiftclient patch under review here:
https://review.openstack.org/#/c/91788/
DocImpact
blueprint keystone-v3-support
Closes-Bug: #1299146
Change-Id: Ib32df093f7450f704127da77ff06b595f57615cb
The tempurl middleware supports any configured HTTP methods, but the
default set was only GET, PUT, and HEAD, so cluster operators had to
take action to enable POST and DELETE. This commit changes the
defaults to include POST and DELETE.
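The corresponding sample entry now reads roughly:
[filter:tempurl]
use = egg:swift#tempurl
# methods = GET HEAD PUT POST DELETE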
Note that this doesn't affect any existing temporary URLs at all; the
method is baked into the signature (temp_url_sig query param), so no
new access is granted to a holder of a temporary URL by this
change. It simply gives more flexibility to creators of temporary
URLs.
Change-Id: I5bc15bbd2968ab7bedcd7c0df10f2ec825537191
This change is the result of an audit of the config parameters
provided by swift and how/if they are addressed in the swift
documentation, the documentation being the sample config files in
the etc/ directory and the docs themselves.
This change is only concerned with the config files in etc/; next
I will look at the documentation in the doc/ folder.
This change makes the following assumptions:
- Unless stated otherwise, the commented out parameter in the
sample configuration is the default for swift.
- When the default in the code differs from that of the sample
configuration, the default in the code is correct.
Container reconciler:
Parameter: interval
- code: 30
- config: 300
Result: config = 30
Object Expirer:
Parameter: recon_cache_path
- code: /var/cache/swift
- config: Parameter missing
Result: Add parameter
swift-dispersion-populate && swift-dispersion-report
Parameter: auth_version
- code: 1.0
- config: 2.0 (due to being a confusing example of how to set up
  version 2.0)
Result: Added 'auth_version = 1.0' to the right section (showing the
  default) and made the sample configuration for auth version 2.0
  easier to understand.
swift-drive-audit:
Parameter: log_file_pattern
- code: /var/log/kern.*[!.][!g][!z]
- config: /var/log/kern*
Result: config = /var/log/kern.*[!.][!g][!z]
NOTE: swift-drive-audit uses a parameter called device_dir which
defaults to '/srv/node'. In all other swift binaries/services
there is a similar parameter called devices which stores the
same thing. This is an inconsistency which I haven't fixed
as this could break existing swift clusters out in the wild.
Proxy Server:
Parameter: object_chunk_size
- code: 65536
- config: 8192
Result: config = 65536
Parameter: client_chunk_size
- code: 65536
- config: 8192
Result: config = 65536
Parameter: strict_cors_mode
- code: True
- config: No parameter
Result: config = True
Account and Container replicator configuration confusion:
NOTES:
The account and container replicators have parameters:
- interval
- run_pause
Both of these are loaded into the same variable in code:
self.interval = int(conf.get('interval') or
                    conf.get('run_pause') or 30)
If a user sets both to different values then interval is used.
Result: Update the configuration to make this more clear.
DocImpact
Change-Id: Iaadbb1a6284f8b3e0801bc343b29772f70f4bf6e
It's generally better to have logs for something than to not have
logs. This way, the object expirer (if using the sample config as a
starting point) will log what it does.
Note that the container reconciler's sample config already contains
proxy-logging, as does the proxy server's. The object expirer is the
odd man out.
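With proxy-logging added, the expirer's sample internal-client pipeline
becomes roughly:
[pipeline:main]
pipeline = catch_errors proxy-logging cache proxy-server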
Change-Id: I32aac99131746501820319b94405440c1934a694
auth_token middleware in python-keystoneclient is deprecated and has
been moved to the keystonemiddleware repo.
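In practice that means loading the filter from the new package, e.g.:
[filter:authtoken]
paste.filter_factory = keystonemiddleware.auth_token:filter_factory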
Change-Id: Ia04aa83348e0776cb3239cb5420ee1450a990d5b
Closes-Bug: #1342274
It may not be obvious, but the existing code will let you change the
disk_chunk_size just for the auditor, so this just points that
out in the docs. In one short test I ran with a 4-node cluster
with 18GB of 4MB objects on it, changing the auditor chunk size
from the default of 64K to 1MB decreased the auditor CPU time from
10% to 4%.
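That override lives in the auditor section of object-server.conf, e.g.
(using the value from the test above):
[object-auditor]
# read in 1MB chunks instead of the 64K object-server default
disk_chunk_size = 1048576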
Also added test code to make sure this overridden value is
actually used, and checked other AuditorWorker conf values as
well.
Change-Id: Ia12e1c6127877dc2124b60cd963cd0b6d5f3d6ef
We are soon going to put servers with a high ratio of disk to CPU
into production as object servers. One of our concerns with this
configuration is that the object auditor would take too long to
complete its audit cycle. Therefore we decided to parallelise
the auditor.
The auditor already uses fork(), so we decided to use the parallel
model from the replicator. Concurrency is set by the concurrency
parameter in the auditor stanza, which sets the number of parallel
checksum auditors. The actual number of parallel auditing processes
is concurrency + 1 if zero_byte_fps is non-zero.
Only one ZBF process is forked, and a new ZBF process is forked as
soon as the current ZBF process finishes. Thus the last process
running will always be a ZBF process.
Both forever and once modes are parallelised.
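A short example of the relevant settings (option names as used in the
auditor stanza; values are illustrative):
[object-auditor]
# number of parallel checksum auditor processes
concurrency = 2
# non-zero forks one additional ZBF-only process
zero_byte_files_per_second = 50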
Each checksum auditor process submits a nested dictionary with keys
{'object_auditor_stats_ALL': {'diskn': {..}}} to dump_recon_cache
so that the object_auditor_stats_ALL dict in recon cache consists
of individual sub-dicts for each of the object disks on the server.
The recon cache is no different from before when the checksum auditor
is run in serial mode. When swift-recon is run, it sums the stats
for the individual disks.
DocImpact
Change-Id: I0ce3db57a43e482d4be351cc522fc9060af6e2d3
Currently, if the object-expirer goes to delete an object and the
primary nodes are unavailable, or the object is on handoffs, the object
servers are unable to verify the x-if-delete-at timestamp and return
412 without writing a tombstone or updating the containers. The expirer
treats 412 as success, so the dark data is not removed from the object
servers, nor is the object removed from the listing.
As a side effect of this bug, if the expirer encounters split brain the delete
would never get processed in the correct storage policy.
It seems it's just not correct to treat the lack of data as success.
Now the object server will treat an x-if-delete-at request against a
non-existent object as a 404 and, to distinguish this from successful
processing of an x-if-delete-at request, will return 204 on success.
The expirer will treat a 404 response from swift as a failure, and will
continue to attempt to expire the object until it is older than its
configurable reclaim age. However, swift will only return 404 if the
majority of nodes are able to return success, and if even a single node
is able to accept the x-if-delete-at request the containers will get
updated and replication will settle the tombstone - the subsequent
x-if-delete-at request will 412 and be removed from the queue.
It's worth noting that if an object with x-delete-at metadata is
DELETEd (by a client request), an async update for the expiring update
containers will be processed to remove the queue entry - but if no
primary nodes handle the DELETE request, replication will never remove
the expiring entry and, assuming it's scheduled for beyond the
tombstone's reclaim age, the queue entry will not be processable. In
this case the expirer will attempt to DELETE the object (and get 404s)
in vain until the queue entry passes the configurable reclaim age.
DocImpact
Implements: blueprint storage-policies
Change-Id: I66260e99fda37e97d6d2470971b6f811ee9e01be
Have container sync get its object ring from POLICIES now,
update tests to use the policy index from container_info, and pass
that along for use in ring selection.
This change also introduces the option of specifying in the cluster
info which of the realm/clusters is the current realm/cluster.
DocImpact
Implements: blueprint storage-policies
Change-Id: If57d3b0ff8c395f21c81fda76458bc34fcb23257
This daemon will take objects that are in the wrong storage policy and
move them to the right one, or take delete requests that went to the
wrong storage policy and apply them to the right one. It operates on a
queue similar to the object-expirer's queue.
Discovering that the object is in the wrong policy will be done in
subsequent commits by the container replicator; this is the daemon
that handles them once they happen.
Like the object expirer, you only need to run one of these per cluster;
see etc/container-reconciler.conf.
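A minimal sketch of that config, assuming it embeds an internal-client
style pipeline much like the expirer's:
[container-reconciler]
# interval = 30
[pipeline:main]
pipeline = catch_errors proxy-logging cache proxy-server
[app:proxy-server]
use = egg:swift#proxy
[filter:cache]
use = egg:swift#memcache
[filter:proxy-logging]
use = egg:swift#proxy_logging
[filter:catch_errors]
use = egg:swift#catch_errors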
DocImpact
Implements: blueprint storage-policies
Change-Id: I5ea62eb77ddcbc7cfebf903429f2ee4c098771c9
The basic idea here is to replace the use of a single object ring in
the Application class with a collection of object rings. The
collection includes not only the Ring object itself but the policy
name associated with it, the filename for the .gz and any other
metadata associated with the policy that may be needed. When
containers are created, a policy (thus a specific obj ring) is
selected allowing apps to specify policy at container creation time
and leverage policies simply by using different containers for object
operations.
The policy collection is based off of info in the swift.conf file.
The format of the sections in the .conf file is as follows:
swift.conf format:
[storage-policy:0]
name = chicken
[storage-policy:1]
name = turkey
default = yes
With the above format:
- Policy 0 will always be used for access to existing containers
without the policy specified. The ring name for policy 0 is always
'object', assuring backwards compatibility. The parser will always
create a policy 0 even if one is not specified.
- The policy with 'default=yes' is the one used for new container
creation. This allows the admin to specify which policy is used without
forcing the application to add the metadata.
This commit simply introduces storage policies and the loading
thereof; nobody's using it yet. That will follow in subsequent
commits.
Expose storage policies in /info
DocImpact
Implements: blueprint storage-policies
Change-Id: Ica05f41ecf3adb3648cc9182f11f1c8c5c678985
Just put SLO and DLO after any auth middleware. This works because when
the request goes through that middleware in the pipeline, the
authentication takes place: validation of the token, setting up who the
user is, and setting the authorization callback. Each subrequest made
for the segments will be subjected to that authorization callback,
which verifies the user has access to the individual segments.
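In pipeline terms that means something like the following (a trimmed,
illustrative pipeline using tempauth):
[pipeline:main]
pipeline = catch_errors cache tempauth slo dlo proxy-server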
To get this to work with keystone, the keystone identity is set up
during __call__ and applied to the authorize function using a
functools.partial. When the authorize function is later called from the
environ by the proxy server, the identity that was set up when the
request passed through the auth middleware is used, not what can be
pulled out of the possibly altered state of the request's environment.
DocImpact
fixes bug: 1315133
Change-Id: I7827dd2d9dfbb3c6424773fb2891355d47e372ba
Log lines can get quite large, as we previously noticed with rsync error
log lines. We added a setting to cap those, but it really looks like we
should have just applied this overall limit. We noticed the issue when we
switched to UDP syslogging and it would occasionally blow past the 16436
lo MTU! This causes Python's logging code to get an error and hilarity
ensues.
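The cap is configurable per server; a hedged example (the option name
log_max_line_length is an assumption here):
[DEFAULT]
# 0 means unlimited; keep this below the syslog transport's MTU when
# logging over UDP
log_max_line_length = 8192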
Change-Id: I44bdbe68babd58da58c14360379e8fef8a6b75f7
Based on comments from deployers at the Juno OpenStack summit,
limiting the default logged token length (to, by default, prevent
tokens from being fully logged) is a good idea.
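The knob lives in the proxy-logging filter; for example (the option
name and default are shown as assumptions):
[filter:proxy-logging]
use = egg:swift#proxy_logging
# log only the first 16 characters of each token
reveal_sensitive_prefix = 16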
Change-Id: I58980e85329d99de41f1c08f75e85973452317b1
The profile middleware provides a tool to profile Swift
code on the fly and collect statistical data for performance
analysis. A simple native Web UI is also provided to help
query and visualize the data.
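The middleware is wired in like any other filter; for example (the egg
name is shown as an assumption):
[filter:xprofile]
use = egg:swift#xprofile
and then add xprofile to the proxy pipeline.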
Change-Id: I6a1554b2f8dc22e9c8cd20cff6743513eb9acc05
Implements: blueprint profiling-middleware
Add a test to check that only the expected keys are
reported by the proxy in /info, and add comments to
raise awareness that default constraints will be
automatically published by the proxy in response to /info
requests.
Change-Id: Ia5f6339b06cdc2e1dc960d1f75562a2505530202