938 Commits

Author SHA1 Message Date
Tim Burke
d112b7d29d First pass at admin-guide cleanup
Change-Id: I005232d95a3e1d181271eef488a828ad330e6006
2017-07-07 17:33:54 +00:00
John Dickinson
4cb76a41ce docs migration from openstack-manuals
Context for this is at
https://specs.openstack.org/openstack/docs-specs/specs/pike/os-manuals-migration.html

Change-Id: I9a4da27ce1d56b6406e2db979698038488f3cf6f
2017-07-06 16:44:25 -07:00
Jenkins
37a1935198 Merge "Switch from oslosphinx to openstackdocstheme" 2017-07-06 18:16:43 +00:00
Jenkins
c3f6e82ae1 Merge "Write-affinity aware object deletion" 2017-07-06 14:00:05 +00:00
Jenkins
e94b383655 Merge "Add support to increase object ring partition power" 2017-07-05 14:40:42 +00:00
Jenkins
f1e1dbb80a Merge "Make eventlet.tpool's thread count configurable in object server" 2017-07-04 11:49:24 +00:00
Van Hung Pham
bfb5759a53 Switch from oslosphinx to openstackdocstheme
As part of the docs migration work[0] for Pike we need to switch to use
the openstackdocstheme.

Fix one display problem with wrong section markup in the index file.

[0]https://review.openstack.org/#/c/472275/

Change-Id: Ide31218a7f37ba5d959de99cab48fc6513bf426f
2017-07-02 09:27:05 +02:00
Jenkins
638fcc152b Merge "Bind SAIO services on different loopback addresses" 2017-07-01 05:17:34 +00:00
Matthew Oliver
e11a38c63a Bind SAIO services on different loopback addresses
Currently all devices in the ring and all services in a SAIO
all bind to the same loopback address 127.0.0.1. But this
breaks servers_per_port if you want to do any testing on that.

This change binds each service to a different loopback address
and updates the rings (remakerings) accordingly.

To make sure rsyncd binds correctly, the bind address needed
to be changed to listen on all addresses (0.0.0.0).

Change-Id: I7e77434f275df1e2699de495d8b622b90157a9d7
2017-06-27 16:08:08 -04:00
Lingxian Kong
831eb6e3ce Write-affinity aware object deletion
When deleting objects in a multi-region Swift deployment with write
affinity configured, users always get a 404 when deleting an object
before it's replicated to the appropriate nodes.

This patch adds a config item 'write_affinity_handoff_delete_count' so
that operators can define how many local handoff nodes Swift should
send requests to in order to get more candidates for the final
response, or by default just leave it to Swift to calculate the
appropriate number.

Change-Id: Ic4ef82e4fc1a91c85bdbc6bf41705a76f16d1341
Closes-Bug: #1503161
2017-06-27 22:42:02 +12:00
Samuel Merritt
d9c4913e3b Make eventlet.tpool's thread count configurable in object server
If you're running servers_per_port > 0 and threads_per_disk = 0 (as it
should be with servers_per_port on), each object-server process will
have 20 IO threads waiting around to service eventlet.tpool
calls. This is far too many; with servers_per_port, there's no real
benefit to having so many IO threads.

This commit makes it so that, when servers_per_port > 0, each object
server defaults to having one main thread and one IO thread.

Also, eventlet's tpool size is now configurable via the object-server
config file. If a tpool size is set, that's what we'll use regardless
of servers_per_port. This allows operators with an excess of threads
to remove some regardless of servers_per_port.

Change-Id: I8f8914b7e70f2510393eb7c5e6be9708631ac027
Closes-Bug: 1554233
2017-06-23 16:16:03 +10:00
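The thread-count rule described in d9c4913e3b can be sketched as a small decision function. This is a minimal illustration, not the object server's code: `choose_tpool_size` and its parameter names are hypothetical, and the fallback of 20 is taken from the commit message's "20 IO threads".

```python
from typing import Optional

# Thread count the commit message reports when nothing is tuned.
DEFAULT_TPOOL_SIZE = 20


def choose_tpool_size(servers_per_port: int,
                      configured_size: Optional[int] = None) -> int:
    """Pick the number of IO threads for one object-server process.

    Hypothetical helper: an explicit setting always wins; otherwise
    servers_per_port > 0 means one IO thread is enough.
    """
    if configured_size is not None:
        return configured_size
    if servers_per_port > 0:
        return 1
    return DEFAULT_TPOOL_SIZE


print(choose_tpool_size(servers_per_port=0))
print(choose_tpool_size(servers_per_port=4))
print(choose_tpool_size(servers_per_port=4, configured_size=8))
```

The explicit setting is checked first, which matches the statement that a configured size is used regardless of servers_per_port.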
Jenkins
2d18ecdf4b Merge "Replace slowdown option with *_per_second option" 2017-06-22 01:18:26 +00:00
Jenkins
169d1d8ab8 Merge "Require that known-bad EC schemes be deprecated" 2017-06-22 01:11:03 +00:00
Ondřej Nový
a8bc94c7e3 Replace slowdown option with *_per_second option
Container and object updaters sleep "slowdown" (default 0.01) seconds
after every processed container/object. Because the time.sleep call adds
overhead, use ratelimit_sleep from common.utils instead, the same as in
the auditor.

Change-Id: I362aa0f13c78ad03ce1f76ee0257b0646f981212
2017-06-16 19:22:00 +00:00
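To illustrate the difference a8bc94c7e3 is after, here is a minimal sketch of pacing a loop by rate instead of sleeping a fixed amount per item. It is not Swift's `ratelimit_sleep`; the `RateLimiter` class, its arguments, and the injected clock are assumptions made for the example.

```python
import time


class RateLimiter:
    """Pace a loop to at most max_rate items per second (hypothetical sketch)."""

    def __init__(self, max_rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        now = self.clock()
        delay = self.next_allowed - now
        if delay > 0:
            # Ahead of schedule: sleep only for the remaining time.
            self.sleep(delay)
            self.next_allowed += self.interval
        else:
            # Already slower than the target rate: do not sleep at all.
            self.next_allowed = now + self.interval


limiter = RateLimiter(max_rate=50)
for item in range(3):
    limiter.wait()  # process one container/object after this returns
```

Unlike a fixed sleep after every item, the loop only sleeps when it is running faster than the configured rate, so slow items do not pay an extra delay.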
Tim Burke
2c3ac543f4 Require that known-bad EC schemes be deprecated
We said we were going to do it, we've had two releases saying we'd do
it, we've even backported our saying it to Newton -- let's actually do
it.

Upgrade Consideration
=====================

Erasure-coded storage policies using isa_l_rs_vand and nparity >= 5 must
be configured as deprecated, preventing any new containers from being
created with such a policy. This configuration is known to harm data
durability. Any data in such policies should be migrated to a new
policy. See https://bugs.launchpad.net/swift/+bug/1639691 for more
information.

UpgradeImpact
Related-Change: I50159c9d19f2385d5f60112e9aaefa1a68098313
Change-Id: I8f9de0bec01032d9d9b58848e2a76ac92e65ab09
Closes-Bug: 1639691
2017-06-16 17:58:43 +00:00
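The upgrade rule in 2c3ac543f4 amounts to a configuration check. The following is a minimal sketch under stated assumptions: `check_ec_policy` and its arguments are hypothetical names, and only the condition itself (isa_l_rs_vand with nparity >= 5 must be deprecated) comes from the commit message.

```python
def check_ec_policy(ec_type: str, nparity: int, deprecated: bool) -> None:
    """Reject a known-bad EC scheme unless the policy is marked deprecated.

    Hypothetical helper illustrating the rule; not Swift's validation code.
    """
    if ec_type == "isa_l_rs_vand" and nparity >= 5 and not deprecated:
        raise ValueError(
            "isa_l_rs_vand with nparity >= 5 harms data durability; "
            "the policy must be configured as deprecated"
        )


# Accepted: fewer than 5 parity fragments, or already deprecated.
check_ec_policy("isa_l_rs_vand", nparity=4, deprecated=False)
check_ec_policy("isa_l_rs_vand", nparity=5, deprecated=True)
```

Calling it with `nparity=5` and `deprecated=False` raises `ValueError`, which mirrors the requirement that such policies cannot stay active.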
Christian Schwede
e1140666d6 Add support to increase object ring partition power
This patch adds methods to increase the partition power of an existing
object ring without downtime for the users using a 3-step process. Data
won't be moved to other nodes; objects using the new increased partition
power will be located on the same device and are hardlinked to avoid
data movement.

1. A new setting "next_part_power" will be added to the rings, and once
the proxy server reloaded the rings it will send this value to the
object servers on any write operation. Object servers will now create a
hard-link in the new location to the original DiskFile object. Already
existing data will be relinked using a new tool in the new locations
using hardlinks.

2. The actual partition power itself will be increased. Servers will now
use the new partition power to read from and write to. No longer
required hard links in the old object location have to be removed now by
the relinker tool; the relinker tool reads the next_part_power setting
to find object locations that need to be cleaned up.

3. The "next_part_power" flag will be removed.

This mostly implements the spec in [1]; however it's not using an
"epoch" as described there. The idea of the epoch was to store data
using different partition powers in their own namespace to avoid
conflicts with auditors and replicators as well as being able to abort
such an operation and just remove the new tree.  This would require some
heavy change of the on-disk data layout, and other object-server
implementations would be required to adopt this scheme too.

Instead the object-replicator is now aware that there is a partition
power increase in progress and will skip replication of data in that
storage policy; the relinker tool should be simply run and afterwards
the partition power will be increased. This shouldn't take that much
time (it's only walking the filesystem and hardlinking); impact should
be low therefore. The relinker should be run on all storage nodes at the
same time in parallel to decrease the required time (though this is not
mandatory). Failures during relinking should not affect cluster
operations - relinking can be even aborted manually and restarted later.

Auditors are not quarantining objects written to a path with a different
partition power and therefore working as before (though they are reading
each object twice in the worst case before the no longer needed hard
links are removed).

Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>

[1] https://specs.openstack.org/openstack/swift-specs/specs/in_progress/increasing_partition_power.html

Change-Id: I7d6371a04f5c1c4adbb8733a71f3c177ee5448bb
2017-06-15 15:08:48 -07:00
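Why e1140666d6 can avoid moving data follows from how a partition is derived from an object's hash. The sketch below assumes the partition is the top part_power bits of the MD5 of the object path; Swift's real hashing also mixes in a cluster-specific prefix and suffix, which is omitted here, and `partition_for` and the sample path are hypothetical.

```python
import struct
from hashlib import md5


def partition_for(path: str, part_power: int) -> int:
    """Map an object path to a partition using the top part_power hash bits."""
    top32 = struct.unpack_from(">I", md5(path.encode("utf-8")).digest())[0]
    return top32 >> (32 - part_power)


path = "/AUTH_test/container/object"  # hypothetical object path
old_part = partition_for(path, 10)
new_part = partition_for(path, 11)

# Raising the power by one splits every partition in two, so the new
# partition is always 2 * old_part or 2 * old_part + 1.
print(old_part, new_part, new_part // 2 == old_part)
```

Because each new partition has exactly one parent partition, the new location can stay on the parent's device and be created as a hard link rather than a copy.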
Jenkins
73da215bdf Merge "Ring doc cleanups" 2017-06-14 02:54:03 +00:00
Jenkins
4315093a28 Merge "More Global EC doc updates" 2017-06-13 21:13:07 +00:00
Tim Burke
b5ee8c88d0 Ring doc cleanups
Change-Id: Ie51ea5c729341da793887e1e25c1e45301a96751
2017-06-13 09:23:23 -07:00
Clay Gerrard
4c7839d256 More Global EC doc updates
Soften the language about inefficiency on read and strengthen the
language encouraging the use of read affinity and composite rings.

Change-Id: Idc81a8c71e74ae28d384759700c5268d77ae3c85
2017-06-13 10:08:20 +01:00
Jenkins
41c8f1330f Merge "Update Global EC docs with reference to composite rings" 2017-06-13 06:26:54 +00:00
Alistair Coles
9665252352 Update Global EC docs with reference to composite rings
* In light of the composite rings feature being added [1],
  downgrade the warnings about EC Duplication [2] being
  experimental.

* Add links from Global EC docs to composite rings and
  per-policy proxy config features.

* Add discussion of using EC duplication with composite
  rings.

* Update Known Issues.

[1] Related-Change: I0d8928b55020592f8e75321d1f7678688301d797
[2] Related-Change: Idd155401982a2c48110c30b480966a863f6bd305

Change-Id: Id97a4899255945a6eaeacfef12fd29a2580588df
2017-06-12 16:58:02 -07:00
Jenkins
db45e1dd69 Merge "Add structure to storage policy configuration guide" 2017-05-31 21:36:13 +00:00
Alistair Coles
37ba21face Add structure to storage policy configuration guide
The description of storage policy config options was
unstructured and repetitive. This patch attempts to
improve the doc by gathering the notes for each option
into a structured list.

Change-Id: I57090b35a70f365e82fb0e29ab42e533d6359a7b
2017-05-31 11:11:32 +01:00
Jenkins
b9322a2f08 Merge "Add link from policies overview to per-policy proxy-server conf" 2017-05-30 19:13:56 +00:00
Alistair Coles
227cef9933 Add link from policies overview to per-policy proxy-server conf
- add proxy server per policy config as an optional
  step in the configuration of a policy, with link to
  the deployment guide

- add reverse link from deployment guide per-policy
  config doc section to storage policies docs

Drive-by fix an incorrect test comment

Change-Id: Ib95310193270a63c9d1e321c6e7de240e00b387f
Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
2017-05-26 10:41:35 +01:00
Tim Burke
d487bf7fb1 Remove tempauth docs from deployment guide
Instead, link to the middleware list and auth overview, as well as
referring readers to proxy-server.conf-sample

TempAuth-related content that was previously in the deployment guide has
been moved to TempAuth's own docs, which have been cleaned up a bit.

Change-Id: I00070bb09294362c069f7ee9426ac570bc1b3ddb
2017-05-25 12:35:46 -07:00
Jenkins
263dc8a3f3 Merge "Enable per policy proxy config options" 2017-05-25 06:34:48 +00:00
Alistair Coles
45884c1102 Enable per policy proxy config options
This is an alternative approach to that proposed in [1]

Adds support for optional per-policy config sections
to be added in proxy-server.conf. This is highly desirable
to allow per-policy affinity options to be set for use with
duplicated EC policies [2] and composite rings [3].

Certain options found in per-policy conf sections will
override their equivalents that may be set in the
[app:proxy-server] section. Currently the options
handled that way are:

  sorting_method
  read_affinity
  write_affinity
  write_affinity_node_count

For example:

  [proxy-server:policy:0]
  sorting_method = affinity
  read_affinity = r1=100
  write_affinity = r1
  write_affinity_node_count = 1 * replicas

The corresponding attributes of the proxy-server Application
are now available from instances of an OverrideConf object
that is obtained from Application.get_policy_options(policy).

[1] Related-Change: I9104fc789ba85ab3ab5ccd34096125b482821389
[2] Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
[3] Related-Change: I0d8928b55020592f8e75321d1f7678688301d797

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: I3f718f425f525baa80045ba067950c752bcaaefc
2017-05-23 20:22:30 +01:00
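The override behaviour described in 45884c1102 can be sketched with the standard library's configparser. This is an illustration only: `policy_options` is a hypothetical helper, not Swift's OverrideConf, and the values in the `[app:proxy-server]` section are made up for the example.

```python
import configparser

CONF = """
[app:proxy-server]
sorting_method = shuffle
write_affinity_node_count = 2 * replicas

[proxy-server:policy:0]
sorting_method = affinity
read_affinity = r1=100
write_affinity = r1
write_affinity_node_count = 1 * replicas
"""

# Options the commit message lists as overridable per policy.
OVERRIDABLE = (
    "sorting_method",
    "read_affinity",
    "write_affinity",
    "write_affinity_node_count",
)


def policy_options(parser, policy_index):
    """Return the effective options for one policy (hypothetical helper)."""
    options = {
        name: parser.get("app:proxy-server", name, fallback=None)
        for name in OVERRIDABLE
    }
    section = "proxy-server:policy:%d" % policy_index
    if parser.has_section(section):
        for name in OVERRIDABLE:
            if parser.has_option(section, name):
                options[name] = parser.get(section, name)
    return options


parser = configparser.ConfigParser()
parser.read_string(CONF)
print(policy_options(parser, 0))  # per-policy values win
print(policy_options(parser, 1))  # no section: falls back to the app section
```

Policy 0 picks up every value from its own section, while policy 1, which has no section, keeps whatever the `[app:proxy-server]` section provides.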
Jenkins
a2e020c52b Merge "Add read and write affinity options to deployment guide" 2017-05-19 19:05:33 +00:00
Alistair Coles
f02ec4de81 Add read and write affinity options to deployment guide
Add entries for these options in the deployment guide and
make the text in proxy-server.conf-sample and man page
consistent.

Change-Id: I5854ddb3e5864ddbeaf9ac2c930bfafdb47517c3
2017-05-18 10:42:44 -07:00
Jenkins
9089e44c0b Merge "Add Composite Ring Functionality" 2017-05-18 10:18:31 +00:00
Kota Tsuyuzaki
d40031b46f Add Composite Ring Functionality
* Adds a composite_builder module which provides the functionality to
  build a composite ring from a number of component ring builders.

* Add id to RingBuilder to differentiate rings in composite.
  A RingBuilder now gets a UUID when it is saved to file if
  it does not already have one. A RingBuilder loaded from
  file does NOT get a UUID assigned unless it was previously persisted in
  the file. This forces users to explicitly assign an id to
  existing ring builders by saving the state back to file.

  The UUID is included in first line of the output from:

    swift-ring-builder <builder-file>

Background:

This is another implementation for Composite Ring [1]
to enable better dispersion for global erasure coded cluster.

The most significant difference from the related-change [1] is that this
solution attempts to solve the problem as an offline tool rather than
dynamic compositing on the running servers. Due to the change, we gain
advantages such as:

- Less code and being simple
- No complex state validation on the running server
- Easy deployments with an offline tool

This patch does not provide a command line utility for managing
composite rings. The interface for such a tool is still under
discussion; this patch provides the enabling functionality first.

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>

[1] Related-Change: I80ef36d3ac4d4b7c97a1d034b7fc8e0dc2214d16
Change-Id: I0d8928b55020592f8e75321d1f7678688301d797
2017-05-15 16:42:00 -07:00
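The id rule in d40031b46f (assign on save, never on load) can be sketched as follows. `TinyBuilder` is a hypothetical stand-in for a ring builder, and JSON is used as a placeholder file format purely to keep the example short.

```python
import json
import os
import tempfile
import uuid


class TinyBuilder:
    """Hypothetical stand-in for a ring builder that carries an id."""

    def __init__(self, builder_id=None):
        self.id = builder_id

    def save(self, path):
        # An id is assigned only at save time, and only if missing.
        if self.id is None:
            self.id = uuid.uuid4().hex
        with open(path, "w") as f:
            json.dump({"id": self.id}, f)

    @classmethod
    def load(cls, path):
        # Loading never invents an id; it only restores a persisted one.
        with open(path) as f:
            state = json.load(f)
        return cls(builder_id=state.get("id"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "object.builder.json")
    with open(path, "w") as f:
        json.dump({}, f)  # a file written before ids existed
    legacy = TinyBuilder.load(path)
    id_after_load = legacy.id
    legacy.save(path)
    id_after_save = legacy.id
    id_after_reload = TinyBuilder.load(path).id

print(id_after_load, id_after_save == id_after_reload)
```

An existing builder therefore has no id until someone explicitly saves it, which is the "forces users to explicitly assign an id" behaviour the message describes.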
Tim Burke
3981e1ee8a Remove links for EOLed releases
Change-Id: If7e8526edf18b02474ba272451b9b4212558e03c
2017-05-11 13:57:10 +00:00
Clay Gerrard
37f6f25283 Update multi-node install links
... as is useful to do from time to time

Change-Id: I165899445080fa3a8e6dc624ab5a13680b819a73
2017-05-11 09:13:02 -04:00
Ngo Quoc Cuong
e23f8d3160 Trivial fix typo while reading doc
Change-Id: I9d96dd4464a086e508fbf18057b4f3e90c82c916
2017-05-05 23:06:14 +07:00
Tim Burke
cce719260d Clean up some doc formatting
Change-Id: Iac24369910464cb766fe7d5e6c15120d147930a7
2017-03-29 21:00:01 +00:00
lijunbo
47ba1041fc Use swift tempurl instead of swift-temp-url
Deprecate swift-temp-url and call python-swiftclient's
implementation instead. This adds python-swiftclient as an
optional dependency of Swift which is noted in releasenotes.

Change-Id: I0404f16c21099cb7695430f5b63722729c305613
2017-03-27 18:38:50 +08:00
Jenkins
91fc844e7b Merge "Document SAIO rsync service setup for ubuntu 16" 2017-03-24 16:12:00 +00:00
lijunbo
21396bc106 keep consistent naming convention of swift and urls
Change-Id: Iddd4f69abf77a5c643ce8b164fc6cfd72c068229
2017-03-23 02:28:41 +00:00
Jenkins
1e9b8888bf Merge "Enable cluster-wide CORS Expose-Headers setting" 2017-03-13 19:24:20 +00:00
Jenkins
a3efca5027 Merge "Support EC policy for in process functional tests" 2017-03-10 04:38:12 +00:00
Alistair Coles
5f610c76bd Support EC policy for in process functional tests
Add support for a 2+1 EC policy to be optionally used as default
policy when running in process functional tests.

The EC policy may be selected by setting the env var:

  SWIFT_TEST_IN_PROCESS_CONF_LOADER=ec tox

when running .functests, or by using the new tox test env:

  tox -e func-ec

Change-Id: I02e3553a74a024efdab91dcd609ac1cf4e4f3208
2017-03-09 10:42:34 +00:00
Alistair Coles
adcb4c270e Document SAIO rsync service setup for ubuntu 16
SAIO docs do suggest using Ubuntu 14.04, but if using
16.04 then systemctl needs to be used to have rsync service
restart on reboot.

Change-Id: I4fb0d3d063df61fbdfca981f06911148f3c4dc04
2017-03-08 15:07:13 +00:00
Clay Gerrard
38b99ad195 Global EC Under Development Documentation
Lay the foundation for documenting the features which will enable
Global EC.

The formatting on the sections in our existing EC docs didn't follow
best practices [1] and it caused some sphinx build warnings.

1. http://www.sphinx-doc.org/en/stable/rest.html#sections

Change-Id: I2d164dafeb84629c75c3c2ff774329ee84270b7f
2017-03-07 15:25:54 +00:00
Tim Burke
2ca303597e Make Sphinx treat warnings as errors
...and fix up the one warning that's crept in.

Change-Id: I3985d027f0ac2119ceaeb4daba5964f937de6cea
2017-03-06 23:55:40 +00:00
Jenkins
1f36b5dd16 Merge "EC Fragment Duplication - Foundational Global EC Cluster Support" 2017-02-26 06:26:08 +00:00
Romain LE DISEZ
9b47de3095 Enable cluster-wide CORS Expose-Headers setting
An operator offering a web UX to its customers might want to allow web
browsers to access some headers by default (e.g. X-Storage-Policy,
X-Container-Read, ...). This commit adds a new setting to the
proxy-server to allow some headers to be added cluster-wide to the CORS
header Access-Control-Expose-Headers.

Change-Id: I5ca90a052f27c98a514a96ee2299bfa1b6d46334
2017-02-25 19:00:28 +01:00
Kota Tsuyuzaki
40ba7f6172 EC Fragment Duplication - Foundational Global EC Cluster Support
This patch enables efficient PUT/GET for global distributed cluster[1].

Problem:
Erasure coding can decrease the amount of actual stored data to less
than the replicated model. For example, with ec_k=6 and ec_m=3 the stored
data is 1.5x the original data, which is smaller than 3x replicated.
However, unlike replication, erasure coding requires availability of at
least some ec_k fragments of the total ec_k + ec_m fragments to service
read (e.g. 6 of 9 in the case above). As such, if we stored the
EC object into a swift cluster on 2 geographically distributed data
centers which have the same volume of disks, it is likely the fragments
will be stored evenly (about 4 and 5) so we still need to access a
faraway data center to decode the original object. In addition, if one
of the data centers was lost in a disaster, the stored objects will be
lost forever, and we have to cry a lot. To ensure highly durable
storage, you would think of making *more* parity fragments (e.g.
ec_k=6, ec_m=10), unfortunately this causes *significant* performance
degradation due to the cost of mathematical calculation for erasure
coding encode/decode.

How this resolves the problem:
EC Fragment Duplication extends on the initial solution to add *more*
fragments from which to rebuild an object similar to the solution
described above. The difference is making *copies* of encoded fragments.
With experimental results[1][2], employing small ec_k and ec_m shows
enough performance to store/retrieve objects.

On PUT:

- Encode incoming object with small ec_k and ec_m  <- faster!
- Make duplicated copies of the encoded fragments. The # of copies
  are determined by 'ec_duplication_factor' in swift.conf
- Store all fragments in Swift Global EC Cluster

The duplicated fragments increase pressure on existing requirements
when decoding objects in service to a read request.  All fragments are
stored with their X-Object-Sysmeta-Ec-Frag-Index.  In this change, the
X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index
encoded by PyECLib, there *will* be duplicates.  Anytime we must decode
the original object data, we must only consider the ec_k fragments as
unique according to their X-Object-Sysmeta-Ec-Frag-Index.  On decode no
duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an
object, duplicate X-Object-Sysmeta-Ec-Frag-Index should be expected and
avoided if possible.

On GET:

This patch includes the following changes:
- Change GET Path to sort primary nodes grouping as subsets, so that
  each subset will include unique fragments
- Change Reconstructor to be more aware of possibly duplicate fragments

For example, with this change, a policy could be configured such that

swift.conf:
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_duplication_factor = 2
(object ring must have 6 replicas)

At Object-Server:
node index (from object ring):  0 1 2 3 4 5 <- keep node index for
                                               reconstruct decision
X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2 <- each object keeps actual
                                               fragment index for
                                               backend (PyEClib)

Additional improvements to Global EC Cluster Support will require
features such as Composite Rings, and more efficient fragment
rebalance/reconstruction.

1: http://goo.gl/IYiNPk (Swift Design Spec Repository)
2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo)

Doc-Impact

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idd155401982a2c48110c30b480966a863f6bd305
2017-02-22 10:56:13 -08:00
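The arithmetic in 40ba7f6172 can be checked with a short sketch. The function names are hypothetical; the modulo mapping is inferred from the node index / fragment index table in the message (0 1 2 0 1 2), and the selection loop only illustrates the "no duplicate fragment index on decode" rule, not the proxy's actual GET path.

```python
def storage_overhead(ec_k, ec_m, duplication_factor=1):
    """Stored bytes relative to the original object size."""
    return (ec_k + ec_m) * duplication_factor / ec_k


def fragment_index(node_index, ec_k, ec_m):
    """Fragment index held by a primary node, assuming a modulo mapping."""
    return node_index % (ec_k + ec_m)


def pick_unique_fragments(available_nodes, ec_k, ec_m):
    """Choose ec_k nodes with distinct fragment indexes, or None if impossible."""
    chosen = {}
    for node_index in available_nodes:
        frag = fragment_index(node_index, ec_k, ec_m)
        chosen.setdefault(frag, node_index)  # skip duplicate fragments
        if len(chosen) == ec_k:
            return chosen
    return None


print(storage_overhead(6, 3))                       # the 6+3 example
print([fragment_index(i, 2, 1) for i in range(6)])  # the 2+1, factor 2 layout
print(pick_unique_fragments([0, 3, 4], 2, 1))       # nodes 0 and 3 hold the same fragment
```

With ec_k=2, ec_m=1 and a duplication factor of 2, nodes 0 and 3 both hold fragment 0, so a read that reaches only those two nodes cannot decode the object, while nodes 0 and 4 can.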
Alistair Coles
9be1d8ba28 Fix tox -e docs sphinx errors
Change-Id: I6e200558b75ac539b59b492d13c36702443efc89
2017-02-20 15:58:35 -05:00