A new configuration parameter is added to /etc/swift/swift.conf:
[swift-hash]
swift_hash_path_prefix = 'random unique string'
New installations are advised to set this parameter to a random secret
that is not disclosed outside the organization.
The same secret needs to be used by all swift servers of the same
cluster.
Existing installations should set this parameter to an empty string
(the default).
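For context, a minimal sketch, not Swift's actual implementation, of how
such a prefix typically enters the path hashing (modeled loosely on
swift.common.utils.hash_path, with the pre-existing suffix secret
included; all values here are illustrative):

    import hashlib

    # Illustrative values; in Swift these come from the [swift-hash]
    # section of swift.conf.
    HASH_PATH_PREFIX = b'random unique string'
    HASH_PATH_SUFFIX = b'pre-existing suffix secret'

    def hash_path(account, container=None, obj=None):
        # The secret prefix and suffix make on-disk locations
        # unguessable to anyone without the cluster's secrets.
        paths = [p for p in (account, container, obj) if p]
        raw = (HASH_PATH_PREFIX + b'/' +
               '/'.join(paths).encode('utf-8') + HASH_PATH_SUFFIX)
        return hashlib.md5(raw).hexdigest()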
DocImpact
Fixes: Bug #1157454
Change-Id: I63b10d0b7d6dd3f74e0f10bb41b5f240fa03578a
If the object replicator couldn't create a device's object directory
(due to permissions or another error), it wouldn't do any work at all.
This fixes that.
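A hedged sketch of the fixed control flow (names illustrative, not the
replicator's actual code): a device whose objects directory can't be
created is skipped, rather than aborting the whole pass.

    import os

    def usable_object_dirs(devices_dir, device_names):
        usable = []
        for name in device_names:
            obj_dir = os.path.join(devices_dir, name, 'objects')
            try:
                os.makedirs(obj_dir, exist_ok=True)
            except OSError:
                continue  # permissions or other error: skip this device
            usable.append(obj_dir)
        return usable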
Change-Id: I6a30439d036b29c9cfdb660428d13668e0dc8632
Fixes bug 1089140.
It turns out that if an exception bails out of the pickle loading
(e.g. a zero-byte hashes_file), the if clause that determines whether
or not to write out a fresh hashes_file can evaluate to false, leading
to an infinite loop.
This patch fixes the infinite loop in general, by ensuring that a new
hashes_file is written whenever any exception is thrown.
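The shape of the fix, as a minimal sketch with illustrative names:

    import pickle

    def read_hashes(hashes_file):
        # Any failure to load the pickle (a zero-byte file included)
        # marks the hashes as modified, so the caller always writes out
        # a fresh hashes_file instead of looping forever.
        hashes, modified = {}, False
        try:
            with open(hashes_file, 'rb') as fp:
                hashes = pickle.load(fp)
        except Exception:
            hashes, modified = {}, True  # corrupt/empty: force a rewrite
        return hashes, modified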
Change-Id: I344c5f8e261ce7c667bdafe1687263a4150b21dc
If a partition directory was a file instead of a directory, the
object-replicator would attempt to listdir() it, raise an exception, and
try again next iteration. This condition could arise after running
xfs_repair.
Now, collect_jobs() will reap any partition directories that are
actually files. Fixes bug 1045954.
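A hedged sketch of the reaping behavior (illustrative names, not the
real collect_jobs()):

    import os

    def collect_partitions(objects_dir):
        partitions = []
        for entry in os.listdir(objects_dir):
            path = os.path.join(objects_dir, entry)
            if not os.path.isdir(path):
                os.unlink(path)  # reap stray files, e.g. from xfs_repair
                continue
            partitions.append(path)
        return partitions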
Change-Id: Id65d3eab2effd61c3f6b25250611c88c907b2a16
Basically, do all hashing in the replicator without a lock, then lock
briefly to rewrite the hashes file. Retry if someone else has modified
the hashes file in the meantime (which should be rare).
Also, a little refactoring.
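A minimal sketch of the optimistic scheme, assuming flock-style locking
and a hypothetical recalculate() callable standing in for the expensive
hashing:

    import fcntl
    import os
    import pickle

    def update_hashes(hashes_file, recalculate):
        while True:
            mtime_before = os.path.getmtime(hashes_file)
            new_hashes = recalculate()           # slow part, no lock held
            with open(hashes_file, 'rb+') as fp:
                fcntl.flock(fp, fcntl.LOCK_EX)   # brief critical section
                if os.path.getmtime(hashes_file) != mtime_before:
                    continue                     # someone beat us: retry
                fp.seek(0)
                fp.truncate()
                pickle.dump(new_hashes, fp)
            return new_hashes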
Change-Id: I6257a53808d14b567bde70d2d18a9c58cb1e415a
This commit introduces a new algorithm for assigning partition
replicas to devices. Basically, the ring builder organizes the devices
into tiers (first zone, then IP/port, then device ID). When placing a
replica, the ring builder looks for the emptiest device (biggest
parts_wanted) in the furthest-away tier.
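A toy sketch of the placement idea; the real ring builder is
considerably more involved, and the names here are illustrative:

    def tier_distance(dev, placed):
        # 3 = different zone, 2 = same zone but different ip:port,
        # 1 = same ip:port but different device, 0 = same device.
        if all(dev['zone'] != d['zone'] for d in placed):
            return 3
        if all((dev['zone'], dev['ip_port']) != (d['zone'], d['ip_port'])
               for d in placed):
            return 2
        if all(dev['id'] != d['id'] for d in placed):
            return 1
        return 0

    def pick_device(devices, placed):
        # Furthest tier wins; within a tier, the emptiest device
        # (biggest parts_wanted) wins.
        return max(devices, key=lambda d: (tier_distance(d, placed),
                                           d['parts_wanted']))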
In the case where zone-count >= replica-count, the new algorithm will
give the same results as the one it replaces. Thus, no migration is
needed.
In the case where zone-count < replica-count, the new algorithm
behaves differently from the old algorithm. The new algorithm will
distribute things evenly at each tier so that the replication is as
high-quality as possible, given the circumstances. The old algorithm
would just crash, so again, no migration is needed.
Handoffs have also been updated to use the new algorithm. When
generating handoff nodes, the ring first looks for nodes in other
zones, then on other IPs/ports, then on any other drive. The first
handoff nodes (the ones in other zones) will be the same as before;
this commit just extends the list of handoff nodes.
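The handoff ordering can be pictured the same way; a hedged sketch,
taking the hypothetical tier_distance() from the placement sketch above
as a parameter:

    def handoff_order(candidates, primaries, tier_distance):
        # Sort candidates furthest-first: other zones, then other
        # IPs/ports, then any other drive. The leading entries (other
        # zones) match the pre-change behavior; the rest extend the list.
        return sorted(candidates,
                      key=lambda dev: tier_distance(dev, primaries),
                      reverse=True)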
The proxy server and replicators have been altered to avoid looking at
the ring's replica count directly. Previously, with a replica count of
C, RingData.get_nodes() and RingData.get_part_nodes() would return
lists of length C, so some other code used the replica count when it
needed the number of nodes. If two of a partition's replicas are on
the same device (e.g. with 3 replicas, 2 devices), then that
assumption is no longer true. Fortunately, all the proxy server and
replicators really needed was the number of nodes returned, which they
already had. (Bonus: now the only code that mentions replica_count
directly is in the ring and the ring builder.)
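A hedged sketch of the calling-side pattern; quorum_size is a
hypothetical helper, and the ring object is assumed to expose
get_nodes() as described above:

    def quorum_size(ring, account, container, obj):
        # Derive the count from the nodes actually returned; with more
        # replicas than devices, len(nodes) and the ring's replica
        # count can differ.
        _part, nodes = ring.get_nodes(account, container, obj)
        return len(nodes) // 2 + 1   # a majority of the real node list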
Change-Id: Iba2929edfc6ece89791890d0635d4763d821a3aa