# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import errno
import fcntl
import json
from contextlib import contextmanager
import logging
from textwrap import dedent

import mock
import os
import pickle
import shutil
import tempfile
import time
import unittest
import uuid

from six.moves import cStringIO as StringIO

from swift.cli import relinker
from swift.common import ring, utils
from swift.common import storage_policy
from swift.common.exceptions import PathNotDir
from swift.common.storage_policy import (
    StoragePolicy, StoragePolicyCollection, POLICIES, ECStoragePolicy,
    get_policy_string)

from swift.obj.diskfile import write_metadata, DiskFileRouter, \
    DiskFileManager, relink_paths, BaseDiskFileManager

from test.debug_logger import debug_logger
from test.unit import skip_if_no_xattrs, DEFAULT_TEST_EC_TYPE, \
    patch_policies


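# Partition power of the test ring; tests increase it to PART_POWER + 1.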
PART_POWER = 8


class TestRelinker(unittest.TestCase):

    maxDiff = None

    def setUp(self):
        skip_if_no_xattrs()
        self.logger = debug_logger()
        self.testdir = tempfile.mkdtemp()
        self.devices = os.path.join(self.testdir, 'node')
        self.recon_cache_path = os.path.join(self.testdir, 'cache')
        self.recon_cache = os.path.join(self.recon_cache_path,
                                        'relinker.recon')
        shutil.rmtree(self.testdir, ignore_errors=True)
        os.mkdir(self.testdir)
        os.mkdir(self.devices)
        os.mkdir(self.recon_cache_path)

        self.rb = ring.RingBuilder(PART_POWER, 6.0, 1)

        for i in range(6):
            ip = "127.0.0.%s" % i
            self.rb.add_dev({'id': i, 'region': 0, 'zone': 0, 'weight': 1,
                             'ip': ip, 'port': 10000, 'device': 'sda1'})
        self.rb.rebalance(seed=1)

        self.conf_file = os.path.join(self.testdir, 'relinker.conf')
        self._setup_config()

        self.existing_device = 'sda1'
        os.mkdir(os.path.join(self.devices, self.existing_device))
        self.objects = os.path.join(self.devices, self.existing_device,
                                    'objects')
        self.policy = StoragePolicy(0, 'platinum', True)
        storage_policy._POLICIES = StoragePolicyCollection([self.policy])
        self._setup_object(policy=self.policy)

        patcher = mock.patch('swift.cli.relinker.hubs')
        self.mock_hubs = patcher.start()
        self.addCleanup(patcher.stop)

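    # Write a minimal [DEFAULT]/[object-relinker] config for this test's
    # swift_dir, devices and recon cache path.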
    def _setup_config(self):
        config = """
        [DEFAULT]
        swift_dir = {swift_dir}
        devices = {devices}
        mount_check = {mount_check}

        [object-relinker]
        recon_cache_path = {recon_cache_path}
        # update every chance we get!
        stats_interval = 0
        """.format(
            swift_dir=self.testdir,
            devices=self.devices,
            mount_check=False,
            recon_cache_path=self.recon_cache_path,
        )
        with open(self.conf_file, 'w') as f:
            f.write(dedent(config))

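    # Search for an object name whose hash lands in different partitions
    # under PART_POWER and PART_POWER + 1 (and satisfies any extra condition
    # on the old partition); fail the test if none is found.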
    def _get_object_name(self, condition=None):
        attempts = []
        for _ in range(50):
            account = 'a'
            container = 'c'
            obj = 'o-' + str(uuid.uuid4())
            _hash = utils.hash_path(account, container, obj)
            part = utils.get_partition_for_hash(_hash, PART_POWER)
            next_part = utils.get_partition_for_hash(_hash, PART_POWER + 1)
            obj_path = os.path.join(os.path.sep, account, container, obj)
            # There's 1/512 chance that both old and new parts will be 0;
            # that's not a terribly interesting case, as there's nothing to do
            attempts.append((part, next_part, 2**PART_POWER))
            if (part != next_part and
                    (condition(part) if condition else True)):
                break
        else:
            self.fail('Failed to setup object satisfying test preconditions %s'
                      % attempts)
        return _hash, part, next_part, obj_path

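    # Create a single timestamped file (default '.data') with metadata in
    # the partition/suffix/hash directory layout for the given policy.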
    def _create_object(self, policy, part, _hash, ext='.data'):
        objects_dir = os.path.join(self.devices, self.existing_device,
                                   get_policy_string('objects', policy))
        shutil.rmtree(objects_dir, ignore_errors=True)
        os.mkdir(objects_dir)
        objdir = os.path.join(objects_dir, str(part), _hash[-3:], _hash)
        os.makedirs(objdir)
        timestamp = utils.Timestamp.now()
        filename = timestamp.internal + ext
        objname = os.path.join(objdir, filename)
        with open(objname, "wb") as dummy:
            dummy.write(b"Hello World!")
            write_metadata(dummy,
                           {'name': self.obj_path, 'Content-Length': '12'})
        return objdir, filename, timestamp

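    # Create the test object and record the paths it should occupy before and
    # after the part power increase (part_dir, next_part_dir, expected_dir,
    # expected_file).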
    def _setup_object(self, condition=None, policy=None, ext='.data'):
        policy = policy or self.policy
        _hash, part, next_part, obj_path = self._get_object_name(condition)
        self._hash = _hash
        self.part = part
        self.next_part = next_part
        self.obj_path = obj_path
        objects_dir = os.path.join(self.devices, self.existing_device,
                                   get_policy_string('objects', policy))

        self.objdir, self.object_fname, self.obj_ts = self._create_object(
            policy, part, _hash, ext)
        self.objname = os.path.join(self.objdir, self.object_fname)
        self.part_dir = os.path.join(objects_dir, str(self.part))
        self.suffix = self._hash[-3:]
        self.suffix_dir = os.path.join(self.part_dir, self.suffix)
        self.next_part_dir = os.path.join(objects_dir, str(self.next_part))
        self.next_suffix_dir = os.path.join(self.next_part_dir, self.suffix)
        self.expected_dir = os.path.join(self.next_suffix_dir, self._hash)
        self.expected_file = os.path.join(self.expected_dir, self.object_fname)

    def _make_link(self, filename, part_power):
        # make a file in the older part_power location and link it to a file in
        # the next part power location
        new_filepath = os.path.join(self.expected_dir, filename)
        older_filepath = utils.replace_partition_in_path(
            self.devices, new_filepath, part_power)
        os.makedirs(os.path.dirname(older_filepath))
        with open(older_filepath, 'w') as fd:
            fd.write(older_filepath)
        os.makedirs(self.expected_dir)
        os.link(older_filepath, new_filepath)
        with open(new_filepath, 'r') as fd:
            self.assertEqual(older_filepath, fd.read())  # sanity check
        return older_filepath, new_filepath

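    # Persist the ring builder as <ring_name>.ring.gz for each policy and
    # clear policy.object_ring so the relinker reloads rings from disk.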
    def _save_ring(self, policies=POLICIES):
        self.rb._ring = None
        rd = self.rb.get_ring()
        for policy in policies:
            rd.save(os.path.join(
                self.testdir, '%s.ring.gz' % policy.ring_name))
            # Enforce ring reloading in relinker
            policy.object_ring = None

    def tearDown(self):
        shutil.rmtree(self.testdir, ignore_errors=True)
        storage_policy.reload_storage_policies()

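    # Make utils.listdir raise OSError for the objects dir to simulate a
    # listing failure.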
    @contextmanager
    def _mock_listdir(self):
        orig_listdir = utils.listdir

        def mocked(path):
            if path == self.objects:
                raise OSError
            return orig_listdir(path)

        with mock.patch('swift.common.utils.listdir', mocked):
            yield

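    # Route the relinker's logging to self.logger and point its default recon
    # cache path at the test directory.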
    @contextmanager
    def _mock_relinker(self):
        with mock.patch.object(relinker.logging, 'getLogger',
                               return_value=self.logger), \
                mock.patch.object(relinker, 'get_logger',
                                  return_value=self.logger), \
                mock.patch('swift.cli.relinker.DEFAULT_RECON_CACHE_PATH',
                           self.recon_cache_path):
            yield

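    # With os.fork/os.wait mocked out, the parent should spawn the requested
    # workers, reap them all and return 0 when every worker succeeds.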
    def test_workers_parent(self):
        os.mkdir(os.path.join(self.devices, 'sda2'))
        self.rb.prepare_increase_partition_power()
        self.rb.increase_partition_power()
        self._save_ring()
        pids = {
            2: 0,
            3: 0,
        }

        def mock_wait():
            return pids.popitem()

        with mock.patch('os.fork', side_effect=list(pids.keys())), \
                mock.patch('os.wait', mock_wait):
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--workers', '2',
                '--skip-mount',
            ]))
        self.assertEqual(pids, {})

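    # A worker reporting a non-zero wait status should make the parent return
    # 1 and log a single warning identifying the failed pid.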
    def test_workers_parent_bubbles_up_errors(self):
        def do_test(wait_result, msg):
            pids = {
                2: 0,
                3: 0,
                4: 0,
                5: wait_result,
                6: 0,
            }

            with mock.patch('os.fork', side_effect=list(pids.keys())), \
                    mock.patch('os.wait', lambda: pids.popitem()), \
                    self._mock_relinker():
                self.assertEqual(1, relinker.main([
                    'cleanup',
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                ]))
            self.assertEqual(pids, {})
            self.assertEqual([], self.logger.get_lines_for_level('error'))
            warning_lines = self.logger.get_lines_for_level('warning')
            self.assertTrue(
                warning_lines[0].startswith('Worker (pid=5, devs='))
            self.assertTrue(
                warning_lines[0].endswith(msg),
                'Expected log line to end with %r; got %r'
                % (msg, warning_lines[0]))
            self.assertFalse(warning_lines[1:])
            info_lines = self.logger.get_lines_for_level('info')
            self.assertEqual(2, len(info_lines))
            self.assertIn('Starting relinker (cleanup=True) using 5 workers:',
                          info_lines[0])
            self.assertIn('Finished relinker (cleanup=True):',
                          info_lines[1])
            print(info_lines)
            self.logger.clear()

        os.mkdir(os.path.join(self.devices, 'sda2'))
        os.mkdir(os.path.join(self.devices, 'sda3'))
        os.mkdir(os.path.join(self.devices, 'sda4'))
        os.mkdir(os.path.join(self.devices, 'sda5'))
        self.rb.prepare_increase_partition_power()
        self.rb.increase_partition_power()
        self._save_ring()
        # signals get the low bits
        do_test(9, 'exited in 0.0s after receiving signal: 9')
        # exit codes get the high
        do_test(1 << 8, 'completed in 0.0s with errors')
        do_test(42 << 8, 'exited in 0.0s with unexpected status 42')
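    # Each forked child should run against its own slice of the devices:
    # two workers split five devices between them, and more workers than
    # devices degrades to one device per child.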
    def test_workers_children(self):
        os.mkdir(os.path.join(self.devices, 'sda2'))
        os.mkdir(os.path.join(self.devices, 'sda3'))
        os.mkdir(os.path.join(self.devices, 'sda4'))
        os.mkdir(os.path.join(self.devices, 'sda5'))
        self.rb.prepare_increase_partition_power()
        self.rb.increase_partition_power()
        self._save_ring()

        calls = []

        def fake_fork():
            calls.append('fork')
            return 0

        def fake_run(self):
            calls.append(('run', self.device_list))
            return 0

        def fake_exit(status):
            calls.append(('exit', status))

        with mock.patch('os.fork', fake_fork), \
                mock.patch('os._exit', fake_exit), \
                mock.patch('swift.cli.relinker.Relinker.run', fake_run):
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--workers', '2',
                '--skip-mount',
            ]))
        self.assertEqual([
            'fork',
            ('run', ['sda1', 'sda3', 'sda5']),
            ('exit', 0),
            'fork',
            ('run', ['sda2', 'sda4']),
            ('exit', 0),
        ], calls)

        # test too many workers
        calls = []

        with mock.patch('os.fork', fake_fork), \
                mock.patch('os._exit', fake_exit), \
                mock.patch('swift.cli.relinker.Relinker.run', fake_run):
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--workers', '6',
                '--skip-mount',
            ]))
        self.assertEqual([
            'fork',
            ('run', ['sda1']),
            ('exit', 0),
            'fork',
            ('run', ['sda2']),
            ('exit', 0),
            'fork',
            ('run', ['sda3']),
            ('exit', 0),
            'fork',
            ('run', ['sda4']),
            ('exit', 0),
            'fork',
            ('run', ['sda5']),
            ('exit', 0),
        ], calls)
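    # drop_privileges should only be called when a user is supplied, and it
    # must happen before Relinker.run; a --user on the command line wins
    # over a user setting in the conf file.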
    def _do_test_relinker_drop_privileges(self, command):
        @contextmanager
        def do_mocks():
            # attach mocks to call_capture so that call order can be asserted
            call_capture = mock.Mock()
            mod = 'swift.cli.relinker.'
            with mock.patch(mod + 'Relinker') as mock_relinker, \
                    mock.patch(mod + 'drop_privileges') as mock_dp, \
                    mock.patch(mod + 'os.listdir',
                               return_value=['sda', 'sdb']):
                mock_relinker.return_value.run.return_value = 0
                call_capture.attach_mock(mock_dp, 'drop_privileges')
                call_capture.attach_mock(mock_relinker, 'run')
                yield call_capture

        # no user option
        with do_mocks() as capture:
            self.assertEqual(0, relinker.main([command, '--workers', '0']))
        self.assertEqual([mock.call.run(mock.ANY, mock.ANY, ['sda', 'sdb'],
                                        do_cleanup=(command == 'cleanup'))],
                         capture.method_calls)

        # cli option --user
        with do_mocks() as capture:
            self.assertEqual(0, relinker.main([command, '--user', 'cli_user',
                                               '--workers', '0']))
        self.assertEqual([('drop_privileges', ('cli_user',), {}),
                          mock.call.run(mock.ANY, mock.ANY, ['sda', 'sdb'],
                                        do_cleanup=(command == 'cleanup'))],
                         capture.method_calls)

        # cli option --user takes precedence over conf file user
        with do_mocks() as capture:
            with mock.patch('swift.cli.relinker.readconf',
                            return_value={'user': 'conf_user'}):
                self.assertEqual(0, relinker.main([command, 'conf_file',
                                                   '--user', 'cli_user',
                                                   '--workers', '0']))
        self.assertEqual([('drop_privileges', ('cli_user',), {}),
                          mock.call.run(mock.ANY, mock.ANY, ['sda', 'sdb'],
                                        do_cleanup=(command == 'cleanup'))],
                         capture.method_calls)

        # conf file user
        with do_mocks() as capture:
            with mock.patch('swift.cli.relinker.readconf',
                            return_value={'user': 'conf_user',
                                          'workers': '0'}):
                self.assertEqual(0, relinker.main([command, 'conf_file']))
        self.assertEqual([('drop_privileges', ('conf_user',), {}),
                          mock.call.run(mock.ANY, mock.ANY, ['sda', 'sdb'],
                                        do_cleanup=(command == 'cleanup'))],
                         capture.method_calls)

    def test_relinker_drop_privileges(self):
        self._do_test_relinker_drop_privileges('relink')
        self._do_test_relinker_drop_privileges('cleanup')
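    # --files-per-second of zero (or omitted) disables rate limiting, a
    # positive value wraps the location generator in a RateLimitedIterator,
    # and a negative value is rejected by argparse.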
    def _do_test_relinker_files_per_second(self, command):
        # no files per second
        with mock.patch('swift.cli.relinker.RateLimitedIterator') as it:
            self.assertEqual(0, relinker.main([
                command,
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        it.assert_not_called()

        # zero files per second
        with mock.patch('swift.cli.relinker.RateLimitedIterator') as it:
            self.assertEqual(0, relinker.main([
                command,
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
                '--files-per-second', '0'
            ]))
        it.assert_not_called()

        # positive files per second
        locations = iter([])
        with mock.patch('swift.cli.relinker.audit_location_generator',
                        return_value=locations):
            with mock.patch('swift.cli.relinker.RateLimitedIterator') as it:
                self.assertEqual(0, relinker.main([
                    command,
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                    '--files-per-second', '1.23'
                ]))
        it.assert_called_once_with(locations, 1.23)

        # negative files per second
        err = StringIO()
        with mock.patch('sys.stderr', err):
            with self.assertRaises(SystemExit) as cm:
                relinker.main([
                    command,
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                    '--files-per-second', '-1'
                ])
        self.assertEqual(2, cm.exception.code)  # NB exit code 2 from argparse
        self.assertIn('--files-per-second: invalid non_negative_float value',
                      err.getvalue())

    def test_relink_files_per_second(self):
        self.rb.prepare_increase_partition_power()
        self._save_ring()
        self._do_test_relinker_files_per_second('relink')

    def test_cleanup_files_per_second(self):
        self._common_test_cleanup()
        self._do_test_relinker_files_per_second('cleanup')
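    # Conf-file handling: [DEFAULT]/[object-relinker] values feed the
    # Relinker conf, command-line options (including --debug) override
    # them, and --workers auto on the command line beats workers = 8 in
    # the conf file.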
    @patch_policies(
        [StoragePolicy(0, name='gold', is_default=True),
         ECStoragePolicy(1, name='platinum', ec_type=DEFAULT_TEST_EC_TYPE,
                         ec_ndata=4, ec_nparity=2)],
        fake_ring_args=[{}, {}])
    def test_conf_file(self):
        config = """
        [DEFAULT]
        swift_dir = %s
        devices = /test/node
        mount_check = false
        reclaim_age = 5184000

        [object-relinker]
        log_level = WARNING
        log_name = test-relinker
        """ % self.testdir
        conf_file = os.path.join(self.testdir, 'relinker.conf')
        with open(conf_file, 'w') as f:
            f.write(dedent(config))

        # cite conf file on command line
        with mock.patch('swift.cli.relinker.Relinker') as mock_relinker:
            relinker.main(['relink', conf_file, '--device', 'sdx', '--debug'])
        exp_conf = {
            '__file__': mock.ANY,
            'swift_dir': self.testdir,
            'devices': '/test/node',
            'mount_check': False,
            'reclaim_age': '5184000',
            'files_per_second': 0.0,
            'log_name': 'test-relinker',
            'log_level': 'DEBUG',
            'policies': POLICIES,
            'workers': 'auto',
            'partitions': set(),
            'recon_cache_path': '/var/cache/swift',
            'stats_interval': 300.0,
        }
        mock_relinker.assert_called_once_with(
            exp_conf, mock.ANY, ['sdx'], do_cleanup=False)
        logger = mock_relinker.call_args[0][1]
        # --debug overrides conf file
        self.assertEqual(logging.DEBUG, logger.getEffectiveLevel())
        self.assertEqual('test-relinker', logger.logger.name)

        # check the conf is passed to DiskFileRouter
        self._save_ring()
        with mock.patch('swift.cli.relinker.diskfile.DiskFileRouter',
                        side_effect=DiskFileRouter) as mock_dfr:
            relinker.main(['relink', conf_file, '--device', 'sdx', '--debug'])
        mock_dfr.assert_called_once_with(exp_conf, mock.ANY)

        # flip mount_check, no --debug...
        config = """
        [DEFAULT]
        swift_dir = test/swift/dir
        devices = /test/node
        mount_check = true

        [object-relinker]
        log_level = WARNING
        log_name = test-relinker
        files_per_second = 11.1
        recon_cache_path = /var/cache/swift-foo
        stats_interval = 111
        """
        with open(conf_file, 'w') as f:
            f.write(dedent(config))
        with mock.patch('swift.cli.relinker.Relinker') as mock_relinker:
            relinker.main(['relink', conf_file, '--device', 'sdx'])
        mock_relinker.assert_called_once_with({
            '__file__': mock.ANY,
            'swift_dir': 'test/swift/dir',
            'devices': '/test/node',
            'mount_check': True,
            'files_per_second': 11.1,
            'log_name': 'test-relinker',
            'log_level': 'WARNING',
            'policies': POLICIES,
            'partitions': set(),
            'workers': 'auto',
            'recon_cache_path': '/var/cache/swift-foo',
            'stats_interval': 111.0,
        }, mock.ANY, ['sdx'], do_cleanup=False)
        logger = mock_relinker.call_args[0][1]
        self.assertEqual(logging.WARNING, logger.getEffectiveLevel())
        self.assertEqual('test-relinker', logger.logger.name)

        # override with cli options...
        logger = debug_logger()
        with mock.patch('swift.cli.relinker.Relinker') as mock_relinker:
            with mock.patch('swift.cli.relinker.get_logger',
                            return_value=logger):
                relinker.main([
                    'relink', conf_file, '--device', 'sdx', '--debug',
                    '--swift-dir', 'cli-dir', '--devices', 'cli-devs',
                    '--skip-mount-check', '--files-per-second', '2.2',
                    '--policy', '1', '--partition', '123',
                    '--partition', '123', '--partition', '456',
                    '--workers', '2',
                    '--stats-interval', '222',
                ])
        mock_relinker.assert_called_once_with({
            '__file__': mock.ANY,
            'swift_dir': 'cli-dir',
            'devices': 'cli-devs',
            'mount_check': False,
            'files_per_second': 2.2,
            'log_level': 'DEBUG',
            'log_name': 'test-relinker',
            'policies': {POLICIES[1]},
            'partitions': {123, 456},
            'workers': 2,
            'recon_cache_path': '/var/cache/swift-foo',
            'stats_interval': 222.0,
        }, mock.ANY, ['sdx'], do_cleanup=False)

        with mock.patch('swift.cli.relinker.Relinker') as mock_relinker, \
                mock.patch('logging.basicConfig') as mock_logging_config:
            relinker.main(['relink', '--device', 'sdx',
                           '--swift-dir', 'cli-dir', '--devices', 'cli-devs',
                           '--skip-mount-check'])
        mock_relinker.assert_called_once_with({
            'swift_dir': 'cli-dir',
            'devices': 'cli-devs',
            'mount_check': False,
            'files_per_second': 0.0,
            'log_level': 'INFO',
            'policies': POLICIES,
            'partitions': set(),
            'workers': 'auto',
            'recon_cache_path': '/var/cache/swift',
            'stats_interval': 300.0,
        }, mock.ANY, ['sdx'], do_cleanup=False)
        mock_logging_config.assert_called_once_with(
            format='%(message)s', level=logging.INFO, filename=None)

        with mock.patch('swift.cli.relinker.Relinker') as mock_relinker, \
                mock.patch('logging.basicConfig') as mock_logging_config:
            relinker.main([
                'relink', '--debug',
                '--swift-dir', 'cli-dir',
                '--devices', 'cli-devs',
                '--device', 'sdx',
                '--skip-mount-check',
                '--policy', '0',
                '--policy', '1',
                '--policy', '0',
            ])
        mock_relinker.assert_called_once_with({
            'swift_dir': 'cli-dir',
            'devices': 'cli-devs',
            'mount_check': False,
            'files_per_second': 0.0,
            'log_level': 'DEBUG',
            'policies': set(POLICIES),
            'partitions': set(),
            'workers': 'auto',
            'recon_cache_path': '/var/cache/swift',
            'stats_interval': 300.0,
        }, mock.ANY, ['sdx'], do_cleanup=False)
        # --debug is now effective
        mock_logging_config.assert_called_once_with(
            format='%(message)s', level=logging.DEBUG, filename=None)

        # now test overriding workers back to auto
        config = """
        [DEFAULT]
        swift_dir = test/swift/dir
        devices = /test/node
        mount_check = true

        [object-relinker]
        log_level = WARNING
        log_name = test-relinker
        files_per_second = 11.1
        workers = 8
        """
        with open(conf_file, 'w') as f:
            f.write(dedent(config))
        devices = ['sdx%d' % i for i in range(8, 1)]
        cli_cmd = ['relink', conf_file, '--device', 'sdx', '--workers', 'auto']
        for device in devices:
            cli_cmd.extend(['--device', device])
        with mock.patch('swift.cli.relinker.Relinker') as mock_relinker:
            relinker.main(cli_cmd)
        mock_relinker.assert_called_once_with({
            '__file__': mock.ANY,
            'swift_dir': 'test/swift/dir',
            'devices': '/test/node',
            'mount_check': True,
            'files_per_second': 11.1,
            'log_name': 'test-relinker',
            'log_level': 'WARNING',
            'policies': POLICIES,
            'partitions': set(),
            'workers': 'auto',
            'recon_cache_path': '/var/cache/swift',
            'stats_interval': 300.0,
        }, mock.ANY, ['sdx'], do_cleanup=False)
        logger = mock_relinker.call_args[0][1]
        self.assertEqual(logging.WARNING, logger.getEffectiveLevel())
        self.assertEqual('test-relinker', logger.logger.name)

        # and now globally
        config = """
        [DEFAULT]
        swift_dir = test/swift/dir
        devices = /test/node
        mount_check = true
        workers = 8

        [object-relinker]
        log_level = WARNING
        log_name = test-relinker
        files_per_second = 11.1
        """
        with open(conf_file, 'w') as f:
            f.write(dedent(config))
        with mock.patch('swift.cli.relinker.Relinker') as mock_relinker:
            relinker.main(cli_cmd)
        mock_relinker.assert_called_once_with({
            '__file__': mock.ANY,
            'swift_dir': 'test/swift/dir',
            'devices': '/test/node',
            'mount_check': True,
            'files_per_second': 11.1,
            'log_name': 'test-relinker',
            'log_level': 'WARNING',
            'policies': POLICIES,
            'partitions': set(),
            'workers': 'auto',
            'recon_cache_path': '/var/cache/swift',
            'stats_interval': 300.0,
        }, mock.ANY, ['sdx'], do_cleanup=False)
        logger = mock_relinker.call_args[0][1]
        self.assertEqual(logging.WARNING, logger.getEffectiveLevel())
        self.assertEqual('test-relinker', logger.logger.name)
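    # main() should install the hub returned by utils.get_hub() via
    # hubs.use_hub().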
    def test_relinker_utils_get_hub(self):
        cli_cmd = ['relink', '--device', 'sdx', '--workers', 'auto',
                   '--device', '/some/device']
        with mock.patch('swift.cli.relinker.Relinker'):
            relinker.main(cli_cmd)

        self.mock_hubs.use_hub.assert_called_with(utils.get_hub())
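    # An object whose old partition is in the lower half of the current
    # part power is relinked without any rehash; the new suffix is just
    # invalidated and left for the cleanup step to rehash.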
    def test_relink_first_quartile_no_rehash(self):
        # we need object name in lower half of current part
        self._setup_object(lambda part: part < 2 ** (PART_POWER - 1))
        self.assertLess(self.next_part, 2 ** PART_POWER)
        self.rb.prepare_increase_partition_power()
        self._save_ring()

        with mock.patch('swift.obj.diskfile.DiskFileManager._hash_suffix',
                        return_value='foo') as mock_hash_suffix:
            self.assertEqual(0, relinker.main([
                'relink',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        # ... and no rehash
        self.assertEqual([], mock_hash_suffix.call_args_list)

        self.assertTrue(os.path.isdir(self.expected_dir))
        self.assertTrue(os.path.isfile(self.expected_file))

        stat_old = os.stat(os.path.join(self.objdir, self.object_fname))
        stat_new = os.stat(self.expected_file)
        self.assertEqual(stat_old.st_ino, stat_new.st_ino)
        # Invalidated now, rehashed during cleanup
        with open(os.path.join(self.next_part_dir, 'hashes.invalid')) as fp:
            self.assertEqual(fp.read(), self._hash[-3:] + '\n')
        self.assertFalse(os.path.exists(
            os.path.join(self.next_part_dir, 'hashes.pkl')))
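    # An object whose old partition is in the upper half of the current
    # part power gets its new suffix dir rehashed as part of relink, and
    # only the dirty new partition is created on disk.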
    def test_relink_second_quartile_does_rehash(self):
        # we need a part in upper half of current part power
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.assertGreaterEqual(self.next_part, 2 ** PART_POWER)
        self.assertTrue(self.rb.prepare_increase_partition_power())
        self._save_ring()

        with mock.patch('swift.obj.diskfile.DiskFileManager._hash_suffix',
                        return_value='foo') as mock_hash_suffix:
            self.assertEqual(0, relinker.main([
                'relink',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        # we rehash the new suffix dirs as we go
        self.assertEqual([mock.call(self.next_suffix_dir, policy=self.policy)],
                         mock_hash_suffix.call_args_list)

        # Invalidated and rehashed during relinking
        with open(os.path.join(self.next_part_dir, 'hashes.invalid')) as fp:
            self.assertEqual(fp.read(), '')
        with open(os.path.join(self.next_part_dir, 'hashes.pkl'), 'rb') as fp:
            hashes = pickle.load(fp)
        self.assertIn(self._hash[-3:], hashes)
        self.assertEqual('foo', hashes[self._hash[-3:]])
        self.assertFalse(os.path.exists(
            os.path.join(self.part_dir, 'hashes.invalid')))
        # Check that only the dirty partition in upper half of next part power
        # has been created and rehashed
        other_next_part = self.next_part ^ 1
        other_next_part_dir = os.path.join(self.objects, str(other_next_part))
        self.assertFalse(os.path.exists(other_next_part_dir))
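    # Helper for the link/cleanup tests that follow: builds old, new and
    # conflicting files from (extension, timestamp offset) specs, runs the
    # given command, and verifies which old and new files survive.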
    def _do_link_test(self, command, old_file_specs, new_file_specs,
                      conflict_file_specs, exp_old_specs, exp_new_specs,
                      exp_ret_code=0, relink_errors=None,
                      mock_relink_paths=None, extra_options=None):
        # Each 'spec' is a tuple (file extension, timestamp offset); files are
        # created for each old_file_specs and links are created for each in
        # new_file_specs, then cleanup is run and checks made that
        # exp_old_specs and exp_new_specs exist.
        # - conflict_file_specs are files in the new partition that are *not*
        #   linked to the same file in the old partition
        # - relink_errors is a dict ext->exception; the exception will be
        #   raised each time relink_paths is called with a target_path ending
        #   with 'ext'
        self.assertFalse(relink_errors and mock_relink_paths)  # sanity check
        new_file_specs = [] if new_file_specs is None else new_file_specs
        conflict_file_specs = ([] if conflict_file_specs is None
                               else conflict_file_specs)
        exp_old_specs = [] if exp_old_specs is None else exp_old_specs
        relink_errors = {} if relink_errors is None else relink_errors
        extra_options = extra_options if extra_options else []
        # remove the file created by setUp - we'll create it again if wanted
        os.unlink(self.objname)

        def make_filenames(specs):
            filenames = []
            for ext, ts_delta in specs:
                ts = utils.Timestamp(float(self.obj_ts) + ts_delta)
                filename = '.'.join([ts.internal, ext])
                filenames.append(filename)
            return filenames

        old_filenames = make_filenames(old_file_specs)
        new_filenames = make_filenames(new_file_specs)
        conflict_filenames = make_filenames(conflict_file_specs)
        if new_filenames or conflict_filenames:
            os.makedirs(self.expected_dir)
        for filename in old_filenames:
            filepath = os.path.join(self.objdir, filename)
            with open(filepath, 'w') as fd:
                fd.write(filepath)
        for filename in new_filenames:
            new_filepath = os.path.join(self.expected_dir, filename)
            if filename in old_filenames:
                filepath = os.path.join(self.objdir, filename)
                os.link(filepath, new_filepath)
            else:
                with open(new_filepath, 'w') as fd:
                    fd.write(new_filepath)
        for filename in conflict_filenames:
            new_filepath = os.path.join(self.expected_dir, filename)
            with open(new_filepath, 'w') as fd:
|
relinker: retry links from older part powers
If a previous partition power increase failed to cleanup all files in
their old partition locations, then during the next partition power
increase the relinker may find the same file to relink in more than
one source partition. This currently leads to an error log due to the
second relink attempt getting an EEXIST error.
With this patch, when an EEXIST is raised, the relinker will attempt
to create/verify a link from older partition power locations to the
next part power location, and if such a link is found then suppress
the error log.
During the relink step, if an alternative link is verified and if a
file is found that is neither linked to the next partition power
location nor in the current part power location, then the file is
removed during the relink step. That prevents the same EEXIST occuring
again during the cleanup step when it may no longer be possible to
verify that an alternative link exists.
For example, consider identical filenames in the N+1th, Nth and N-1th
partition power locations, with the N+1th being linked to the Nth:
- During relink, the Nth location is visited and its link is
verified. Then the N-1th location is visited and an EEXIST error
is encountered, but the new check verifies that a link exists to
the Nth location, which is OK.
- During cleanup the locations are visited in the same order, but
files are removed so that the Nth location file no longer exists
when the N-1th location is visited. If the N-1th location still
has a conflicting file then existence of an alternative link to
the Nth location can no longer be verified, so an error would be
raised. Therefore, the N-1th location file must be removed during
relink.
The error is only suppressed for tombstones. The number of partition
power location that the relinker will look back over may be configured
using the link_check_limit option in a conf file or --link-check-limit
on the command line, and defaults to 2.
Closes-Bug: 1921718
Change-Id: If9beb9efabdad64e81d92708f862146d5fafb16c
2021-03-26 13:41:36 +00:00
|
|
|
fd.write(new_filepath)
|
2021-03-09 17:44:26 +00:00
|
|
|
|
2021-03-19 20:14:20 +00:00
|
|
|
orig_relink_paths = relink_paths
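
        # default patch for relink_paths: raise any error configured for a
        # matching file extension, otherwise call through to the real
        # relink_paths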
        def default_mock_relink_paths(target_path, new_target_path, **kwargs):
            for ext, error in relink_errors.items():
                if target_path.endswith(ext):
                    raise error
            return orig_relink_paths(target_path, new_target_path,
                                     **kwargs)

        with mock.patch('swift.cli.relinker.diskfile.relink_paths',
                        mock_relink_paths if mock_relink_paths
                        else default_mock_relink_paths):
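            # run the relinker command under test and check its exit code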
            with self._mock_relinker():
                self.assertEqual(exp_ret_code, relinker.main([
                    command,
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                ] + extra_options), [self.logger.all_log_lines()])

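        # verify that exactly the expected files remain in the old and new
        # partition dirs and that nothing was logged at error level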
        if exp_new_specs:
            self.assertTrue(os.path.isdir(self.expected_dir))
            exp_filenames = make_filenames(exp_new_specs)
            actual_new = sorted(os.listdir(self.expected_dir))
            self.assertEqual(sorted(exp_filenames), sorted(actual_new))
        else:
            self.assertFalse(os.path.exists(self.expected_dir))
        if exp_old_specs:
            exp_filenames = make_filenames(exp_old_specs)
            actual_old = sorted(os.listdir(self.objdir))
            self.assertEqual(sorted(exp_filenames), sorted(actual_old))
        else:
            self.assertFalse(os.path.exists(self.objdir))
        self.assertEqual([], self.logger.get_lines_for_level('error'))

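    # helper: prepare a partition power increase, save the ring, then run the
    # relink step via _do_link_test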
    def _relink_test(self, old_file_specs, new_file_specs,
                     exp_old_specs, exp_new_specs):
        # force the rehash to not happen during relink so that we can inspect
        # files in the new partition hash dir before they are cleaned up
        self._setup_object(lambda part: part < 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self._save_ring()
        self._do_link_test('relink', old_file_specs, new_file_specs, None,
                           exp_old_specs, exp_new_specs)

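    # The tests below drive the relink step for various combinations of
    # .data, .meta and .ts files; each spec tuple appears to be a
    # (file extension, timestamp offset) pair, and the expected info line
    # summarises the per-run stats (files, linked, removed, errors).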
    def test_relink_data_file(self):
        self._relink_test((('data', 0),),
                          None,
                          (('data', 0),),
                          (('data', 0),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 1 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_meta_files(self):
        self._relink_test((('data', 0), ('meta', 1)),
                          None,
                          (('data', 0), ('meta', 1)),
                          (('data', 0), ('meta', 1)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(2 files, 2 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_meta_file(self):
        self._relink_test((('meta', 0),),
                          None,
                          (('meta', 0),),
                          (('meta', 0),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 1 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_ts_file(self):
        self._relink_test((('ts', 0),),
                          None,
                          (('ts', 0),),
                          (('ts', 0),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 1 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_meta_ts_files(self):
        self._relink_test((('data', 0), ('meta', 1), ('ts', 2)),
                          None,
                          (('ts', 2),),
                          (('ts', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 1 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_ts_meta_files(self):
        self._relink_test((('data', 0), ('ts', 1), ('meta', 2)),
                          None,
                          (('ts', 1), ('meta', 2)),
                          (('ts', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(2 files, 2 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_ts_data_meta_files(self):
        self._relink_test((('ts', 0), ('data', 1), ('meta', 2)),
                          None,
                          (('data', 1), ('meta', 2)),
                          (('data', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(2 files, 2 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_data_meta_files(self):
        self._relink_test((('data', 0), ('data', 1), ('meta', 2)),
                          None,
                          (('data', 1), ('meta', 2)),
                          (('data', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(2 files, 2 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_existing_meta_files(self):
        self._relink_test((('data', 0), ('meta', 1)),
                          (('meta', 1),),
                          (('data', 0), ('meta', 1)),
                          (('data', 0), ('meta', 1)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(2 files, 1 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_meta_existing_newer_data_files(self):
        self._relink_test((('data', 0), ('meta', 2)),
                          (('data', 1),),
                          (('data', 0), ('meta', 2)),
                          (('data', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 1 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_existing_older_data_files_no_cleanup(self):
        self._relink_test((('data', 1),),
                          (('data', 0),),
                          (('data', 1),),
                          (('data', 0), ('data', 1)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 1 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_existing_older_meta_files(self):
        self._relink_test((('data', 0), ('meta', 2)),
                          (('meta', 1),),
                          (('data', 0), ('meta', 2)),
                          (('data', 0), ('meta', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(2 files, 2 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_existing_data_meta_ts_files(self):
        self._relink_test((('data', 0), ('meta', 1), ('ts', 2)),
                          (('data', 0),),
                          (('ts', 2),),
                          (('data', 0), ('ts', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 1 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_existing_data_meta_older_ts_files(self):
        self._relink_test((('data', 1), ('meta', 2)),
                          (('ts', 0),),
                          (('data', 1), ('meta', 2)),
                          (('ts', 0), ('data', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(2 files, 2 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_meta_existing_ts_files(self):
        self._relink_test((('data', 0), ('meta', 1), ('ts', 2)),
                          (('ts', 2),),
                          (('ts', 2),),
                          (('ts', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 0 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_data_meta_existing_newer_ts_files(self):
        self._relink_test((('data', 0), ('meta', 1)),
                          (('ts', 2),),
                          (('data', 0), ('meta', 1)),
                          (('ts', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(0 files, 0 linked, 0 removed, 0 errors)', info_lines)

    def test_relink_ts_existing_newer_data_files(self):
        self._relink_test((('ts', 0),),
                          (('data', 2),),
                          (('ts', 0),),
                          (('data', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(0 files, 0 linked, 0 removed, 0 errors)', info_lines)

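    # a conflicting tombstone already present in the next part power location
    # is tolerated: the relink step exits 0 and logs no warnings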
    def test_relink_conflicting_ts_file(self):
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self._save_ring()
        self._do_link_test('relink',
                           (('ts', 0),),
                           None,
                           (('ts', 0),),
                           (('ts', 0),),
                           (('ts', 0),),
                           exp_ret_code=0)
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertEqual([], warning_lines)
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 0 linked, 0 removed, 0 errors)',
                      info_lines)

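    # a file already present at the link target with a different inode causes
    # a warning and a non-zero exit code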
    def test_relink_link_already_exists_but_different_inode(self):
        self.rb.prepare_increase_partition_power()
        self._save_ring()

        # make a file where we'd expect the link to be created
        os.makedirs(self.expected_dir)
        with open(self.expected_file, 'w'):
            pass

        # expect an error
        with self._mock_relinker():
            self.assertEqual(1, relinker.main([
                'relink',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))

        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertIn('Error relinking: failed to relink %s to %s: '
                      '[Errno 17] File exists'
                      % (self.objname, self.expected_file),
                      warning_lines[0])
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 0 linked, 0 removed, 1 errors)',
                      warning_lines)
        self.assertEqual([], self.logger.get_lines_for_level('error'))

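    # if another process has already created an identical link (same inode),
    # the relink is treated as a no-op success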
    def test_relink_link_already_exists(self):
        self.rb.prepare_increase_partition_power()
        self._save_ring()
        orig_relink_paths = relink_paths

        def mock_relink_paths(target_path, new_target_path, **kwargs):
            # pretend another process has created the link before this one
            os.makedirs(self.expected_dir)
            os.link(target_path, new_target_path)
            return orig_relink_paths(target_path, new_target_path,
                                     **kwargs)

        with self._mock_relinker():
            with mock.patch('swift.cli.relinker.diskfile.relink_paths',
                            mock_relink_paths):
                self.assertEqual(0, relinker.main([
                    'relink',
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                ]))

        self.assertTrue(os.path.isdir(self.expected_dir))
        self.assertTrue(os.path.isfile(self.expected_file))
        stat_old = os.stat(os.path.join(self.objdir, self.object_fname))
        stat_new = os.stat(self.expected_file)
        self.assertEqual(stat_old.st_ino, stat_new.st_ino)
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 0 linked, 0 removed, 0 errors)', info_lines)
        self.assertEqual([], self.logger.get_lines_for_level('error'))

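    # if the source file disappears before the link is made (e.g. removed by
    # another process), the relink step should still succeed without errors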
    def test_relink_link_target_disappears(self):
        # we need object name in lower half of current part so that there is
        # no rehash of the new partition which would erase the empty new
        # partition - we want to assert it was created
        self._setup_object(lambda part: part < 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self._save_ring()
        orig_relink_paths = relink_paths

        def mock_relink_paths(target_path, new_target_path, **kwargs):
            # pretend another process has cleaned up the target path
            os.unlink(target_path)
            return orig_relink_paths(target_path, new_target_path,
                                     **kwargs)

        with self._mock_relinker():
            with mock.patch('swift.cli.relinker.diskfile.relink_paths',
                            mock_relink_paths):
                self.assertEqual(0, relinker.main([
                    'relink',
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                ]))

        self.assertTrue(os.path.isdir(self.expected_dir))
        self.assertFalse(os.path.isfile(self.expected_file))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=False) '
                      '(1 files, 0 linked, 0 removed, 0 errors)', info_lines)
        self.assertEqual([], self.logger.get_lines_for_level('error'))

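    # without a prepared partition power increase there is nothing to relink:
    # the command exits 2 and warns that no applicable policy was found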
def test_relink_no_applicable_policy(self):
|
|
|
|
# NB do not prepare part power increase
|
|
|
|
self._save_ring()
|
relinker: Add /recon/relinker endpoint and drop progress stats
To further benefit the stats capturing for the relinker, drop partition
progress to a new relinker.recon recon cache and add a new recon endpoint:
GET /recon/relinker
To gather get live relinking progress data:
$ curl http://127.0.0.3:6030/recon/relinker |python -mjson.tool
{
"devices": {
"sdb3": {
"parts_done": 523,
"policies": {
"1": {
"next_part_power": 11,
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 1630,
"hash_dirs": 1630,
"linked": 1630,
"policies": 1,
"removed": 0
},
"timestamp": 1618998730.24672,
"total_parts": 1029,
"total_time": 5.400741815567017
}},
"start_time": 1618998724.845946,
"stats": {
"errors": 0,
"files": 836,
"hash_dirs": 836,
"linked": 836,
"removed": 0
},
"timestamp": 1618998730.24672,
"total_parts": 523,
"total_time": 5.400741815567017
},
"sdb7": {
"parts_done": 506,
"policies": {
"1": {
"next_part_power": 11,
"part_power": 10,
"parts_done": 506,
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 794,
"hash_dirs": 794,
"linked": 794,
"removed": 0
},
"step": "relink",
"timestamp": 1618998730.166175,
"total_parts": 506,
"total_time": 5.320528984069824
}
},
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 794,
"hash_dirs": 794,
"linked": 794,
"removed": 0
},
"timestamp": 1618998730.166175,
"total_parts": 506,
"total_time": 5.320528984069824
}
},
"workers": {
"100": {
"drives": ["sda1"],
"return_code": 0,
"timestamp": 1618998730.166175}
}}
Also, add a constant DEFAULT_RECON_CACHE_PATH to help fix failing tests
by mocking recon_cache_path, so that errors are not logged due
to dump_recon_cache exceptions.
Mock recon_cache_path more widely and assert no error logs more
widely.
Change-Id: I625147dadd44f008a7c48eb5d6ac1c54c4c0ef05
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-01-29 12:43:54 -08:00
|
|
|
self.assertEqual(2, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
]))
|
|
|
|
self.assertEqual(self.logger.get_lines_for_level('warning'),
|
|
|
|
['No policy found to increase the partition power.'])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-01-29 12:43:54 -08:00
|
|
|
|
|
|
|
def test_relink_not_mounted(self):
|
|
|
|
self.rb.prepare_increase_partition_power()
|
|
|
|
self._save_ring()
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-01-29 12:43:54 -08:00
|
|
|
self.assertEqual(1, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
]))
|
|
|
|
self.assertEqual(self.logger.get_lines_for_level('warning'), [
|
|
|
|
'Skipping sda1 as it is not mounted',
|
2021-01-06 16:18:04 -08:00
|
|
|
'1 disks were unmounted',
|
|
|
|
'0 hash dirs processed (cleanup=False) '
|
|
|
|
'(0 files, 0 linked, 0 removed, 0 errors)',
|
|
|
|
])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-01-29 12:43:54 -08:00
|
|
|
|
|
|
|
def test_relink_listdir_error(self):
|
|
|
|
self.rb.prepare_increase_partition_power()
|
|
|
|
self._save_ring()
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-01-29 12:43:54 -08:00
|
|
|
with self._mock_listdir():
|
|
|
|
self.assertEqual(1, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount-check'
|
|
|
|
]))
|
|
|
|
self.assertEqual(self.logger.get_lines_for_level('warning'), [
|
|
|
|
'Skipping %s because ' % self.objects,
|
2021-01-06 16:18:04 -08:00
|
|
|
'There were 1 errors listing partition directories',
|
|
|
|
'0 hash dirs processed (cleanup=False) '
|
|
|
|
'(0 files, 0 linked, 0 removed, 1 errors)',
|
|
|
|
])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-01-29 12:43:54 -08:00
|
|
|
|
2019-11-14 16:26:48 -05:00
|
|
|
def test_relink_device_filter(self):
|
|
|
|
self.rb.prepare_increase_partition_power()
|
|
|
|
self._save_ring()
|
2021-02-01 16:40:21 -08:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
'--device', self.existing_device,
|
|
|
|
]))
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
self.assertTrue(os.path.isdir(self.expected_dir))
|
|
|
|
self.assertTrue(os.path.isfile(self.expected_file))
|
|
|
|
|
|
|
|
stat_old = os.stat(os.path.join(self.objdir, self.object_fname))
|
|
|
|
stat_new = os.stat(self.expected_file)
|
|
|
|
self.assertEqual(stat_old.st_ino, stat_new.st_ino)
|
|
|
|
|
|
|
|
def test_relink_device_filter_invalid(self):
|
|
|
|
self.rb.prepare_increase_partition_power()
|
|
|
|
self._save_ring()
|
2021-02-01 16:40:21 -08:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
'--device', 'none',
|
|
|
|
]))
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
self.assertFalse(os.path.isdir(self.expected_dir))
|
|
|
|
self.assertFalse(os.path.isfile(self.expected_file))
|
|
|
|
|
2021-03-15 12:07:10 +00:00
|
|
|
def test_relink_partition_filter(self):
|
|
|
|
# ensure partitions are in second quartile so that new partitions are
|
|
|
|
# not included in the relinked partitions when the relinker is re-run:
|
|
|
|
# this makes the number of partitions visited predictable (i.e. 3)
|
|
|
|
self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
|
|
|
|
# create some other test files in different partitions
|
|
|
|
other_objs = []
|
|
|
|
used_parts = [self.part, self.part + 1]
|
|
|
|
for i in range(2):
|
|
|
|
_hash, part, next_part, obj = self._get_object_name(
|
|
|
|
lambda part:
|
|
|
|
part >= 2 ** (PART_POWER - 1) and part not in used_parts)
|
|
|
|
obj_dir = os.path.join(self.objects, str(part), _hash[-3:], _hash)
|
|
|
|
os.makedirs(obj_dir)
|
|
|
|
obj_file = os.path.join(obj_dir, self.object_fname)
|
|
|
|
with open(obj_file, 'w'):
|
|
|
|
pass
|
|
|
|
other_objs.append((part, obj_file))
|
|
|
|
used_parts.append(part)
|
|
|
|
|
|
|
|
self.rb.prepare_increase_partition_power()
|
|
|
|
self._save_ring()
|
|
|
|
|
|
|
|
# invalid partition
|
|
|
|
with mock.patch('sys.stdout'), mock.patch('sys.stderr'):
|
|
|
|
with self.assertRaises(SystemExit) as cm:
|
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
2021-01-06 16:18:04 -08:00
|
|
|
'--partition', '-1',
|
2021-03-15 12:07:10 +00:00
|
|
|
]))
|
|
|
|
self.assertEqual(2, cm.exception.code)
|
|
|
|
|
|
|
|
with mock.patch('sys.stdout'), mock.patch('sys.stderr'):
|
|
|
|
with self.assertRaises(SystemExit) as cm:
|
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
2021-01-06 16:18:04 -08:00
|
|
|
'--partition', 'abc',
|
2021-03-15 12:07:10 +00:00
|
|
|
]))
|
|
|
|
self.assertEqual(2, cm.exception.code)
|
|
|
|
|
|
|
|
# restrict to a partition with no test object
|
|
|
|
self.logger.clear()
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-15 12:07:10 +00:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
2021-01-06 16:18:04 -08:00
|
|
|
'--partition', str(self.part + 1),
|
2021-03-15 12:07:10 +00:00
|
|
|
]))
|
|
|
|
self.assertFalse(os.path.isdir(self.expected_dir))
|
2021-04-12 17:08:43 +01:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertEqual(4, len(info_lines))
|
|
|
|
self.assertIn('Starting relinker (cleanup=False) using 1 workers:',
|
|
|
|
info_lines[0])
|
2021-03-15 12:07:10 +00:00
|
|
|
self.assertEqual(
|
|
|
|
['Processing files for policy platinum under %s (cleanup=False)'
|
2021-01-06 16:18:04 -08:00
|
|
|
% os.path.join(self.devices, 'sda1'),
|
2021-03-15 12:07:10 +00:00
|
|
|
'0 hash dirs processed (cleanup=False) (0 files, 0 linked, '
|
2021-04-12 17:08:43 +01:00
|
|
|
'0 removed, 0 errors)'], info_lines[1:3]
|
2021-03-15 12:07:10 +00:00
|
|
|
)
|
2021-04-12 17:08:43 +01:00
|
|
|
self.assertIn('Finished relinker (cleanup=False):',
|
|
|
|
info_lines[3])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-15 12:07:10 +00:00
|
|
|
|
|
|
|
# restrict to one partition with a test object
|
|
|
|
self.logger.clear()
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-15 12:07:10 +00:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
2021-01-06 16:18:04 -08:00
|
|
|
'--partition', str(self.part),
|
2021-03-15 12:07:10 +00:00
|
|
|
]))
|
|
|
|
self.assertTrue(os.path.isdir(self.expected_dir))
|
|
|
|
self.assertTrue(os.path.isfile(self.expected_file))
|
|
|
|
stat_old = os.stat(os.path.join(self.objdir, self.object_fname))
|
|
|
|
stat_new = os.stat(self.expected_file)
|
|
|
|
self.assertEqual(stat_old.st_ino, stat_new.st_ino)
|
2021-04-12 17:08:43 +01:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertEqual(5, len(info_lines))
|
|
|
|
self.assertIn('Starting relinker (cleanup=False) using 1 workers:',
|
|
|
|
info_lines[0])
|
2021-03-15 12:07:10 +00:00
|
|
|
self.assertEqual(
|
|
|
|
['Processing files for policy platinum under %s (cleanup=False)'
|
2021-01-06 16:18:04 -08:00
|
|
|
% os.path.join(self.devices, 'sda1'),
|
2021-03-15 09:58:27 +11:00
|
|
|
'Step: relink Device: sda1 Policy: platinum Partitions: 1/3',
|
2021-03-15 12:07:10 +00:00
|
|
|
'1 hash dirs processed (cleanup=False) (1 files, 1 linked, '
|
2021-04-12 17:08:43 +01:00
|
|
|
'0 removed, 0 errors)'], info_lines[1:4]
|
2021-03-15 12:07:10 +00:00
|
|
|
)
|
2021-04-12 17:08:43 +01:00
|
|
|
self.assertIn('Finished relinker (cleanup=False):',
|
|
|
|
info_lines[4])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-15 12:07:10 +00:00
|
|
|
|
|
|
|
# restrict to two partitions with test objects
|
|
|
|
self.logger.clear()
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-15 12:07:10 +00:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
'--partition', str(other_objs[0][0]),
|
|
|
|
'-p', str(other_objs[0][0]), # duplicates should be ignored
|
2021-01-06 16:18:04 -08:00
|
|
|
'-p', str(other_objs[1][0]),
|
2021-03-15 12:07:10 +00:00
|
|
|
]))
|
|
|
|
expected_file = utils.replace_partition_in_path(
|
|
|
|
self.devices, other_objs[0][1], PART_POWER + 1)
|
|
|
|
self.assertTrue(os.path.isfile(expected_file))
|
|
|
|
stat_old = os.stat(other_objs[0][1])
|
|
|
|
stat_new = os.stat(expected_file)
|
|
|
|
self.assertEqual(stat_old.st_ino, stat_new.st_ino)
|
|
|
|
expected_file = utils.replace_partition_in_path(
|
|
|
|
self.devices, other_objs[1][1], PART_POWER + 1)
|
|
|
|
self.assertTrue(os.path.isfile(expected_file))
|
|
|
|
stat_old = os.stat(other_objs[1][1])
|
|
|
|
stat_new = os.stat(expected_file)
|
|
|
|
self.assertEqual(stat_old.st_ino, stat_new.st_ino)
|
2021-04-12 17:08:43 +01:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertEqual(6, len(info_lines))
|
|
|
|
self.assertIn('Starting relinker (cleanup=False) using 1 workers:',
|
|
|
|
info_lines[0])
|
2021-03-15 12:07:10 +00:00
|
|
|
self.assertEqual(
|
|
|
|
['Processing files for policy platinum under %s (cleanup=False)'
|
2021-01-06 16:18:04 -08:00
|
|
|
% os.path.join(self.devices, 'sda1'),
|
2021-03-15 09:58:27 +11:00
|
|
|
'Step: relink Device: sda1 Policy: platinum Partitions: 2/3',
|
|
|
|
'Step: relink Device: sda1 Policy: platinum Partitions: 3/3',
|
2021-03-15 12:07:10 +00:00
|
|
|
'2 hash dirs processed (cleanup=False) (2 files, 2 linked, '
|
2021-04-12 17:08:43 +01:00
|
|
|
'0 removed, 0 errors)'], info_lines[1:5]
|
2021-03-15 12:07:10 +00:00
|
|
|
)
|
2021-04-12 17:08:43 +01:00
|
|
|
self.assertIn('Finished relinker (cleanup=False):',
|
|
|
|
info_lines[5])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-15 12:07:10 +00:00
|
|
|
|
2021-03-10 15:19:15 +00:00
|
|
|
@patch_policies(
|
|
|
|
[StoragePolicy(0, name='gold', is_default=True),
|
|
|
|
ECStoragePolicy(1, name='platinum', ec_type=DEFAULT_TEST_EC_TYPE,
|
|
|
|
ec_ndata=4, ec_nparity=2)])
|
|
|
|
def test_relink_policy_option(self):
|
|
|
|
self._setup_object()
|
|
|
|
self.rb.prepare_increase_partition_power()
|
|
|
|
self._save_ring()
|
|
|
|
|
|
|
|
# invalid policy
|
|
|
|
with mock.patch('sys.stdout'), mock.patch('sys.stderr'):
|
|
|
|
with self.assertRaises(SystemExit) as cm:
|
2021-03-19 10:24:59 -07:00
|
|
|
relinker.main([
|
2021-03-10 15:19:15 +00:00
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--policy', '9',
|
|
|
|
'--skip-mount',
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--device', self.existing_device,
|
2021-03-19 10:24:59 -07:00
|
|
|
])
|
2021-03-10 15:19:15 +00:00
|
|
|
self.assertEqual(2, cm.exception.code)
|
|
|
|
|
|
|
|
with mock.patch('sys.stdout'), mock.patch('sys.stderr'):
|
|
|
|
with self.assertRaises(SystemExit) as cm:
|
2021-03-19 10:24:59 -07:00
|
|
|
relinker.main([
|
2021-03-10 15:19:15 +00:00
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
2021-03-19 10:24:59 -07:00
|
|
|
'--policy', 'pewter',
|
2021-03-10 15:19:15 +00:00
|
|
|
'--skip-mount',
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--device', self.existing_device,
|
2021-03-19 10:24:59 -07:00
|
|
|
])
|
2021-03-10 15:19:15 +00:00
|
|
|
self.assertEqual(2, cm.exception.code)
|
|
|
|
|
|
|
|
# policy with no object
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-10 15:19:15 +00:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--policy', '1',
|
|
|
|
'--skip-mount',
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--device', self.existing_device,
|
|
|
|
]))
|
|
|
|
self.assertFalse(os.path.isdir(self.expected_dir))
|
2021-04-12 17:08:43 +01:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertEqual(4, len(info_lines))
|
|
|
|
self.assertIn('Starting relinker (cleanup=False) using 1 workers:',
|
|
|
|
info_lines[0])
|
2021-03-10 15:19:15 +00:00
|
|
|
self.assertEqual(
|
|
|
|
['Processing files for policy platinum under %s/%s (cleanup=False)'
|
|
|
|
% (self.devices, self.existing_device),
|
2021-03-09 17:44:26 +00:00
|
|
|
'0 hash dirs processed (cleanup=False) (0 files, 0 linked, '
|
2021-04-12 17:08:43 +01:00
|
|
|
'0 removed, 0 errors)'], info_lines[1:3]
|
|
|
|
)
|
|
|
|
self.assertIn('Finished relinker (cleanup=False):',
|
|
|
|
info_lines[3])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-10 15:19:15 +00:00
|
|
|
|
|
|
|
# policy with object
|
|
|
|
self.logger.clear()
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-10 15:19:15 +00:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--policy', '0',
|
|
|
|
'--skip-mount',
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--device', self.existing_device,
|
|
|
|
]))
|
|
|
|
self.assertTrue(os.path.isdir(self.expected_dir))
|
|
|
|
self.assertTrue(os.path.isfile(self.expected_file))
|
|
|
|
stat_old = os.stat(os.path.join(self.objdir, self.object_fname))
|
|
|
|
stat_new = os.stat(self.expected_file)
|
|
|
|
self.assertEqual(stat_old.st_ino, stat_new.st_ino)
|
2021-04-12 17:08:43 +01:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertEqual(5, len(info_lines))
|
|
|
|
self.assertIn('Starting relinker (cleanup=False) using 1 workers:',
|
|
|
|
info_lines[0])
|
2021-03-10 15:19:15 +00:00
|
|
|
self.assertEqual(
|
|
|
|
['Processing files for policy gold under %s/%s (cleanup=False)'
|
|
|
|
% (self.devices, self.existing_device),
|
2021-03-15 09:58:27 +11:00
|
|
|
'Step: relink Device: sda1 Policy: gold Partitions: 1/1',
|
2021-03-09 17:44:26 +00:00
|
|
|
'1 hash dirs processed (cleanup=False) (1 files, 1 linked, '
|
2021-04-12 17:08:43 +01:00
|
|
|
'0 removed, 0 errors)'], info_lines[1:4]
|
|
|
|
)
|
|
|
|
self.assertIn('Finished relinker (cleanup=False):',
|
|
|
|
info_lines[4])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-10 15:19:15 +00:00
|
|
|
|
2021-03-19 10:24:59 -07:00
|
|
|
# policy name works, too
|
|
|
|
self.logger.clear()
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-19 10:24:59 -07:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--policy', 'gold',
|
|
|
|
'--skip-mount',
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--device', self.existing_device,
|
|
|
|
]))
|
|
|
|
self.assertTrue(os.path.isdir(self.expected_dir))
|
|
|
|
self.assertTrue(os.path.isfile(self.expected_file))
|
|
|
|
stat_old = os.stat(os.path.join(self.objdir, self.object_fname))
|
|
|
|
stat_new = os.stat(self.expected_file)
|
|
|
|
self.assertEqual(stat_old.st_ino, stat_new.st_ino)
|
2021-04-12 17:08:43 +01:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertEqual(4, len(info_lines))
|
|
|
|
self.assertIn('Starting relinker (cleanup=False) using 1 workers:',
|
|
|
|
info_lines[0])
|
2021-03-19 10:24:59 -07:00
|
|
|
self.assertEqual(
|
|
|
|
['Processing files for policy gold under %s/%s (cleanup=False)'
|
|
|
|
% (self.devices, self.existing_device),
|
|
|
|
'0 hash dirs processed (cleanup=False) '
|
2021-04-12 17:08:43 +01:00
|
|
|
'(0 files, 0 linked, 0 removed, 0 errors)'], info_lines[1:3]
|
|
|
|
)
|
|
|
|
self.assertIn('Finished relinker (cleanup=False):',
|
|
|
|
info_lines[3])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-19 10:24:59 -07:00
|
|
|
|
2021-03-09 16:17:41 +00:00
|
|
|
@patch_policies(
|
|
|
|
[StoragePolicy(0, name='gold', is_default=True),
|
|
|
|
ECStoragePolicy(1, name='platinum', ec_type=DEFAULT_TEST_EC_TYPE,
|
|
|
|
ec_ndata=4, ec_nparity=2)])
|
|
|
|
def test_relink_all_policies(self):
|
|
|
|
# verify that only policies in appropriate state are processed
|
2021-03-10 15:19:15 +00:00
|
|
|
def do_relink(options=None):
|
|
|
|
options = [] if options is None else options
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
|
|
|
with mock.patch(
|
|
|
|
'swift.cli.relinker.Relinker.process_policy') \
|
|
|
|
as mocked:
|
|
|
|
res = relinker.main([
|
|
|
|
'relink',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--skip-mount',
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--device', self.existing_device,
|
|
|
|
] + options)
|
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
|
|
|
return res, mocked
|
2021-03-09 16:17:41 +00:00
|
|
|
|
|
|
|
self._save_ring(POLICIES) # no ring prepared for increase
|
|
|
|
res, mocked = do_relink()
|
|
|
|
self.assertEqual([], mocked.call_args_list)
|
|
|
|
self.assertEqual(2, res)
|
|
|
|
|
|
|
|
self._save_ring([POLICIES[0]]) # not prepared for increase
|
|
|
|
self.rb.prepare_increase_partition_power()
|
|
|
|
self._save_ring([POLICIES[1]]) # prepared for increase
|
|
|
|
res, mocked = do_relink()
|
|
|
|
self.assertEqual([mock.call(POLICIES[1])], mocked.call_args_list)
|
|
|
|
self.assertEqual(0, res)
|
|
|
|
|
2021-03-19 10:24:59 -07:00
|
|
|
res, mocked = do_relink(['--policy', '0'])
|
2021-03-10 15:19:15 +00:00
|
|
|
self.assertEqual([], mocked.call_args_list)
|
|
|
|
self.assertEqual(2, res)
|
|
|
|
|
2021-03-09 16:17:41 +00:00
|
|
|
self._save_ring([POLICIES[0]]) # prepared for increase
|
|
|
|
res, mocked = do_relink()
|
|
|
|
self.assertEqual([mock.call(POLICIES[0]), mock.call(POLICIES[1])],
|
|
|
|
mocked.call_args_list)
|
|
|
|
self.assertEqual(0, res)
|
|
|
|
|
|
|
|
self.rb.increase_partition_power()
|
|
|
|
self._save_ring([POLICIES[0]]) # increased
|
|
|
|
res, mocked = do_relink()
|
|
|
|
self.assertEqual([mock.call(POLICIES[1])], mocked.call_args_list)
|
|
|
|
self.assertEqual(0, res)
|
|
|
|
|
|
|
|
self._save_ring([POLICIES[1]]) # increased
|
|
|
|
res, mocked = do_relink()
|
|
|
|
self.assertEqual([], mocked.call_args_list)
|
|
|
|
self.assertEqual(2, res)
|
|
|
|
|
2021-03-10 15:19:15 +00:00
|
|
|
res, mocked = do_relink(['--policy', '0'])
|
|
|
|
self.assertEqual([], mocked.call_args_list)
|
|
|
|
self.assertEqual(2, res)
|
|
|
|
|
2021-03-09 16:17:41 +00:00
|
|
|
self.rb.finish_increase_partition_power()
|
|
|
|
self._save_ring(POLICIES) # all rings finished
|
|
|
|
res, mocked = do_relink()
|
|
|
|
self.assertEqual([], mocked.call_args_list)
|
|
|
|
self.assertEqual(2, res)
|
|
|
|
|
relinker: retry links from older part powers
If a previous partition power increase failed to clean up all files in
their old partition locations, then during the next partition power
increase the relinker may find the same file to relink in more than
one source partition. This currently leads to an error log due to the
second relink attempt getting an EEXIST error.
With this patch, when an EEXIST is raised, the relinker will attempt
to create/verify a link from older partition power locations to the
next part power location, and if such a link is found then suppress
the error log.
During the relink step, if an alternative link is verified and a
file is found that is neither linked to the next partition power
location nor in the current part power location, then that file is
removed during the relink step. This prevents the same EEXIST occurring
again during the cleanup step, when it may no longer be possible to
verify that an alternative link exists.
For example, consider identical filenames in the N+1th, Nth and N-1th
partition power locations, with the N+1th being linked to the Nth:
- During relink, the Nth location is visited and its link is
verified. Then the N-1th location is visited and an EEXIST error
is encountered, but the new check verifies that a link exists to
the Nth location, which is OK.
- During cleanup the locations are visited in the same order, but
files are removed so that the Nth location file no longer exists
when the N-1th location is visited. If the N-1th location still
has a conflicting file then existence of an alternative link to
the Nth location can no longer be verified, so an error would be
raised. Therefore, the N-1th location file must be removed during
relink.
The error is only suppressed for tombstones. The number of partition
power locations that the relinker will look back over may be configured
using the link_check_limit option in a conf file or --link-check-limit
on the command line, and defaults to 2.
Closes-Bug: 1921718
Change-Id: If9beb9efabdad64e81d92708f862146d5fafb16c
2021-03-26 13:41:36 +00:00
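The lookback described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the relinker's actual implementation: link_or_verify and candidate_old_paths are hypothetical names, and building the candidate paths (the same filename rendered at each of the last link_check_limit lower part powers) is assumed to happen elsewhere, for example via utils.replace_partition_in_path as exercised in the tests below.

import errno
import os


def link_or_verify(old_path, new_path, candidate_old_paths):
    # Try to hard-link old_path into the next-part-power location. On
    # EEXIST, accept the conflict if the file already there is a link to
    # the same inode from one of the older part-power locations (the
    # relinker only suppresses the error for tombstones; this sketch
    # ignores that distinction).
    try:
        os.link(old_path, new_path)
        return True
    except OSError as err:
        if err.errno != errno.EEXIST:
            raise
    new_stat = os.stat(new_path)
    for candidate in candidate_old_paths:
        try:
            cand_stat = os.stat(candidate)
        except OSError:
            continue
        if (cand_stat.st_dev, cand_stat.st_ino) == (
                new_stat.st_dev, new_stat.st_ino):
            # Verified: the conflicting file is an alternative link from
            # an older part power location, so no error need be logged.
            return True
    return False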
|
|
|
def test_relink_conflicting_ts_is_linked_to_part_power(self):
|
|
|
|
# link from next partition to current partition;
|
|
|
|
# different file in current-1 partition
|
|
|
|
self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
|
|
|
|
self.rb.prepare_increase_partition_power()
|
|
|
|
self._save_ring()
|
|
|
|
filename = '.'.join([self.obj_ts.internal, 'ts'])
|
|
|
|
new_filepath = os.path.join(self.expected_dir, filename)
|
|
|
|
old_filepath = os.path.join(self.objdir, filename)
|
|
|
|
# setup a file in the current-1 part power (PART_POWER - 1) location
|
2021-06-30 14:05:23 +01:00
|
|
|
# that is *not* linked to the file in the next part power location
|
2021-03-26 13:41:36 +00:00
|
|
|
older_filepath = utils.replace_partition_in_path(
|
|
|
|
self.devices, new_filepath, PART_POWER - 1)
|
|
|
|
os.makedirs(os.path.dirname(older_filepath))
|
|
|
|
with open(older_filepath, 'w') as fd:
|
|
|
|
fd.write(older_filepath)
|
|
|
|
self._do_link_test('relink',
|
|
|
|
(('ts', 0),),
|
|
|
|
(('ts', 0),),
|
|
|
|
None,
|
|
|
|
(('ts', 0),),
|
|
|
|
(('ts', 0),),
|
|
|
|
exp_ret_code=0)
|
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
# both the PART_POWER and PART_POWER - N partitions are visited, no new
|
2021-06-30 14:05:23 +01:00
|
|
|
# links are created, and both the older files are retained
|
2021-03-26 13:41:36 +00:00
|
|
|
self.assertIn('2 hash dirs processed (cleanup=False) '
|
2021-06-30 14:05:23 +01:00
|
|
|
'(2 files, 0 linked, 0 removed, 0 errors)',
|
relinker: retry links from older part powers
If a previous partition power increase failed to cleanup all files in
their old partition locations, then during the next partition power
increase the relinker may find the same file to relink in more than
one source partition. This currently leads to an error log due to the
second relink attempt getting an EEXIST error.
With this patch, when an EEXIST is raised, the relinker will attempt
to create/verify a link from older partition power locations to the
next part power location, and if such a link is found then suppress
the error log.
During the relink step, if an alternative link is verified and if a
file is found that is neither linked to the next partition power
location nor in the current part power location, then the file is
removed during the relink step. That prevents the same EEXIST occuring
again during the cleanup step when it may no longer be possible to
verify that an alternative link exists.
For example, consider identical filenames in the N+1th, Nth and N-1th
partition power locations, with the N+1th being linked to the Nth:
- During relink, the Nth location is visited and its link is
verified. Then the N-1th location is visited and an EEXIST error
is encountered, but the new check verifies that a link exists to
the Nth location, which is OK.
- During cleanup the locations are visited in the same order, but
files are removed so that the Nth location file no longer exists
when the N-1th location is visited. If the N-1th location still
has a conflicting file then existence of an alternative link to
the Nth location can no longer be verified, so an error would be
raised. Therefore, the N-1th location file must be removed during
relink.
The error is only suppressed for tombstones. The number of partition
power location that the relinker will look back over may be configured
using the link_check_limit option in a conf file or --link-check-limit
on the command line, and defaults to 2.
Closes-Bug: 1921718
Change-Id: If9beb9efabdad64e81d92708f862146d5fafb16c
2021-03-26 13:41:36 +00:00
|
|
|
info_lines)
|
|
|
|
with open(new_filepath, 'r') as fd:
|
|
|
|
self.assertEqual(old_filepath, fd.read())
|
2021-06-30 14:05:23 +01:00
|
|
|
self.assertTrue(os.path.exists(older_filepath))
|
relinker: retry links from older part powers
If a previous partition power increase failed to cleanup all files in
their old partition locations, then during the next partition power
increase the relinker may find the same file to relink in more than
one source partition. This currently leads to an error log due to the
second relink attempt getting an EEXIST error.
With this patch, when an EEXIST is raised, the relinker will attempt
to create/verify a link from older partition power locations to the
next part power location, and if such a link is found then suppress
the error log.
During the relink step, if an alternative link is verified and if a
file is found that is neither linked to the next partition power
location nor in the current part power location, then the file is
removed during the relink step. That prevents the same EEXIST occuring
again during the cleanup step when it may no longer be possible to
verify that an alternative link exists.
For example, consider identical filenames in the N+1th, Nth and N-1th
partition power locations, with the N+1th being linked to the Nth:
- During relink, the Nth location is visited and its link is
verified. Then the N-1th location is visited and an EEXIST error
is encountered, but the new check verifies that a link exists to
the Nth location, which is OK.
- During cleanup the locations are visited in the same order, but
files are removed so that the Nth location file no longer exists
when the N-1th location is visited. If the N-1th location still
has a conflicting file then existence of an alternative link to
the Nth location can no longer be verified, so an error would be
raised. Therefore, the N-1th location file must be removed during
relink.
The error is only suppressed for tombstones. The number of partition
power location that the relinker will look back over may be configured
using the link_check_limit option in a conf file or --link-check-limit
on the command line, and defaults to 2.
Closes-Bug: 1921718
Change-Id: If9beb9efabdad64e81d92708f862146d5fafb16c
2021-03-26 13:41:36 +00:00
|
|
|
|
|
|
|
    def test_relink_conflicting_ts_is_linked_to_part_power_minus_1(self):
        # link from next partition to current-1 partition;
        # different file in current partition
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self._save_ring()
        # setup a file in the next part power (PART_POWER + 1) location that is
        # linked to a file in an older (PART_POWER - 1) location
        filename = '.'.join([self.obj_ts.internal, 'ts'])
        older_filepath, new_filepath = self._make_link(filename,
                                                       PART_POWER - 1)
        self._do_link_test('relink',
                           (('ts', 0),),
                           None,
                           None,  # we already made file linked to older part
                           (('ts', 0),),  # retained
                           (('ts', 0),),
                           exp_ret_code=0)
        info_lines = self.logger.get_lines_for_level('info')
        # both the PART_POWER and PART_POWER - N partitions are visited, no new
        # links are created, and both the older files are retained
        self.assertIn('2 hash dirs processed (cleanup=False) '
                      '(2 files, 0 linked, 0 removed, 0 errors)',
                      info_lines)
        with open(new_filepath, 'r') as fd:
            self.assertEqual(older_filepath, fd.read())
        # prev part power file is retained because it is link target
        self.assertTrue(os.path.exists(older_filepath))

    def test_relink_conflicting_ts_is_linked_to_part_power_minus_2_err(self):
        # link from next partition to current-2 partition;
        # different file in current partition
        # by default the relinker will NOT validate the current-2 location
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self._save_ring()
        # setup a file in the next part power (PART_POWER + 1) location that is
        # linked to a file in an older (PART_POWER - 2) location
        filename = '.'.join([self.obj_ts.internal, 'ts'])
        older_filepath, new_filepath = self._make_link(filename,
                                                       PART_POWER - 2)

        self._do_link_test('relink',
                           (('ts', 0),),
                           None,
                           None,  # we already made file linked to older part
                           (('ts', 0),),  # retained
                           (('ts', 0),),
                           exp_ret_code=0)
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertEqual([], warning_lines)
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('2 hash dirs processed (cleanup=False) '
                      '(2 files, 0 linked, 0 removed, 0 errors)',
                      info_lines)
        with open(new_filepath, 'r') as fd:
            self.assertEqual(older_filepath, fd.read())
        # prev-1 part power file is always retained because it is link target
        self.assertTrue(os.path.exists(older_filepath))

    def test_relink_conflicting_ts_both_in_older_part_powers(self):
        # link from next partition to current-1 partition;
        # different file in current partition
        # different file in current-2 location
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 2))
        self.rb.prepare_increase_partition_power()
        self._save_ring()
        # setup a file in the next part power (PART_POWER + 1) location that is
        # linked to a file in an older (PART_POWER - 1) location
        filename = '.'.join([self.obj_ts.internal, 'ts'])
        older_filepath, new_filepath = self._make_link(filename,
                                                       PART_POWER - 1)
        # setup a file in an even older part power (PART_POWER - 2) location
        # that is *not* linked to the file in the next part power location
        oldest_filepath = utils.replace_partition_in_path(
            self.devices, new_filepath, PART_POWER - 2)
        os.makedirs(os.path.dirname(oldest_filepath))
        with open(oldest_filepath, 'w') as fd:
            fd.write(oldest_filepath)

        self._do_link_test('relink',
                           (('ts', 0),),
                           None,
                           None,  # we already made file linked to older part
                           (('ts', 0),),  # retained
                           (('ts', 0),),
                           exp_ret_code=0)
        info_lines = self.logger.get_lines_for_level('info')
        # both the PART_POWER and PART_POWER - N partitions are visited, no new
        # links are created, and both the older files are retained
        self.assertIn('3 hash dirs processed (cleanup=False) '
                      '(3 files, 0 linked, 0 removed, 0 errors)',
                      info_lines)
        with open(new_filepath, 'r') as fd:
            self.assertEqual(older_filepath, fd.read())
        self.assertTrue(os.path.exists(older_filepath))  # linked so retained
        self.assertTrue(os.path.exists(oldest_filepath))  # retained anyway

    @patch_policies(
        [StoragePolicy(0, name='gold', is_default=True),
         ECStoragePolicy(1, name='platinum', ec_type=DEFAULT_TEST_EC_TYPE,
                         ec_ndata=4, ec_nparity=2)])
    def test_cleanup_all_policies(self):
        # verify that only policies in appropriate state are processed
        def do_cleanup(options=None):
            options = [] if options is None else options
            with mock.patch(
                    'swift.cli.relinker.Relinker.process_policy') as mocked:
                res = relinker.main([
                    'cleanup',
                    '--swift-dir', self.testdir,
                    '--skip-mount',
                    '--devices', self.devices,
                    '--device', self.existing_device,
                ] + options)
            return res, mocked

        self._save_ring(POLICIES)  # no ring prepared for increase
        res, mocked = do_cleanup()
        self.assertEqual([], mocked.call_args_list)
        self.assertEqual(2, res)

        self.rb.prepare_increase_partition_power()
        self._save_ring(POLICIES)  # all rings prepared for increase
        res, mocked = do_cleanup()
        self.assertEqual([], mocked.call_args_list)
        self.assertEqual(2, res)

        self.rb.increase_partition_power()
        self._save_ring([POLICIES[0]])  # increased
        res, mocked = do_cleanup()
        self.assertEqual([mock.call(POLICIES[0])], mocked.call_args_list)
        self.assertEqual(0, res)

        res, mocked = do_cleanup(['--policy', '1'])
        self.assertEqual([], mocked.call_args_list)
        self.assertEqual(2, res)

        self._save_ring([POLICIES[1]])  # increased
        res, mocked = do_cleanup()
        self.assertEqual([mock.call(POLICIES[0]), mock.call(POLICIES[1])],
                         mocked.call_args_list)
        self.assertEqual(0, res)

        self.rb.finish_increase_partition_power()
        self._save_ring([POLICIES[1]])  # finished
        res, mocked = do_cleanup()
        self.assertEqual([mock.call(POLICIES[0])], mocked.call_args_list)
        self.assertEqual(0, res)

        self._save_ring([POLICIES[0]])  # finished
        res, mocked = do_cleanup()
        self.assertEqual([], mocked.call_args_list)
        self.assertEqual(2, res)

        res, mocked = do_cleanup(['--policy', '1'])
        self.assertEqual([], mocked.call_args_list)
        self.assertEqual(2, res)

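    # NOTE: as asserted in test_cleanup_all_policies above, relinker.main
    # returns 0 when at least one policy ring is in a state that the given
    # command can act on, and a non-zero code (2) when no policy is
    # applicable.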
    def _common_test_cleanup(self, relink=True):
        # Create a ring that has prev_part_power set
        self.rb.prepare_increase_partition_power()
        self._save_ring()

        if relink:
            conf = {'swift_dir': self.testdir,
                    'devices': self.devices,
                    'mount_check': False,
                    'files_per_second': 0,
                    'policies': POLICIES,
                    'recon_cache_path': self.recon_cache_path,
                    'workers': 0}
            self.assertEqual(0, relinker.Relinker(
                conf, logger=self.logger, device_list=[self.existing_device],
                do_cleanup=False).run())
        self.rb.increase_partition_power()
        self._save_ring()

    def _cleanup_test(self, old_file_specs, new_file_specs,
                      conflict_file_specs, exp_old_specs, exp_new_specs,
                      exp_ret_code=0, relink_errors=None):
        # force the new partitions to be greater than the median so that they
        # are not rehashed during cleanup, meaning we can inspect the outcome
        # of the cleanup relinks and removes
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self.rb.increase_partition_power()
        self._save_ring()
        self._do_link_test('cleanup', old_file_specs, new_file_specs,
                           conflict_file_specs, exp_old_specs, exp_new_specs,
                           exp_ret_code, relink_errors)

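    # NOTE: in the cleanup tests below, each file spec is an (extension,
    # timestamp index) pair; the positional arguments to _cleanup_test are
    # the files initially present in the old and new partition locations,
    # any conflicting files, and the files expected to survive in the old
    # and new locations after cleanup.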
    def test_cleanup_data_meta_files(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           (('data', 0), ('meta', 1)),
                           None,
                           None,
                           (('data', 0), ('meta', 1)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_data_file(self):
        self._cleanup_test((('data', 0),),
                           None,
                           None,
                           None,
                           (('data', 0),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 1 linked, 1 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_data_missing_meta_files(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           None,
                           None,
                           None,
                           (('data', 0), ('meta', 1)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 2 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_meta_file(self):
        self._cleanup_test((('meta', 0),),
                           None,
                           None,
                           None,
                           (('meta', 0),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 1 linked, 1 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_ts_file(self):
        self._cleanup_test((('ts', 0),),
                           None,
                           None,
                           None,
                           (('ts', 0),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 1 linked, 1 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_data_missing_meta_missing_ts_files(self):
        self._cleanup_test((('data', 0), ('meta', 1), ('ts', 2)),
                           None,
                           None,
                           None,
                           (('ts', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 1 linked, 1 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_data_missing_ts_missing_meta_files(self):
        self._cleanup_test((('data', 0), ('ts', 1), ('meta', 2)),
                           None,
                           None,
                           None,
                           (('ts', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 2 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_ts_missing_data_missing_meta_files(self):
        self._cleanup_test((('ts', 0), ('data', 1), ('meta', 2)),
                           None,
                           None,
                           None,
                           (('data', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 2 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_data_missing_data_missing_meta_files(self):
        self._cleanup_test((('data', 0), ('data', 1), ('meta', 2)),
                           None,
                           None,
                           None,
                           (('data', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 2 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_data_existing_meta_files(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           (('meta', 1),),
                           None,
                           None,
                           (('data', 0), ('meta', 1)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 1 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_meta_existing_newer_data_files(self):
        self._cleanup_test((('data', 0), ('meta', 2)),
                           (('data', 1),),
                           None,
                           None,
                           (('data', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 1 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_data_missing_meta_existing_older_meta_files(self):
        self._cleanup_test((('data', 0), ('meta', 2)),
                           (('meta', 1),),
                           None,
                           None,
                           (('data', 0), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 2 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_meta_missing_ts_files(self):
        self._cleanup_test((('data', 0), ('meta', 1), ('ts', 2)),
                           (('data', 0),),
                           None,
                           None,
                           (('ts', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 1 linked, 1 removed, 0 errors)',
                      info_lines)

    def test_cleanup_missing_data_missing_meta_existing_older_ts_files(self):
        self._cleanup_test((('data', 1), ('meta', 2)),
                           (('ts', 0),),
                           None,
                           None,
                           (('data', 1), ('meta', 2)))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 2 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_data_meta_existing_ts_files(self):
        self._cleanup_test((('data', 0), ('meta', 1), ('ts', 2)),
                           (('ts', 2),),
                           None,
                           None,
                           (('ts', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 0 linked, 1 removed, 0 errors)',
                      info_lines)

    def test_cleanup_data_meta_existing_newer_ts_files(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           (('ts', 2),),
                           None,
                           None,
                           (('ts', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(0 files, 0 linked, 2 removed, 0 errors)',
                      info_lines)

    def test_cleanup_ts_existing_newer_data_files(self):
        self._cleanup_test((('ts', 0),),
                           (('data', 2),),
                           None,
                           None,
                           (('data', 2),))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(0 files, 0 linked, 1 removed, 0 errors)',
                      info_lines)

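    # NOTE: the relink_errors argument used in the tests below injects an
    # OSError for the named file extension when cleanup attempts the missing
    # link; as the inline comments assert, a single failed link means nothing
    # is removed from the old partition directory.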
    def test_cleanup_missing_data_file_relink_fails(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           (('meta', 1),),
                           None,
                           (('data', 0), ('meta', 1)),  # nothing is removed
                           (('meta', 1),),
                           exp_ret_code=1,
                           relink_errors={'data': OSError(errno.EPERM, 'oops')}
                           )
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 0 removed, 1 errors)',
                      warning_lines)

    def test_cleanup_missing_meta_file_relink_fails(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           (('data', 0),),
                           None,
                           (('data', 0), ('meta', 1)),  # nothing is removed
                           (('data', 0),),
                           exp_ret_code=1,
                           relink_errors={'meta': OSError(errno.EPERM, 'oops')}
                           )
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 0 removed, 1 errors)',
                      warning_lines)

    def test_cleanup_missing_data_and_meta_file_one_relink_fails(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           None,
                           None,
                           (('data', 0), ('meta', 1)),  # nothing is removed
                           (('data', 0),),
                           exp_ret_code=1,
                           relink_errors={'meta': OSError(errno.EPERM, 'oops')}
                           )
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 1 linked, 0 removed, 1 errors)',
                      warning_lines)

    def test_cleanup_missing_data_and_meta_file_both_relinks_fails(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           None,
                           None,
                           (('data', 0), ('meta', 1)),  # nothing is removed
                           None,
                           exp_ret_code=1,
                           relink_errors={'data': OSError(errno.EPERM, 'oops'),
                                          'meta': OSError(errno.EPERM, 'oops')}
                           )
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 0 removed, 2 errors)',
                      warning_lines)

    def test_cleanup_conflicting_data_file(self):
        self._cleanup_test((('data', 0),),
                           None,
                           (('data', 0),),  # different inode
                           (('data', 0),),
                           (('data', 0),),
                           exp_ret_code=1)
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 0 linked, 0 removed, 1 errors)',
                      warning_lines)

    def test_cleanup_conflicting_ts_file(self):
        self._cleanup_test((('ts', 0),),
                           None,
                           (('ts', 0),),  # different inode but same timestamp
                           None,
                           (('ts', 0),),
                           exp_ret_code=0)
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 0 linked, 1 removed, 0 errors)',
                      info_lines)
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertEqual([], warning_lines)

    def test_cleanup_conflicting_ts_is_linked_to_part_power_minus_1(self):
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self.rb.increase_partition_power()
        self._save_ring()
        # setup a file in the next part power (PART_POWER + 1) location that is
        # linked to a file in an older PART_POWER - 1 location
        filename = '.'.join([self.obj_ts.internal, 'ts'])
        older_filepath, new_filepath = self._make_link(filename,
                                                       PART_POWER - 1)
        self._do_link_test('cleanup',
                           (('ts', 0),),
                           None,
                           None,  # we already made file linked to older part
                           None,
                           (('ts', 0),),
                           exp_ret_code=0)
        info_lines = self.logger.get_lines_for_level('info')
        # both the PART_POWER and PART_POWER - N partitions are visited, no new
        # links are created, and both the older files are removed
        self.assertIn('2 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 2 removed, 0 errors)',
                      info_lines)
        with open(new_filepath, 'r') as fd:
            self.assertEqual(older_filepath, fd.read())
        self.assertFalse(os.path.exists(older_filepath))

    def test_cleanup_conflicting_ts_is_linked_to_part_power_minus_2_err(self):
        # link from next partition to current-2 partition;
        # different file in current partition
        # by default the relinker will NOT validate the current-2 location
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self.rb.increase_partition_power()
        self._save_ring()
        # setup a file in the next part power (PART_POWER + 1) location that is
        # linked to a file in an older (PART_POWER - 2) location
        filename = '.'.join([self.obj_ts.internal, 'ts'])
        older_filepath, new_filepath = self._make_link(filename,
                                                       PART_POWER - 2)

        self._do_link_test('cleanup',
                           (('ts', 0),),
                           None,
                           None,  # we already made file linked to older part
                           None,  # different inode but same timestamp: removed
                           (('ts', 0),),
                           exp_ret_code=0)
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('2 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 2 removed, 0 errors)',
                      info_lines)
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertEqual([], warning_lines)
        with open(new_filepath, 'r') as fd:
            self.assertEqual(older_filepath, fd.read())
        # current-2 is linked so can be removed in cleanup
        self.assertFalse(os.path.exists(older_filepath))

    def test_cleanup_conflicting_ts_is_linked_to_part_power_minus_2_ok(self):
        # link from next partition to current-2 partition;
        # different file in current partition
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.rb.prepare_increase_partition_power()
        self.rb.increase_partition_power()
        self._save_ring()
        # setup a file in the next part power (PART_POWER + 1) location that is
        # linked to a file in an older (PART_POWER - 2) location
        filename = '.'.join([self.obj_ts.internal, 'ts'])
        older_filepath, new_filepath = self._make_link(filename,
                                                       PART_POWER - 2)
        self._do_link_test('cleanup',
                           (('ts', 0),),
                           None,
                           None,  # we already made file linked to older part
                           None,
                           (('ts', 0),),
                           exp_ret_code=0)
        info_lines = self.logger.get_lines_for_level('info')
        # both the PART_POWER and PART_POWER - N partitions are visited, no new
        # links are created, and both the older files are removed
        self.assertIn('2 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 2 removed, 0 errors)',
                      info_lines)
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertEqual([], warning_lines)
        with open(new_filepath, 'r') as fd:
            self.assertEqual(older_filepath, fd.read())
        self.assertFalse(os.path.exists(older_filepath))

    def test_cleanup_conflicting_older_data_file(self):
        # older conflicting file isn't relevant so cleanup succeeds
        self._cleanup_test((('data', 0),),
                           (('data', 1),),
                           (('data', 0),),  # different inode
                           None,
                           (('data', 1),),  # cleanup_ondisk_files rm'd 0.data
                           exp_ret_code=0)
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(0 files, 0 linked, 1 removed, 0 errors)',
                      info_lines)

    def test_cleanup_conflicting_data_file_conflicting_meta_file(self):
        self._cleanup_test((('data', 0), ('meta', 1)),
                           None,
                           (('data', 0), ('meta', 1)),  # different inodes
                           (('data', 0), ('meta', 1)),
                           (('data', 0), ('meta', 1)),
                           exp_ret_code=1)
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 0 removed, 2 errors)',
                      warning_lines)

    def test_cleanup_conflicting_data_file_existing_meta_file(self):
        # if just one link fails to be created then *nothing* is removed from
        # old dir
        self._cleanup_test((('data', 0), ('meta', 1)),
                           (('meta', 1),),
                           (('data', 0),),  # different inode
                           (('data', 0), ('meta', 1)),
                           (('data', 0), ('meta', 1)),
                           exp_ret_code=1)
        warning_lines = self.logger.get_lines_for_level('warning')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 0 linked, 0 removed, 1 errors)',
                      warning_lines)

def test_cleanup_first_quartile_does_rehash(self):
|
|
|
|
# we need object name in lower half of current part
|
|
|
|
self._setup_object(lambda part: part < 2 ** (PART_POWER - 1))
|
|
|
|
self.assertLess(self.next_part, 2 ** PART_POWER)
|
2016-07-04 18:21:54 +02:00
|
|
|
self._common_test_cleanup()
|
2021-02-13 09:07:43 -06:00
|
|
|
|
|
|
|
# don't mock re-hash for variety (and so we can assert side-effects)
|
2021-02-01 16:40:21 -08:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'cleanup',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
]))
|
2016-07-04 18:21:54 +02:00
|
|
|
|
|
|
|
# Old objectname should be removed, new should still exist
|
|
|
|
self.assertTrue(os.path.isdir(self.expected_dir))
|
|
|
|
self.assertTrue(os.path.isfile(self.expected_file))
|
|
|
|
self.assertFalse(os.path.isfile(
|
|
|
|
os.path.join(self.objdir, self.object_fname)))
|
2021-02-13 09:07:43 -06:00
|
|
|
self.assertFalse(os.path.exists(self.part_dir))
        with open(os.path.join(self.next_part_dir, 'hashes.invalid')) as fp:
            self.assertEqual(fp.read(), '')
        with open(os.path.join(self.next_part_dir, 'hashes.pkl'), 'rb') as fp:
            hashes = pickle.load(fp)
        self.assertIn(self._hash[-3:], hashes)

        # create an object in a first quartile partition and pretend it should
        # be there; check that cleanup does not fail and does not remove the
        # partition!
        self._setup_object(lambda part: part < 2 ** (PART_POWER - 1))
        with mock.patch('swift.cli.relinker.replace_partition_in_path',
                        lambda *args, **kwargs: args[1]):
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        self.assertTrue(os.path.exists(self.objname))

    def test_cleanup_second_quartile_no_rehash(self):
        # we need a part in upper half of current part power
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self.assertGreaterEqual(self.part, 2 ** (PART_POWER - 1))
        self._common_test_cleanup()

        def fake_hash_suffix(suffix_dir, policy):
            # check that the hash dir is empty and remove it just like the
            # real _hash_suffix
            self.assertEqual([self._hash], os.listdir(suffix_dir))
            hash_dir = os.path.join(suffix_dir, self._hash)
            self.assertEqual([], os.listdir(hash_dir))
            os.rmdir(hash_dir)
            os.rmdir(suffix_dir)
            raise PathNotDir()

        with mock.patch('swift.obj.diskfile.DiskFileManager._hash_suffix',
                        side_effect=fake_hash_suffix) as mock_hash_suffix:
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))

        # the old suffix dir is rehashed before the old partition is removed,
        # but the new suffix dir is not rehashed
        self.assertEqual([mock.call(self.suffix_dir, policy=self.policy)],
                         mock_hash_suffix.call_args_list)

        # Old objectname should be removed, new should still exist
        self.assertTrue(os.path.isdir(self.expected_dir))
        self.assertTrue(os.path.isfile(self.expected_file))
        self.assertFalse(os.path.isfile(
            os.path.join(self.objdir, self.object_fname)))
        self.assertFalse(os.path.exists(self.part_dir))

        with open(os.path.join(self.objects, str(self.next_part),
                               'hashes.invalid')) as fp:
            self.assertEqual(fp.read(), '')
        with open(os.path.join(self.objects, str(self.next_part),
                               'hashes.pkl'), 'rb') as fp:
            hashes = pickle.load(fp)
        self.assertIn(self._hash[-3:], hashes)

    def test_cleanup_no_applicable_policy(self):
        # NB do not prepare part power increase
        self._save_ring()
        with self._mock_relinker():
            self.assertEqual(2, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
            ]))
        self.assertEqual(self.logger.get_lines_for_level('warning'),
                         ['No policy found to increase the partition power.'])
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    def test_cleanup_not_mounted(self):
        self._common_test_cleanup()
        with self._mock_relinker():
            self.assertEqual(1, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
            ]))
        self.assertEqual(self.logger.get_lines_for_level('warning'), [
            'Skipping sda1 as it is not mounted',
            '1 disks were unmounted',
            '0 hash dirs processed (cleanup=True) '
            '(0 files, 0 linked, 0 removed, 0 errors)',
        ])
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    def test_cleanup_listdir_error(self):
        self._common_test_cleanup()
        with self._mock_relinker():
            with self._mock_listdir():
                self.assertEqual(1, relinker.main([
                    'cleanup',
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount-check'
                ]))
        self.assertEqual(self.logger.get_lines_for_level('warning'), [
            'Skipping %s because ' % self.objects,
            'There were 1 errors listing partition directories',
            '0 hash dirs processed (cleanup=True) '
            '(0 files, 0 linked, 0 removed, 1 errors)',
        ])
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    def test_cleanup_device_filter(self):
        self._common_test_cleanup()
        with self._mock_relinker():
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
                '--device', self.existing_device,
            ]))

        # Old objectname should be removed, new should still exist
        self.assertTrue(os.path.isdir(self.expected_dir))
        self.assertTrue(os.path.isfile(self.expected_file))
        self.assertFalse(os.path.isfile(
            os.path.join(self.objdir, self.object_fname)))
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    def test_cleanup_device_filter_invalid(self):
        self._common_test_cleanup()
        with self._mock_relinker():
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
                '--device', 'none',
            ]))

        # Old objectname should still exist, new should still exist
        self.assertTrue(os.path.isdir(self.expected_dir))
        self.assertTrue(os.path.isfile(self.expected_file))
        self.assertTrue(os.path.isfile(
            os.path.join(self.objdir, self.object_fname)))
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    def _time_iter(self, start):
        yield start
        while True:
            yield start + 1
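        # NB: when time.time() is patched with this iterator, the first call
        # returns ``start`` and every later call returns ``start + 1``, so
        # any elapsed-time arithmetic the relinker does sees fixed values and
        # the assertions below stay deterministic.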

    @patch_policies(
        [StoragePolicy(0, 'platinum', True),
         ECStoragePolicy(
             1, name='ec', is_default=False, ec_type=DEFAULT_TEST_EC_TYPE,
             ec_ndata=4, ec_nparity=2)])
    @mock.patch('os.getpid', return_value=100)
    def test_relink_cleanup(self, mock_getpid):
        # setup a policy-0 object in a part in the second quartile so that its
        # next part *will not* be handled during cleanup
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        # create policy-1 object in a part in the first quartile so that its
        # next part *will* be handled during cleanup
        _hash, pol_1_part, pol_1_next_part, objpath = self._get_object_name(
            lambda part: part < 2 ** (PART_POWER - 1))
        self._create_object(POLICIES[1], pol_1_part, _hash)

        state_files = {
            POLICIES[0]: os.path.join(self.devices, self.existing_device,
                                      'relink.objects.json'),
            POLICIES[1]: os.path.join(self.devices, self.existing_device,
                                      'relink.objects-1.json'),
        }
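        # NB: the relinker records per-policy progress in a
        # relink.<datadir>.json file at the device root -- 'objects' for
        # policy 0 and 'objects-1' for policy 1, as constructed above.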

        self.rb.prepare_increase_partition_power()
        self._save_ring()
        ts1 = time.time()
        with mock.patch('time.time', side_effect=self._time_iter(ts1)):
            self.assertEqual(0, relinker.main([
                'relink',
                self.conf_file,
            ]))

        orig_inodes = {}
        for policy, part in zip(POLICIES,
                                (self.part, pol_1_part)):
            state_file = state_files[policy]
            orig_inodes[policy] = os.stat(state_file).st_ino
            state = {str(part): True}
            with open(state_files[policy], 'rt') as f:
                self.assertEqual(json.load(f), {
                    "part_power": PART_POWER,
                    "next_part_power": PART_POWER + 1,
                    "state": state})
        recon_progress = utils.load_recon_cache(self.recon_cache)
        expected_recon_data = {
            'devices': {
                'sda1': {
                    'parts_done': 2,
                    'policies': {
                        '0': {'next_part_power': PART_POWER + 1,
                              'part_power': PART_POWER,
                              'parts_done': 1,
                              'start_time': mock.ANY,
                              'stats': {'errors': 0,
                                        'files': 1,
                                        'hash_dirs': 1,
                                        'linked': 1,
                                        'removed': 0},
                              'step': 'relink',
                              'timestamp': mock.ANY,
                              'total_parts': 1,
                              'total_time': 0.0},
                        '1': {'next_part_power': PART_POWER + 1,
                              'part_power': PART_POWER,
                              'parts_done': 1,
                              'start_time': mock.ANY,
                              'stats': {'errors': 0,
                                        'files': 1,
                                        'hash_dirs': 1,
                                        'linked': 1,
                                        'removed': 0},
                              'step': 'relink',
                              'timestamp': mock.ANY,
                              'total_parts': 1,
                              'total_time': 0.0}},
                    'start_time': mock.ANY,
                    'stats': {'errors': 0,
                              'files': 2,
                              'hash_dirs': 2,
                              'linked': 2,
                              'removed': 0},
                    'timestamp': mock.ANY,
                    'total_parts': 2,
                    'total_time': 0}},
            'workers': {'100': {'devices': ['sda1'],
                                'return_code': 0,
                                'timestamp': mock.ANY}}}
        self.assertEqual(recon_progress, expected_recon_data)
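
        # Second pass: distribute the ring with the increased part power and
        # run cleanup.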
        self.rb.increase_partition_power()
        self.rb._ring = None  # Force builder to reload ring
        self._save_ring()
        with open(state_files[0], 'rt'), open(state_files[1], 'rt'):
            # Keep the state files open during cleanup so the inode can't be
            # released/re-used when it gets unlinked
            self.assertEqual(orig_inodes[0], os.stat(state_files[0]).st_ino)
            self.assertEqual(orig_inodes[1], os.stat(state_files[1]).st_ino)
            ts1 = time.time()
            with mock.patch('time.time', side_effect=self._time_iter(ts1)):
                self.assertEqual(0, relinker.main([
                    'cleanup',
                    self.conf_file,
                ]))
        self.assertNotEqual(orig_inodes[0], os.stat(state_files[0]).st_ino)
        self.assertNotEqual(orig_inodes[1], os.stat(state_files[1]).st_ino)
        for policy, part, next_part in zip(POLICIES,
                                           (self.part, pol_1_part),
                                           (None, pol_1_next_part)):
            state_file = state_files[policy]
            state = {str(part): True}
            if next_part is not None:
                # cleanup will process the new partition as well as the old if
                # old is in first quartile
                state[str(next_part)] = True
            with open(state_file, 'rt') as f:
                # NB: part_power/next_part_power tuple changed, so state was
                # reset (though we track prev_part_power for an efficient
                # clean up)
                self.assertEqual(json.load(f), {
                    "prev_part_power": PART_POWER,
                    "part_power": PART_POWER + 1,
                    "next_part_power": PART_POWER + 1,
                    "state": state})
        recon_progress = utils.load_recon_cache(self.recon_cache)
        expected_recon_data = {
            'devices': {
                'sda1': {
                    'parts_done': 3,
                    'policies': {
                        '0': {'next_part_power': PART_POWER + 1,
                              'part_power': PART_POWER + 1,
                              'parts_done': 1,
                              'start_time': mock.ANY,
                              'stats': {'errors': 0,
                                        'files': 1,
                                        'hash_dirs': 1,
                                        'linked': 0,
                                        'removed': 1},
                              'step': 'cleanup',
                              'timestamp': mock.ANY,
                              'total_parts': 1,
                              'total_time': 0.0},
                        '1': {'next_part_power': PART_POWER + 1,
                              'part_power': PART_POWER + 1,
                              'parts_done': 2,
                              'start_time': mock.ANY,
                              'stats': {'errors': 0,
                                        'files': 1,
                                        'hash_dirs': 1,
                                        'linked': 0,
                                        'removed': 1},
                              'step': 'cleanup',
                              'timestamp': mock.ANY,
                              'total_parts': 2,
                              'total_time': 0.0}},
                    'start_time': mock.ANY,
                    'stats': {'errors': 0,
                              'files': 2,
                              'hash_dirs': 2,
                              'linked': 0,
                              'removed': 2},
                    'timestamp': mock.ANY,
                    'total_parts': 3,
                    'total_time': 0}},
            'workers': {'100': {'devices': ['sda1'],
                                'return_code': 0,
                                'timestamp': mock.ANY}}}
        self.assertEqual(recon_progress, expected_recon_data)

    def test_devices_filter_filtering(self):
        # With no filtering, returns all devices
        r = relinker.Relinker(
            {'devices': self.devices,
             'recon_cache_path': self.recon_cache_path},
            self.logger, self.existing_device)
        devices = r.devices_filter("", [self.existing_device])
        self.assertEqual(set([self.existing_device]), devices)

        # With a matching filter, returns only the matching devices
        devices = r.devices_filter("", [self.existing_device, 'sda2'])
        self.assertEqual(set([self.existing_device]), devices)

        # With a non-matching filter, returns nothing
        r.device_list = ['none']
        devices = r.devices_filter("", [self.existing_device])
        self.assertEqual(set(), devices)

    def test_hook_pre_post_device_locking(self):
        r = relinker.Relinker(
            {'devices': self.devices,
             'recon_cache_path': self.recon_cache_path},
            self.logger, self.existing_device)
        device_path = os.path.join(self.devices, self.existing_device)
        r.datadir = 'object'  # would get set in process_policy
        r.states = {"state": {}, "part_power": PART_POWER,
                    "next_part_power": PART_POWER + 1}  # ditto
        lock_file = os.path.join(device_path, '.relink.%s.lock' % r.datadir)
        r.policy = self.policy

        # The first run gets the lock
        r.hook_pre_device(device_path)
        self.assertIsNotNone(r.dev_lock)

        # A following run would block
        with self.assertRaises(IOError) as raised:
            with open(lock_file, 'a') as f:
                fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        self.assertEqual(errno.EAGAIN, raised.exception.errno)

        # hook_post_device releases the lock
        r.hook_post_device(device_path)
        self.assertIsNone(r.dev_lock)
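
        # ...so another process can now take the lock without blocking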
        with open(lock_file, 'a') as f:
            fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

    def _test_state_file(self, pol, expected_recon_data):
        r = relinker.Relinker(
            {'devices': self.devices,
             'recon_cache_path': self.recon_cache_path,
             'stats_interval': 0.0},
            self.logger, [self.existing_device])
        device_path = os.path.join(self.devices, self.existing_device)
        r.datadir = 'objects'
        r.part_power = PART_POWER
        r.next_part_power = PART_POWER + 1
        datadir_path = os.path.join(device_path, r.datadir)
        state_file = os.path.join(device_path, 'relink.%s.json' % r.datadir)
        r.policy = pol
        r.pid = 1234  # for recon workers stats

        recon_progress = utils.load_recon_cache(self.recon_cache)
        # the progress for the current policy should be gone, so we should
        # just have anything left over from other processes/policies, if any
        self.assertEqual(recon_progress, expected_recon_data)

        # Start relinking
        r.states = {
            "part_power": PART_POWER,
            "next_part_power": PART_POWER + 1,
            "state": {},
        }

        # Load the states: as it starts, the state must be empty
        r.hook_pre_device(device_path)
        self.assertEqual({}, r.states["state"])
        os.close(r.dev_lock)  # Release the lock

        # Partition 312 is ignored because it must have been created with the
        # next_part_power, so it does not need to be relinked
        # 96 and 227 are reverse ordered
        # auditor_status.json is ignored because it's not a partition
        self.assertEqual(['227', '96'], r.partitions_filter(
            "", ['96', '227', '312', 'auditor_status.json']))
        self.assertEqual(r.states["state"], {'96': False, '227': False})

        r.diskfile_mgr = DiskFileRouter({
            'devices': self.devices,
            'mount_check': False,
        }, self.logger)[r.policy]
|
2021-02-02 15:16:43 -08:00
|
|
|
|
2019-11-14 16:26:48 -05:00
|
|
|
# Ack partition 96
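# hook_post_partition marks the partition done and persists the state file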
|
2021-04-26 14:48:40 -07:00
|
|
|
r.hook_pre_partition(os.path.join(datadir_path, '96'))
|
2021-03-18 15:20:43 -07:00
|
|
|
r.hook_post_partition(os.path.join(datadir_path, '96'))
|
|
|
|
self.assertEqual(r.states["state"], {'96': True, '227': False})
|
2021-04-26 14:48:40 -07:00
|
|
|
self.assertEqual(self.logger.get_lines_for_level("info"), [
|
|
|
|
"Step: relink Device: sda1 Policy: %s "
|
|
|
|
"Partitions: 1/2" % r.policy.name,
|
|
|
|
])
|
2019-11-14 16:26:48 -05:00
|
|
|
with open(state_file, 'rt') as f:
|
2021-01-07 16:18:07 -08:00
|
|
|
self.assertEqual(json.load(f), {
|
|
|
|
"part_power": PART_POWER,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {'96': True, '227': False}})
|
2021-04-27 13:56:39 +10:00
|
|
|
recon_progress = utils.load_recon_cache(self.recon_cache)
|
|
|
|
expected_recon_data.update(
|
|
|
|
{'devices': {
|
|
|
|
'sda1': {
|
|
|
|
'parts_done': 1,
|
|
|
|
'policies': {
|
|
|
|
str(pol.idx): {
|
|
|
|
'next_part_power': PART_POWER + 1,
|
|
|
|
'part_power': PART_POWER,
|
|
|
|
'parts_done': 1,
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'step': 'relink',
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 2}},
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 2,
|
|
|
|
'total_time': 0}},
|
|
|
|
'workers': {
|
|
|
|
'1234': {'timestamp': mock.ANY,
|
|
|
|
'return_code': None,
|
|
|
|
'devices': ['sda1']}}})
|
|
|
|
self.assertEqual(recon_progress, expected_recon_data)
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
# Restart relinking after only part 96 was done
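# partitions_filter must now skip 96, which is already marked done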
|
2021-04-26 14:48:40 -07:00
|
|
|
self.logger.clear()
|
|
|
|
self.assertEqual(['227'],
|
|
|
|
r.partitions_filter("", ['96', '227', '312']))
|
|
|
|
self.assertEqual(r.states["state"], {'96': True, '227': False})
|
|
|
|
|
|
|
|
# ...but there's an error
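# an errored partition is not marked done, so it will be retried on restart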
|
|
|
|
r.hook_pre_partition(os.path.join(datadir_path, '227'))
|
|
|
|
r.stats['errors'] += 1
|
|
|
|
r.hook_post_partition(os.path.join(datadir_path, '227'))
|
|
|
|
self.assertEqual(self.logger.get_lines_for_level("info"), [
|
|
|
|
"Step: relink Device: sda1 Policy: %s "
|
|
|
|
"Partitions: 1/2" % r.policy.name,
|
|
|
|
])
|
|
|
|
self.assertEqual(r.states["state"], {'96': True, '227': False})
|
|
|
|
with open(state_file, 'rt') as f:
|
|
|
|
self.assertEqual(json.load(f), {
|
|
|
|
"part_power": PART_POWER,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {'96': True, '227': False}})
|
|
|
|
|
|
|
|
# OK, one more try
|
|
|
|
self.logger.clear()
|
2019-11-14 16:26:48 -05:00
|
|
|
self.assertEqual(['227'],
|
2021-03-18 15:20:43 -07:00
|
|
|
r.partitions_filter("", ['96', '227', '312']))
|
|
|
|
self.assertEqual(r.states["state"], {'96': True, '227': False})
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
# Ack partition 227
|
2021-04-26 14:48:40 -07:00
|
|
|
r.hook_pre_partition(os.path.join(datadir_path, '227'))
|
2021-03-18 15:20:43 -07:00
|
|
|
r.hook_post_partition(os.path.join(datadir_path, '227'))
|
2021-04-26 14:48:40 -07:00
|
|
|
self.assertEqual(self.logger.get_lines_for_level("info"), [
|
|
|
|
"Step: relink Device: sda1 Policy: %s "
|
|
|
|
"Partitions: 2/2" % r.policy.name,
|
|
|
|
])
|
2021-03-18 15:20:43 -07:00
|
|
|
self.assertEqual(r.states["state"], {'96': True, '227': True})
|
2019-11-14 16:26:48 -05:00
|
|
|
with open(state_file, 'rt') as f:
|
2021-01-07 16:18:07 -08:00
|
|
|
self.assertEqual(json.load(f), {
|
|
|
|
"part_power": PART_POWER,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {'96': True, '227': True}})
|
2021-04-27 13:56:39 +10:00
|
|
|
recon_progress = utils.load_recon_cache(self.recon_cache)
|
|
|
|
expected_recon_data.update(
|
|
|
|
{'devices': {
|
|
|
|
'sda1': {
|
|
|
|
'parts_done': 2,
|
|
|
|
'policies': {
|
|
|
|
str(pol.idx): {
|
|
|
|
'next_part_power': PART_POWER + 1,
|
|
|
|
'part_power': PART_POWER,
|
|
|
|
'parts_done': 2,
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 1,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'step': 'relink',
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 2}},
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 1,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 2,
|
|
|
|
'total_time': 0}}})
|
|
|
|
self.assertEqual(recon_progress, expected_recon_data)
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
# If the process restarts, it reloads the state
|
2021-03-18 15:20:43 -07:00
|
|
|
r.states = {
|
2021-01-07 16:18:07 -08:00
|
|
|
"part_power": PART_POWER,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {},
|
|
|
|
}
|
2021-03-18 15:20:43 -07:00
|
|
|
r.hook_pre_device(device_path)
|
|
|
|
self.assertEqual(r.states, {
|
2021-01-07 16:18:07 -08:00
|
|
|
"part_power": PART_POWER,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {'96': True, '227': True}})
|
2021-03-18 15:20:43 -07:00
|
|
|
os.close(r.dev_lock) # Release the lock
|
2021-01-07 16:18:07 -08:00
|
|
|
|
|
|
|
# Start cleanup -- note that part_power and next_part_power now match!
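# do_cleanup switches the relinker from the relink step to the cleanup step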
|
2021-03-18 15:20:43 -07:00
|
|
|
r.do_cleanup = True
|
|
|
|
r.part_power = PART_POWER + 1
|
|
|
|
r.states = {
|
2021-01-07 16:18:07 -08:00
|
|
|
"part_power": PART_POWER + 1,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {},
|
|
|
|
}
|
|
|
|
# ...which means our state file was ignored
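# hook_pre_device records prev_part_power and resets the per-partition state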
|
2021-03-18 15:20:43 -07:00
|
|
|
r.hook_pre_device(device_path)
|
|
|
|
self.assertEqual(r.states, {
|
2021-01-07 16:18:07 -08:00
|
|
|
"prev_part_power": PART_POWER,
|
|
|
|
"part_power": PART_POWER + 1,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {}})
|
2021-03-18 15:20:43 -07:00
|
|
|
os.close(r.dev_lock) # Release the lock
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
self.assertEqual(['227', '96'],
|
2021-03-18 15:20:43 -07:00
|
|
|
r.partitions_filter("", ['96', '227', '312']))
|
2019-11-14 16:26:48 -05:00
|
|
|
# Ack partition 227
|
2021-04-27 13:56:39 +10:00
|
|
|
r.hook_pre_partition(os.path.join(datadir_path, '227'))
|
2021-03-18 15:20:43 -07:00
|
|
|
r.hook_post_partition(os.path.join(datadir_path, '227'))
|
2021-03-15 09:58:27 +11:00
|
|
|
self.assertIn("Step: cleanup Device: sda1 Policy: %s "
|
|
|
|
"Partitions: 1/2" % r.policy.name,
|
2021-03-09 19:45:15 +11:00
|
|
|
self.logger.get_lines_for_level("info"))
|
2021-03-18 15:20:43 -07:00
|
|
|
self.assertEqual(r.states["state"],
|
2021-01-07 16:18:07 -08:00
|
|
|
{'96': False, '227': True})
|
2019-11-14 16:26:48 -05:00
|
|
|
with open(state_file, 'rt') as f:
|
2021-01-07 16:18:07 -08:00
|
|
|
self.assertEqual(json.load(f), {
|
|
|
|
"prev_part_power": PART_POWER,
|
|
|
|
"part_power": PART_POWER + 1,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {'96': False, '227': True}})
|
2021-04-27 13:56:39 +10:00
|
|
|
recon_progress = utils.load_recon_cache(self.recon_cache)
|
|
|
|
expected_recon_data.update(
|
|
|
|
{'devices': {
|
|
|
|
'sda1': {
|
|
|
|
'parts_done': 1,
|
|
|
|
'policies': {
|
|
|
|
str(pol.idx): {
|
|
|
|
'next_part_power': PART_POWER + 1,
|
|
|
|
'part_power': PART_POWER + 1,
|
|
|
|
'parts_done': 1,
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'step': 'cleanup',
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 2}},
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 2,
|
|
|
|
'total_time': 0}}})
|
|
|
|
self.assertEqual(recon_progress, expected_recon_data)
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
# Restart cleanup after only part 227 was done
|
2021-03-18 15:20:43 -07:00
|
|
|
self.assertEqual(['96'], r.partitions_filter("", ['96', '227', '312']))
|
|
|
|
self.assertEqual(r.states["state"],
|
2021-01-07 16:18:07 -08:00
|
|
|
{'96': False, '227': True})
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
# Ack partition 96
|
2021-03-18 15:20:43 -07:00
|
|
|
r.hook_post_partition(os.path.join(datadir_path, '96'))
|
2021-03-15 09:58:27 +11:00
|
|
|
self.assertIn("Step: cleanup Device: sda1 Policy: %s "
|
|
|
|
"Partitions: 2/2" % r.policy.name,
|
2021-03-09 19:45:15 +11:00
|
|
|
self.logger.get_lines_for_level("info"))
|
2021-03-18 15:20:43 -07:00
|
|
|
self.assertEqual(r.states["state"],
|
2021-01-07 16:18:07 -08:00
|
|
|
{'96': True, '227': True})
|
2019-11-14 16:26:48 -05:00
|
|
|
with open(state_file, 'rt') as f:
|
2021-01-07 16:18:07 -08:00
|
|
|
self.assertEqual(json.load(f), {
|
|
|
|
"prev_part_power": PART_POWER,
|
|
|
|
"part_power": PART_POWER + 1,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {'96': True, '227': True}})
|
2019-11-14 16:26:48 -05:00
|
|
|
|
2021-04-27 13:56:39 +10:00
|
|
|
recon_progress = utils.load_recon_cache(self.recon_cache)
|
|
|
|
expected_recon_data.update(
|
|
|
|
{'devices': {
|
|
|
|
'sda1': {
|
|
|
|
'parts_done': 2,
|
|
|
|
'policies': {
|
|
|
|
str(pol.idx): {
|
|
|
|
'next_part_power': PART_POWER + 1,
|
|
|
|
'part_power': PART_POWER + 1,
|
|
|
|
'parts_done': 2,
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'step': 'cleanup',
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 2}},
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 2,
|
|
|
|
'total_time': 0}}})
|
|
|
|
self.assertEqual(recon_progress, expected_recon_data)
|
|
|
|
|
2019-11-14 16:26:48 -05:00
|
|
|
# At the end, the state is still accurate
|
2021-03-18 15:20:43 -07:00
|
|
|
r.states = {
|
2021-01-07 16:18:07 -08:00
|
|
|
"prev_part_power": PART_POWER,
|
|
|
|
"part_power": PART_POWER + 1,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {},
|
|
|
|
}
|
2021-03-18 15:20:43 -07:00
|
|
|
r.hook_pre_device(device_path)
|
|
|
|
self.assertEqual(r.states["state"],
|
2021-01-07 16:18:07 -08:00
|
|
|
{'96': True, '227': True})
|
2021-03-18 15:20:43 -07:00
|
|
|
os.close(r.dev_lock) # Release the lock
|
2021-01-07 16:18:07 -08:00
|
|
|
|
|
|
|
# If the part_power/next_part_power tuple differs, restart from scratch
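# a stale state file is removed rather than reused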
|
2021-03-18 15:20:43 -07:00
|
|
|
r.states = {
|
2021-01-07 16:18:07 -08:00
|
|
|
"part_power": PART_POWER + 1,
|
|
|
|
"next_part_power": PART_POWER + 2,
|
|
|
|
"state": {},
|
|
|
|
}
|
2021-03-18 15:20:43 -07:00
|
|
|
r.hook_pre_device(device_path)
|
|
|
|
self.assertEqual(r.states["state"], {})
|
2021-01-07 16:18:07 -08:00
|
|
|
self.assertFalse(os.path.exists(state_file))
|
2021-04-27 13:56:39 +10:00
|
|
|
# this will also reset the recon stats
|
|
|
|
recon_progress = utils.load_recon_cache(self.recon_cache)
|
|
|
|
expected_recon_data.update({
|
|
|
|
'devices': {
|
|
|
|
'sda1': {
|
|
|
|
'parts_done': 0,
|
|
|
|
'policies': {
|
|
|
|
str(pol.idx): {
|
|
|
|
'next_part_power': PART_POWER + 2,
|
|
|
|
'part_power': PART_POWER + 1,
|
|
|
|
'parts_done': 0,
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'step': 'cleanup',
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 0}},
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 0,
|
|
|
|
'total_time': 0}}})
|
|
|
|
self.assertEqual(recon_progress, expected_recon_data)
|
2021-03-18 15:20:43 -07:00
|
|
|
os.close(r.dev_lock) # Release the lock
|
2019-11-14 16:26:48 -05:00
|
|
|
|
|
|
|
# If the file gets corrupted, restart from scratch
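# unparseable JSON is discarded and the state file is removed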
|
|
|
|
with open(state_file, 'wt') as f:
|
|
|
|
f.write('NOT JSON')
|
2021-03-18 15:20:43 -07:00
|
|
|
r.states = {
|
|
|
|
"part_power": PART_POWER,
|
|
|
|
"next_part_power": PART_POWER + 1,
|
|
|
|
"state": {},
|
|
|
|
}
|
|
|
|
r.hook_pre_device(device_path)
|
|
|
|
self.assertEqual(r.states["state"], {})
|
2021-01-07 16:18:07 -08:00
|
|
|
self.assertFalse(os.path.exists(state_file))
|
2021-04-27 13:56:39 +10:00
|
|
|
recon_progress = utils.load_recon_cache(self.recon_cache)
|
|
|
|
expected_recon_data.update({
|
|
|
|
'devices': {
|
|
|
|
'sda1': {
|
|
|
|
'parts_done': 0,
|
|
|
|
'policies': {
|
|
|
|
str(pol.idx): {
|
|
|
|
'next_part_power': PART_POWER + 1,
|
|
|
|
'part_power': PART_POWER,
|
|
|
|
'parts_done': 0,
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'step': 'cleanup',
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 0}},
|
|
|
|
'start_time': mock.ANY,
|
|
|
|
'stats': {
|
|
|
|
'errors': 0,
|
|
|
|
'files': 0,
|
|
|
|
'hash_dirs': 0,
|
|
|
|
'linked': 0,
|
|
|
|
'removed': 0},
|
|
|
|
'timestamp': mock.ANY,
|
|
|
|
'total_parts': 0,
|
|
|
|
'total_time': 0}}})
|
|
|
|
self.assertEqual(recon_progress, expected_recon_data)
|
2021-03-18 15:20:43 -07:00
|
|
|
os.close(r.dev_lock) # Release the lock
|
2021-04-27 13:56:39 +10:00
|
|
|
return expected_recon_data
|
|
|
|
|
|
|
|
@patch_policies(
|
|
|
|
[StoragePolicy(0, 'platinum', True),
|
|
|
|
ECStoragePolicy(
|
|
|
|
1, name='ec', is_default=False, ec_type=DEFAULT_TEST_EC_TYPE,
|
|
|
|
ec_ndata=4, ec_nparity=2)])
|
|
|
|
def test_state_file(self):
|
|
|
|
expected_recon_data = {}
|
|
|
|
for policy in POLICIES:
|
|
|
|
# because we are specifying a device, it should itself be reset
|
|
|
|
expected_recon_data = self._test_state_file(
|
|
|
|
policy, expected_recon_data)
|
|
|
|
self.logger.clear()
|
2019-11-14 16:26:48 -05:00
|
|
|
|
2021-03-11 21:03:01 +00:00
|
|
|
def test_cleanup_relinked_ok(self):
|
|
|
|
self._common_test_cleanup()
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-11 21:03:01 +00:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'cleanup',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
]))
|
|
|
|
|
|
|
|
self.assertTrue(os.path.isfile(self.expected_file)) # link intact
|
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('warning'))
|
|
|
|
# old partition should be cleaned up
|
|
|
|
self.assertFalse(os.path.exists(self.part_dir))
|
2021-03-09 17:44:26 +00:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertIn('1 hash dirs processed (cleanup=True) '
|
|
|
|
'(1 files, 0 linked, 1 removed, 0 errors)', info_lines)
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-11 21:03:01 +00:00
|
|
|
|
2016-07-04 18:21:54 +02:00
|
|
|
def test_cleanup_not_yet_relinked(self):
|
2021-03-29 11:19:18 -07:00
|
|
|
# force new partition to be above range of partitions visited during
|
|
|
|
# cleanup
|
2021-03-08 17:16:37 +00:00
|
|
|
self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
|
2016-07-04 18:21:54 +02:00
|
|
|
self._common_test_cleanup(relink=False)
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-08 17:16:37 +00:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'cleanup',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
]))
|
2016-07-04 18:21:54 +02:00
|
|
|
|
2021-03-08 17:16:37 +00:00
|
|
|
self.assertTrue(os.path.isfile(self.expected_file)) # link created
|
2021-03-11 21:03:01 +00:00
|
|
|
# old partition should be cleaned up
|
|
|
|
self.assertFalse(os.path.exists(self.part_dir))
|
2021-03-08 17:16:37 +00:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('warning'))
|
|
|
|
self.assertIn(
|
|
|
|
'Relinking (cleanup) created link: %s to %s'
|
|
|
|
% (self.objname, self.expected_file),
|
|
|
|
self.logger.get_lines_for_level('debug'))
|
2021-03-09 17:44:26 +00:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertIn('1 hash dirs processed (cleanup=True) '
|
|
|
|
'(1 files, 1 linked, 1 removed, 0 errors)', info_lines)
|
2021-03-29 11:19:18 -07:00
|
|
|
# suffix should be invalidated and rehashed in new partition
|
|
|
|
hashes_invalid = os.path.join(self.next_part_dir, 'hashes.invalid')
|
|
|
|
self.assertTrue(os.path.exists(hashes_invalid))
|
|
|
|
with open(hashes_invalid, 'r') as fd:
|
|
|
|
self.assertEqual('', fd.read().strip())
|
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
|
|
|
|
|
|
|
def test_cleanup_not_yet_relinked_low(self):
|
|
|
|
# force new partition to be in the range of partitions visited during
|
|
|
|
# cleanup, but not exist until after cleanup would have visited it
|
|
|
|
self._setup_object(lambda part: part < 2 ** (PART_POWER - 1))
|
|
|
|
self._common_test_cleanup(relink=False)
|
|
|
|
self.assertFalse(os.path.isfile(self.expected_file))
|
|
|
|
self.assertFalse(os.path.exists(self.next_part_dir))
|
|
|
|
# Relinker processes partitions in reverse order; as a result, the
|
|
|
|
# "normal" rehash during cleanup won't hit this, since it doesn't
|
|
|
|
# exist yet -- but when we finish processing the old partition,
|
|
|
|
# we'll loop back around.
|
|
|
|
with self._mock_relinker():
|
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'cleanup',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
]))
|
|
|
|
|
|
|
|
self.assertTrue(os.path.isfile(self.expected_file)) # link created
|
|
|
|
# old partition should be cleaned up
|
|
|
|
self.assertFalse(os.path.exists(self.part_dir))
|
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('warning'))
|
|
|
|
self.assertIn(
|
|
|
|
'Relinking (cleanup) created link: %s to %s'
|
|
|
|
% (self.objname, self.expected_file),
|
|
|
|
self.logger.get_lines_for_level('debug'))
|
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertIn('1 hash dirs processed (cleanup=True) '
|
|
|
|
'(1 files, 1 linked, 1 removed, 0 errors)', info_lines)
|
|
|
|
# suffix should be invalidated and rehashed in new partition
|
2021-03-08 17:16:37 +00:00
|
|
|
hashes_invalid = os.path.join(self.next_part_dir, 'hashes.invalid')
|
|
|
|
self.assertTrue(os.path.exists(hashes_invalid))
|
|
|
|
with open(hashes_invalid, 'r') as fd:
|
2021-03-29 11:19:18 -07:00
|
|
|
self.assertEqual('', fd.read().strip())
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-08 17:16:37 +00:00
|
|
|
|
|
|
|
def test_cleanup_same_object_different_inode_in_new_partition(self):
|
|
|
|
# force rehash of new partition to not happen during cleanup
|
|
|
|
self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
|
|
|
|
self._common_test_cleanup(relink=False)
|
|
|
|
# new file in the new partition but different inode
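# cleanup must not clobber it, so this is reported as an error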
|
|
|
|
os.makedirs(self.expected_dir)
|
|
|
|
with open(self.expected_file, 'w') as fd:
|
|
|
|
fd.write('same but different')
|
|
|
|
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-08 17:16:37 +00:00
|
|
|
res = relinker.main([
|
|
|
|
'cleanup',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
])
|
|
|
|
|
|
|
|
self.assertEqual(1, res)
|
|
|
|
self.assertTrue(os.path.isfile(self.objname))
|
|
|
|
with open(self.objname, 'r') as fd:
|
|
|
|
self.assertEqual('Hello World!', fd.read())
|
|
|
|
self.assertTrue(os.path.isfile(self.expected_file))
|
|
|
|
with open(self.expected_file, 'r') as fd:
|
|
|
|
self.assertEqual('same but different', fd.read())
|
|
|
|
warning_lines = self.logger.get_lines_for_level('warning')
|
2021-01-06 16:18:04 -08:00
|
|
|
self.assertEqual(2, len(warning_lines), warning_lines)
|
2021-03-08 17:16:37 +00:00
|
|
|
self.assertIn('Error relinking (cleanup): failed to relink %s to %s'
|
|
|
|
% (self.objname, self.expected_file), warning_lines[0])
|
|
|
|
# suffix should not be invalidated in new partition
|
|
|
|
hashes_invalid = os.path.join(self.next_part_dir, 'hashes.invalid')
|
|
|
|
self.assertFalse(os.path.exists(hashes_invalid))
|
2021-01-06 16:18:04 -08:00
|
|
|
self.assertEqual('1 hash dirs processed (cleanup=True) '
|
|
|
|
'(1 files, 0 linked, 0 removed, 1 errors)',
|
|
|
|
warning_lines[1])
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2021-03-08 17:16:37 +00:00
|
|
|
|
|
|
|
def test_cleanup_older_object_in_new_partition(self):
|
|
|
|
# relink of the current object failed, but there is an older version of
|
|
|
|
# the same object in the new partition
|
|
|
|
# force rehash of new partition to not happen during cleanup
|
|
|
|
self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
|
|
|
|
self._common_test_cleanup(relink=False)
|
|
|
|
os.makedirs(self.expected_dir)
|
|
|
|
older_obj_file = os.path.join(
|
|
|
|
self.expected_dir,
|
|
|
|
utils.Timestamp(int(self.obj_ts) - 1).internal + '.data')
|
|
|
|
with open(older_obj_file, "wb") as fd:
|
|
|
|
fd.write(b"Hello Olde Worlde!")
|
|
|
|
write_metadata(fd, {'name': self.obj_path, 'Content-Length': '18'})
|
|
|
|
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-08 17:16:37 +00:00
|
|
|
res = relinker.main([
|
|
|
|
'cleanup',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
])
|
|
|
|
|
|
|
|
self.assertEqual(0, res)
|
2021-03-11 21:03:01 +00:00
|
|
|
# old partition should be cleaned up
|
|
|
|
self.assertFalse(os.path.exists(self.part_dir))
|
2021-03-29 11:19:18 -07:00
|
|
|
# which is also going to clean up the older file
|
|
|
|
self.assertFalse(os.path.isfile(older_obj_file))
|
2021-03-08 17:16:37 +00:00
|
|
|
self.assertTrue(os.path.isfile(self.expected_file)) # link created
|
|
|
|
self.assertIn(
|
|
|
|
'Relinking (cleanup) created link: %s to %s'
|
|
|
|
% (self.objname, self.expected_file),
|
|
|
|
self.logger.get_lines_for_level('debug'))
|
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('warning'))
|
2021-03-09 17:44:26 +00:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertIn('1 hash dirs processed (cleanup=True) '
|
|
|
|
'(1 files, 1 linked, 1 removed, 0 errors)', info_lines)
|
2021-03-29 11:19:18 -07:00
|
|
|
# suffix should be invalidated and rehashed in new partition
|
2021-03-08 17:16:37 +00:00
|
|
|
hashes_invalid = os.path.join(self.next_part_dir, 'hashes.invalid')
|
|
|
|
self.assertTrue(os.path.exists(hashes_invalid))
|
|
|
|
with open(hashes_invalid, 'r') as fd:
|
2021-03-29 11:19:18 -07:00
|
|
|
self.assertEqual('', fd.read().strip())
|
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2016-07-04 18:21:54 +02:00
|
|
|
|
|
|
|
def test_cleanup_deleted(self):
|
2021-03-08 17:16:37 +00:00
|
|
|
# force rehash of new partition to not happen during cleanup
|
|
|
|
self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
|
2016-07-04 18:21:54 +02:00
|
|
|
self._common_test_cleanup()
|
2021-03-08 17:16:37 +00:00
|
|
|
# rehash during relink creates hashes.invalid...
|
|
|
|
hashes_invalid = os.path.join(self.next_part_dir, 'hashes.invalid')
|
|
|
|
self.assertTrue(os.path.exists(hashes_invalid))
|
2016-07-04 18:21:54 +02:00
|
|
|
|
2021-03-04 12:12:20 -08:00
|
|
|
# Pretend the object got deleted in between and there is a tombstone
|
2021-03-08 17:16:37 +00:00
|
|
|
# note: the tombstone would normally be at a newer timestamp but here
|
|
|
|
# we make the tombstone at the same timestamp - it is treated as the
|
|
|
|
# 'required' file in the new partition, so the .data is deleted in the
|
|
|
|
# old partition
|
2016-07-04 18:21:54 +02:00
|
|
|
fname_ts = self.expected_file[:-4] + "ts"
|
|
|
|
os.rename(self.expected_file, fname_ts)
|
2021-03-09 17:44:26 +00:00
|
|
|
self.assertTrue(os.path.isfile(fname_ts))
|
2016-07-04 18:21:54 +02:00
|
|
|
|
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
2021-03-09 17:44:26 +00:00
|
|
|
self.assertEqual(0, relinker.main([
|
|
|
|
'cleanup',
|
|
|
|
'--swift-dir', self.testdir,
|
|
|
|
'--devices', self.devices,
|
|
|
|
'--skip-mount',
|
|
|
|
]))
|
2021-03-08 17:16:37 +00:00
|
|
|
self.assertTrue(os.path.isfile(fname_ts))
|
|
|
|
# old partition should be cleaned up
|
|
|
|
self.assertFalse(os.path.exists(self.part_dir))
|
|
|
|
# suffix should not be invalidated in new partition
|
|
|
|
self.assertTrue(os.path.exists(hashes_invalid))
|
|
|
|
with open(hashes_invalid, 'r') as fd:
|
|
|
|
self.assertEqual('', fd.read().strip())
|
2021-03-09 17:44:26 +00:00
|
|
|
info_lines = self.logger.get_lines_for_level('info')
|
|
|
|
self.assertIn('1 hash dirs processed (cleanup=True) '
|
|
|
|
'(0 files, 0 linked, 1 removed, 0 errors)', info_lines)
|
relinker: Add /recon/relinker endpoint and drop progress stats
To further benefit the stats capturing for the relinker, drop partition
progress to a new relinker.recon recon cache and add a new recon endpoint:
GET /recon/relinker
To gather get live relinking progress data:
$ curl http://127.0.0.3:6030/recon/relinker |python -mjson.tool
{
"devices": {
"sdb3": {
"parts_done": 523,
"policies": {
"1": {
"next_part_power": 11,
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 1630,
"hash_dirs": 1630,
"linked": 1630,
"policies": 1,
"removed": 0
},
"timestamp": 1618998730.24672,
"total_parts": 1029,
"total_time": 5.400741815567017
}},
"start_time": 1618998724.845946,
"stats": {
"errors": 0,
"files": 836,
"hash_dirs": 836,
"linked": 836,
"removed": 0
},
"timestamp": 1618998730.24672,
"total_parts": 523,
"total_time": 5.400741815567017
},
"sdb7": {
"parts_done": 506,
"policies": {
"1": {
"next_part_power": 11,
"part_power": 10,
"parts_done": 506,
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 794,
"hash_dirs": 794,
"linked": 794,
"removed": 0
},
"step": "relink",
"timestamp": 1618998730.166175,
"total_parts": 506,
"total_time": 5.320528984069824
}
},
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 794,
"hash_dirs": 794,
"linked": 794,
"removed": 0
},
"timestamp": 1618998730.166175,
"total_parts": 506,
"total_time": 5.320528984069824
}
},
"workers": {
"100": {
"drives": ["sda1"],
"return_code": 0,
"timestamp": 1618998730.166175}
}}
Also, add a constant DEFAULT_RECON_CACHE_PATH to help fix failing tests
by mocking recon_cache_path, so that errors are not logged due
to dump_recon_cache exceptions.
Mock recon_cache_path more widely and assert no error logs more
widely.
Change-Id: I625147dadd44f008a7c48eb5d6ac1c54c4c0ef05
2021-04-27 13:56:39 +10:00
|
|
|
self.assertEqual([], self.logger.get_lines_for_level('error'))
|
2016-07-04 18:21:54 +02:00
|
|
|
|
2021-05-07 14:03:18 -07:00
|
|
|
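    # A note on the hashes.invalid checks above: each partition directory
    # keeps a hashes.invalid file listing suffixes whose cached hashes in
    # hashes.pkl need recomputing. Asserting that the file exists but is
    # empty after cleanup shows the suffix in the new partition was rehashed
    # during relink and was not invalidated again during cleanup.
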
    def test_cleanup_old_part_careful_file(self):
        self._common_test_cleanup()
        # make some extra junk file in the part
        extra_file = os.path.join(self.part_dir, 'extra')
        with open(extra_file, 'w'):
            pass
        with self._mock_relinker():
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        # old partition can't be cleaned up
        self.assertTrue(os.path.exists(self.part_dir))
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    def test_cleanup_old_part_careful_dir(self):
        self._common_test_cleanup()
        # make some extra junk directory in the part
        extra_dir = os.path.join(self.part_dir, 'extra')
        os.mkdir(extra_dir)
        self.assertEqual(0, relinker.main([
            'cleanup',
            '--swift-dir', self.testdir,
            '--devices', self.devices,
            '--skip-mount',
        ]))
        # old partition can't be cleaned up
        self.assertTrue(os.path.exists(self.part_dir))
        self.assertTrue(os.path.exists(extra_dir))

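    # The two "careful" tests above pin down the cleanup contract: an old
    # partition directory is only removed once it contains nothing but the
    # files the relinker expects, so any stray file or directory left behind
    # by an operator or another process keeps the partition on disk for
    # inspection instead of being blown away.
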
    def test_cleanup_old_part_replication_lock_taken(self):
        # verify that relinker must take the replication lock before deleting
        # it, and handles the LockTimeout when unable to take it
        self._common_test_cleanup()

        config = """
        [DEFAULT]
        swift_dir = %s
        devices = %s
        mount_check = false
        replication_lock_timeout = 1

        [object-relinker]
        """ % (self.testdir, self.devices)
        conf_file = os.path.join(self.testdir, 'relinker.conf')
        with open(conf_file, 'w') as f:
            f.write(dedent(config))

        with utils.lock_path(self.part_dir, name='replication'):
            # lock taken so relinker should be unable to remove the lock file
            with self._mock_relinker():
                self.assertEqual(0, relinker.main(['cleanup', conf_file]))
        # old partition can't be cleaned up
        self.assertTrue(os.path.exists(self.part_dir))
        self.assertTrue(os.path.exists(
            os.path.join(self.part_dir, '.lock-replication')))
        self.assertEqual([], self.logger.get_lines_for_level('error'))

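    # The lock tests above and below lean on swift.common.utils.lock_path(),
    # which (roughly) holds an exclusive lock on a hidden file inside the
    # target directory - '.lock', or '.lock-<name>' when a name is given -
    # and raises LockTimeout if the lock cannot be taken in time. A minimal
    # sketch of the pattern; the timeout value is illustrative only:
    #
    #   with utils.lock_path(part_dir, timeout=1, name='replication'):
    #       pass  # relinker cannot take the same lock until this exits
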
    def test_cleanup_old_part_partition_lock_taken_during_get_hashes(self):
        # verify that relinker handles LockTimeouts when rehashing
        self._common_test_cleanup()

        config = """
        [DEFAULT]
        swift_dir = %s
        devices = %s
        mount_check = false
        replication_lock_timeout = 1

        [object-relinker]
        """ % (self.testdir, self.devices)
        conf_file = os.path.join(self.testdir, 'relinker.conf')
        with open(conf_file, 'w') as f:
            f.write(dedent(config))

        orig_get_hashes = BaseDiskFileManager.get_hashes

        def new_get_hashes(*args, **kwargs):
            # lock taken so relinker should be unable to rehash
            with utils.lock_path(self.part_dir):
                return orig_get_hashes(*args, **kwargs)

        with self._mock_relinker(), \
                mock.patch('swift.common.utils.DEFAULT_LOCK_TIMEOUT', 0.1), \
                mock.patch.object(BaseDiskFileManager,
                                  'get_hashes', new_get_hashes):
            self.assertEqual(0, relinker.main(['cleanup', conf_file]))
        # old partition can't be cleaned up
        self.assertTrue(os.path.exists(self.part_dir))
        self.assertTrue(os.path.exists(
            os.path.join(self.part_dir, '.lock')))
        self.assertEqual([], self.logger.get_lines_for_level('error'))
        self.assertEqual([], self.logger.get_lines_for_level('warning'))

    def test_cleanup_old_part_lock_taken_between_get_hashes_and_rm(self):
        # verify that relinker must take the partition lock before deleting
        # it, and handles the LockTimeout when unable to take it
        self._common_test_cleanup()

        config = """
        [DEFAULT]
        swift_dir = %s
        devices = %s
        mount_check = false
        replication_lock_timeout = 1

        [object-relinker]
        """ % (self.testdir, self.devices)
        conf_file = os.path.join(self.testdir, 'relinker.conf')
        with open(conf_file, 'w') as f:
            f.write(dedent(config))

        orig_replication_lock = BaseDiskFileManager.replication_lock

        @contextmanager
        def new_lock(*args, **kwargs):
            # lock taken so relinker should be unable to rehash
            with utils.lock_path(self.part_dir):
                with orig_replication_lock(*args, **kwargs) as cm:
                    yield cm

        with self._mock_relinker(), \
                mock.patch('swift.common.utils.DEFAULT_LOCK_TIMEOUT', 0.1), \
                mock.patch.object(BaseDiskFileManager,
                                  'replication_lock', new_lock):
            self.assertEqual(0, relinker.main(['cleanup', conf_file]))
        # old partition can't be cleaned up
        self.assertTrue(os.path.exists(self.part_dir))
        self.assertTrue(os.path.exists(
            os.path.join(self.part_dir, '.lock')))
        self.assertEqual([], self.logger.get_lines_for_level('error'))
        self.assertEqual([], self.logger.get_lines_for_level('warning'))

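    # Both LockTimeout tests above also patch
    # swift.common.utils.DEFAULT_LOCK_TIMEOUT down to 0.1s, so the relinker
    # gives up on the contended partition lock almost immediately rather
    # than waiting out the much larger default timeout.
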
    def test_cleanup_old_part_robust(self):
        self._common_test_cleanup()

        orig_get_hashes = DiskFileManager.get_hashes
        calls = []

        def mock_get_hashes(mgr, device, part, suffixes, policy):
            orig_resp = orig_get_hashes(mgr, device, part, suffixes, policy)
            if part == self.part:
                expected_files = ['.lock', 'hashes.pkl', 'hashes.invalid']
                self.assertEqual(set(expected_files),
                                 set(os.listdir(self.part_dir)))
                # unlink a random file, should be empty
                os.unlink(os.path.join(self.part_dir, 'hashes.pkl'))
                # create an ssync replication lock, too
                with open(os.path.join(self.part_dir,
                                       '.lock-replication'), 'w'):
                    pass
                calls.append(True)
            elif part == self.next_part:
                # sometimes our random obj needs to rehash the next part too
                pass
            else:
                self.fail('Unexpected call to get_hashes for %r' % part)
            return orig_resp

        with mock.patch.object(DiskFileManager, 'get_hashes', mock_get_hashes):
            with self._mock_relinker():
                self.assertEqual(0, relinker.main([
                    'cleanup',
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                ]))

        self.assertEqual([True], calls)
        # old partition can still be cleaned up
        self.assertFalse(os.path.exists(self.part_dir))
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    def test_cleanup_reapable(self):
        # relink a tombstone
        fname_ts = self.objname[:-4] + "ts"
        os.rename(self.objname, fname_ts)
        self.objname = fname_ts
        self.expected_file = self.expected_file[:-4] + "ts"
        self._common_test_cleanup()
        self.assertTrue(os.path.exists(self.expected_file))  # sanity check

        with self._mock_relinker(), \
                mock.patch('time.time', return_value=1e10 - 1):  # far future
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        self.assertEqual(self.logger.get_lines_for_level('error'), [])
        self.assertEqual(self.logger.get_lines_for_level('warning'), [])
        # reclaimed during relinker cleanup...
        self.assertFalse(os.path.exists(self.objname))
        # reclaimed during relinker relink or relinker cleanup, depending on
        # which quartile the partition is in ...
        self.assertFalse(os.path.exists(self.expected_file))

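    # In test_cleanup_reapable above, time.time() is mocked to roughly the
    # year 2286, so the relinked tombstone is far older than any plausible
    # reclaim_age and both the old and new copies are reaped rather than
    # being counted as relinked.
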
    def test_cleanup_new_does_not_exist(self):
        self._common_test_cleanup()
        # Pretend the file in the new place got deleted in between relink and
        # cleanup: cleanup should re-create the link
        os.remove(self.expected_file)

        with self._mock_relinker():
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        self.assertTrue(os.path.isfile(self.expected_file))  # link created
        # old partition should be cleaned up
        self.assertFalse(os.path.exists(self.part_dir))
        self.assertIn(
            'Relinking (cleanup) created link: %s to %s'
            % (self.objname, self.expected_file),
            self.logger.get_lines_for_level('debug'))
        self.assertEqual([], self.logger.get_lines_for_level('warning'))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 1 linked, 1 removed, 0 errors)', info_lines)
        self.assertEqual([], self.logger.get_lines_for_level('error'))

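    # Unlike the happy path above, the next test expects relinker.main() to
    # return 1: the mocked os.link failure is counted in the error stats and
    # a non-zero exit signals that cleanup could not fully complete.
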
    def test_cleanup_new_does_not_exist_and_relink_fails(self):
        # force rehash of new partition to not happen during cleanup
        self._setup_object(lambda part: part >= 2 ** (PART_POWER - 1))
        self._common_test_cleanup()
        # rehash during relink creates hashes.invalid...
        hashes_invalid = os.path.join(self.next_part_dir, 'hashes.invalid')
        self.assertTrue(os.path.exists(hashes_invalid))
        # Pretend the file in the new place got deleted in between relink and
        # cleanup: cleanup attempts to re-create the link but fails
        os.remove(self.expected_file)

        with mock.patch('swift.obj.diskfile.os.link', side_effect=OSError):
            with self._mock_relinker():
                self.assertEqual(1, relinker.main([
                    'cleanup',
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                ]))
        self.assertFalse(os.path.isfile(self.expected_file))
        self.assertTrue(os.path.isfile(self.objname))  # old file intact
        self.assertEqual(self.logger.get_lines_for_level('warning'), [
            'Error relinking (cleanup): failed to relink %s to %s: '
            % (self.objname, self.expected_file),
            '1 hash dirs processed (cleanup=True) '
            '(1 files, 0 linked, 0 removed, 1 errors)',
        ])
        # suffix should not be invalidated in new partition
        self.assertTrue(os.path.exists(hashes_invalid))
        with open(hashes_invalid, 'r') as fd:
            self.assertEqual('', fd.read().strip())
        # nor in the old partition
        old_hashes_invalid = os.path.join(self.part_dir, 'hashes.invalid')
        self.assertFalse(os.path.exists(old_hashes_invalid))
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    def test_cleanup_remove_fails(self):
        meta_file = utils.Timestamp(int(self.obj_ts) + 1).internal + '.meta'
        old_meta_path = os.path.join(self.objdir, meta_file)
        new_meta_path = os.path.join(self.expected_dir, meta_file)

        with open(old_meta_path, 'w') as fd:
            fd.write('meta file in old partition')
        self._common_test_cleanup()

        calls = []
        orig_remove = os.remove

        def mock_remove(path, *args, **kwargs):
            calls.append(path)
            if len(calls) == 1:
                raise OSError
            return orig_remove(path)

        with mock.patch('swift.obj.diskfile.os.remove', mock_remove):
            with self._mock_relinker():
                self.assertEqual(1, relinker.main([
                    'cleanup',
                    '--swift-dir', self.testdir,
                    '--devices', self.devices,
                    '--skip-mount',
                ]))
        self.assertEqual([old_meta_path, self.objname], calls)
        self.assertTrue(os.path.isfile(self.expected_file))  # new file intact
        self.assertTrue(os.path.isfile(new_meta_path))  # new file intact
        self.assertFalse(os.path.isfile(self.objname))  # old file removed
        self.assertTrue(os.path.isfile(old_meta_path))  # meta file remove fail
        self.assertEqual(self.logger.get_lines_for_level('warning'), [
            'Error cleaning up %s: OSError()' % old_meta_path,
            '1 hash dirs processed (cleanup=True) '
            '(2 files, 0 linked, 1 removed, 1 errors)',
        ])
        self.assertEqual([], self.logger.get_lines_for_level('error'))

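    # Reading the stats line in test_cleanup_remove_fails above: the hash dir
    # held 2 files (.data and .meta), both already linked during relink, so
    # 0 linked; the .data was removed from the old partition (1 removed) but
    # the first, mocked os.remove call failed on the .meta (1 errors), which
    # is why the old .meta is still present and main() returns 1.
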
    def test_cleanup_two_files_need_linking(self):
        meta_file = utils.Timestamp(int(self.obj_ts) + 1).internal + '.meta'
        old_meta_path = os.path.join(self.objdir, meta_file)
        new_meta_path = os.path.join(self.expected_dir, meta_file)

        with open(old_meta_path, 'w') as fd:
            fd.write('unexpected file in old partition')
        self._common_test_cleanup(relink=False)
        self.assertFalse(os.path.isfile(self.expected_file))  # link missing
        self.assertFalse(os.path.isfile(new_meta_path))  # link missing

        with self._mock_relinker():
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        self.assertTrue(os.path.isfile(self.expected_file))  # new file created
        self.assertTrue(os.path.isfile(new_meta_path))  # new file created
        self.assertFalse(os.path.isfile(self.objname))  # old file removed
        self.assertFalse(os.path.isfile(old_meta_path))  # meta file removed
        self.assertEqual([], self.logger.get_lines_for_level('warning'))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(2 files, 2 linked, 2 removed, 0 errors)', info_lines)
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    @patch_policies(
        [ECStoragePolicy(
            0, name='platinum', is_default=True, ec_type=DEFAULT_TEST_EC_TYPE,
            ec_ndata=4, ec_nparity=2)])
    def test_cleanup_diskfile_error(self):
        # Switch the policy type so all fragments raise DiskFileError: they
        # are included in the diskfile data as 'unexpected' files and cleanup
        # should include them
        self._common_test_cleanup()
        with self._mock_relinker():
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        log_lines = self.logger.get_lines_for_level('warning')
        # The error is logged six times:
        # during _common_test_cleanup() relink: once for cleanup_ondisk_files
        # in old and once for get_ondisk_files of union of files;
        # during cleanup: once for cleanup_ondisk_files in old and new
        # location, once for get_ondisk_files of union of files;
        # during either relink or cleanup: once for the rehash of the new
        # partition
        self.assertEqual(6, len(log_lines),
                         'Expected 6 log lines, got %r' % log_lines)
        for line in log_lines:
            self.assertIn('Bad fragment index: None', line, log_lines)
        self.assertTrue(os.path.isfile(self.expected_file))  # new file intact
        # old partition should be cleaned up
        self.assertFalse(os.path.exists(self.part_dir))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 0 linked, 1 removed, 0 errors)', info_lines)
        self.assertEqual([], self.logger.get_lines_for_level('error'))

    @patch_policies(
        [ECStoragePolicy(
            0, name='platinum', is_default=True, ec_type=DEFAULT_TEST_EC_TYPE,
            ec_ndata=4, ec_nparity=2)])
    def test_cleanup_diskfile_error_new_file_missing(self):
        self._common_test_cleanup(relink=False)
        # Switch the policy type so all fragments raise DiskFileError: they
        # are included in the diskfile data as 'unexpected' files and cleanup
        # should include them
        with self._mock_relinker():
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
        warning_lines = self.logger.get_lines_for_level('warning')
        # once for cleanup_ondisk_files in old, again for the get_ondisk_files
        # of union of files, and one last time when the new partition gets
        # rehashed at the end of processing the old one
        self.assertEqual(3, len(warning_lines),
                         'Expected 3 log lines, got %r' % warning_lines)
        for line in warning_lines:
            self.assertIn('Bad fragment index: None', line, warning_lines)
        self.assertIn(
            'Relinking (cleanup) created link: %s to %s'
            % (self.objname, self.expected_file),
            self.logger.get_lines_for_level('debug'))
        self.assertTrue(os.path.isfile(self.expected_file))  # new file intact
        # old partition should be cleaned up
        self.assertFalse(os.path.exists(self.part_dir))
        info_lines = self.logger.get_lines_for_level('info')
        self.assertIn('1 hash dirs processed (cleanup=True) '
                      '(1 files, 1 linked, 1 removed, 0 errors)', info_lines)
        self.assertEqual([], self.logger.get_lines_for_level('error'))

def test_rehashing(self):
|
|
|
|
calls = []
|
|
|
|
|
|
|
|
@contextmanager
|
|
|
|
def do_mocks():
|
|
|
|
orig_invalidate = relinker.diskfile.invalidate_hash
|
|
|
|
orig_get_hashes = DiskFileManager.get_hashes
|
|
|
|
|
|
|
|
def mock_invalidate(suffix_dir):
|
|
|
|
calls.append(('invalidate', suffix_dir))
|
|
|
|
return orig_invalidate(suffix_dir)
|
|
|
|
|
|
|
|
def mock_get_hashes(self, *args):
|
|
|
|
calls.append(('get_hashes', ) + args)
|
|
|
|
return orig_get_hashes(self, *args)
|
|
|
|
|
|
|
|
with mock.patch.object(relinker.diskfile, 'invalidate_hash',
|
|
|
|
mock_invalidate), \
|
|
|
|
mock.patch.object(DiskFileManager, 'get_hashes',
|
|
|
|
mock_get_hashes):
|
relinker: Add /recon/relinker endpoint and drop progress stats
To further benefit the stats capturing for the relinker, drop partition
progress to a new relinker.recon recon cache and add a new recon endpoint:
GET /recon/relinker
To gather get live relinking progress data:
$ curl http://127.0.0.3:6030/recon/relinker |python -mjson.tool
{
"devices": {
"sdb3": {
"parts_done": 523,
"policies": {
"1": {
"next_part_power": 11,
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 1630,
"hash_dirs": 1630,
"linked": 1630,
"policies": 1,
"removed": 0
},
"timestamp": 1618998730.24672,
"total_parts": 1029,
"total_time": 5.400741815567017
}},
"start_time": 1618998724.845946,
"stats": {
"errors": 0,
"files": 836,
"hash_dirs": 836,
"linked": 836,
"removed": 0
},
"timestamp": 1618998730.24672,
"total_parts": 523,
"total_time": 5.400741815567017
},
"sdb7": {
"parts_done": 506,
"policies": {
"1": {
"next_part_power": 11,
"part_power": 10,
"parts_done": 506,
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 794,
"hash_dirs": 794,
"linked": 794,
"removed": 0
},
"step": "relink",
"timestamp": 1618998730.166175,
"total_parts": 506,
"total_time": 5.320528984069824
}
},
"start_time": 1618998724.845616,
"stats": {
"errors": 0,
"files": 794,
"hash_dirs": 794,
"linked": 794,
"removed": 0
},
"timestamp": 1618998730.166175,
"total_parts": 506,
"total_time": 5.320528984069824
}
},
"workers": {
"100": {
"drives": ["sda1"],
"return_code": 0,
"timestamp": 1618998730.166175}
}}
Also, add a constant DEFAULT_RECON_CACHE_PATH to help fix failing tests
by mocking recon_cache_path, so that errors are not logged due
to dump_recon_cache exceptions.
Mock recon_cache_path more widely and assert no error logs more
widely.
Change-Id: I625147dadd44f008a7c48eb5d6ac1c54c4c0ef05
2021-04-27 13:56:39 +10:00
|
|
|
with self._mock_relinker():
|
|
|
|
yield
|
        with do_mocks():
            self.rb.prepare_increase_partition_power()
            self._save_ring()
            self.assertEqual(0, relinker.main([
                'relink',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
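            # After relink we expect the suffix in the next-part-power
            # partition to have been invalidated; whether that partition was
            # also rehashed already depends on which half of the old
            # partition space self.part falls in (the complementary check
            # follows the cleanup pass below).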
            expected = [('invalidate', self.next_suffix_dir)]
            if self.part >= 2 ** (PART_POWER - 1):
                expected.append(('get_hashes', self.existing_device,
                                 self.next_part, [], POLICIES[0]))
            self.assertEqual(calls, expected)
            # Depending on partition, there may or may not be a get_hashes here
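            # Now switch to the ring with the increased part power and run
            # cleanup, which drops the old-location hard links.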
            self.rb._ring = None  # Force builder to reload ring
            self.rb.increase_partition_power()
            self._save_ring()
            self.assertEqual(0, relinker.main([
                'cleanup',
                '--swift-dir', self.testdir,
                '--devices', self.devices,
                '--skip-mount',
            ]))
            if self.part < 2 ** (PART_POWER - 1):
                expected.append(('get_hashes', self.existing_device,
                                 self.next_part, [], POLICIES[0]))
            expected.extend([
                ('invalidate', self.suffix_dir),
                ('get_hashes', self.existing_device, self.part, [],
                 POLICIES[0]),
            ])
            self.assertEqual(calls, expected)
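        # Neither the relink nor the cleanup pass should have logged errors.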
        self.assertEqual([], self.logger.get_lines_for_level('error'))


if __name__ == '__main__':
    unittest.main()