==================
Server maintenance
==================

General assumptions
~~~~~~~~~~~~~~~~~~~

- It is assumed that anyone attempting to replace hardware components
  will have already read and understood the appropriate maintenance and
  service guides.

- It is assumed that where servers need to be taken off-line for
  hardware replacement, this will be done in series, bringing each
  server back on-line before taking the next off-line.

- It is assumed that the operations directed procedure will be used for
  identifying hardware for replacement.

Assessing the health of Swift
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can run the ``swift-recon`` tool on a Swift proxy node to get a
quick check of how Swift is doing. Please note that the numbers below
are necessarily somewhat subjective. Sometimes parameters for which we
say 'low values are good' will have pretty high values for a time;
often things get better if you wait a while.

For example:

.. code::

   sudo swift-recon -rla
   ===============================================================================
   [2012-03-10 12:57:21] Checking async pendings on 384 hosts...
   Async stats: low: 0, high: 1, avg: 0, total: 1
   ===============================================================================

   [2012-03-10 12:57:22] Checking replication times on 384 hosts...
   [Replication Times] shortest: 1.4113877813, longest: 36.8293570836, avg: 4.86278064749
   ===============================================================================

   [2012-03-10 12:57:22] Checking load avg's on 384 hosts...
   [5m load average] lowest: 2.22, highest: 9.5, avg: 4.59578125
   [15m load average] lowest: 2.36, highest: 9.45, avg: 4.62622395833
   [1m load average] lowest: 1.84, highest: 9.57, avg: 4.5696875
   ===============================================================================

In the example above we ask for information on replication times (``-r``),
load averages (``-l``) and async pendings (``-a``). This is a healthy
Swift system. Rules-of-thumb for 'good' recon output are:

- Nodes that respond are up and running Swift. If all nodes respond,
  that is a good sign. But some nodes may time out. For example:

  .. code::

     -> [http://<redacted>.29:6200/recon/load:] <urlopen error [Errno 111] ECONNREFUSED>
     -> [http://<redacted>.31:6200/recon/load:] <urlopen error timed out>

- That could be okay or could require investigation.

- Low values (say < 10 for high and average) for async pendings are
  good. Higher values occur when disks are down and/or when the system
  is heavily loaded. Many simultaneous PUTs to the same container can
  drive async pendings up. This may be normal, and may resolve itself
  after a while. If it persists, one way to track down the problem is
  to find a node with high async pendings (with
  ``swift-recon -av | sort -n -k4``), then check its Swift logs (see
  the sketch after this list). Often async pendings are high because a
  node cannot write to a container on another node; often this is
  because the node or disk is offline or bad. This may be okay if we
  know about it.

- Low values for replication times are good. These values rise when new
  rings are pushed, and when nodes and devices are brought back on
  line.

- Our 'high' load average values are typically in the 9-15 range. If
  they are a lot bigger it is worth having a look at the systems
  pushing the average up. Run ``swift-recon -av`` to get the individual
  averages. To sort the entries with the highest at the end,
  run ``swift-recon -av | sort -n -k4``.
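
As a sketch, the two commands above combine as follows to track down a
node with high async pendings. The host is a placeholder to be replaced
with whatever node recon reports; the log path and ``grep`` pattern
match the troubleshooting example later in this section.

.. code::

   # List per-node recon values; the highest sort to the end
   swift-recon -av | sort -n -k4

   # Then look for errors in the logs of the node with the highest value
   ssh <node-with-high-asyncs> sudo grep -i error /var/log/swift/background.log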

For comparison here is the recon output for the same system above when
two entire racks of Swift are down:

.. code::

   [2012-03-10 16:56:33] Checking async pendings on 384 hosts...
   -> http://<redacted>.22:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.18:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.16:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.13:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.30:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.6:6200/recon/async: <urlopen error timed out>
   .........
   -> http://<redacted>.5:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.15:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.9:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.27:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.4:6200/recon/async: <urlopen error timed out>
   -> http://<redacted>.8:6200/recon/async: <urlopen error timed out>
   Async stats: low: 243, high: 659, avg: 413, total: 132275
   ===============================================================================
   [2012-03-10 16:57:48] Checking replication times on 384 hosts...
   -> http://<redacted>.22:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.18:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.16:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.13:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.30:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.6:6200/recon/replication: <urlopen error timed out>
   ............
   -> http://<redacted>.5:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.15:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.9:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.27:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.4:6200/recon/replication: <urlopen error timed out>
   -> http://<redacted>.8:6200/recon/replication: <urlopen error timed out>
   [Replication Times] shortest: 1.38144306739, longest: 112.620954418, avg: 10.2859475361
   ===============================================================================
   [2012-03-10 16:59:03] Checking load avg's on 384 hosts...
   -> http://<redacted>.22:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.18:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.16:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.13:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.30:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.6:6200/recon/load: <urlopen error timed out>
   ............
   -> http://<redacted>.15:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.9:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.27:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.4:6200/recon/load: <urlopen error timed out>
   -> http://<redacted>.8:6200/recon/load: <urlopen error timed out>
   [5m load average] lowest: 1.71, highest: 4.91, avg: 2.486375
   [15m load average] lowest: 1.79, highest: 5.04, avg: 2.506125
   [1m load average] lowest: 1.46, highest: 4.55, avg: 2.4929375
   ===============================================================================

.. note::

   The replication times and load averages are within reasonable
   parameters, even with 80 object stores down. Async pendings, however,
   are quite high. This is because the containers on the servers which
   are down cannot be updated. When those servers come back up, async
   pendings should drop. If async pendings were at this level without an
   explanation, we would have a problem.

Recon examples
~~~~~~~~~~~~~~

Here is an example of noting and tracking down a problem with recon.

Running recon shows some async pendings:

.. code::

   bob@notso:~/swift-1.4.4/swift$ ssh -q <redacted>.132.7 sudo swift-recon -alr
   ===============================================================================
   [2012-03-14 17:25:55] Checking async pendings on 384 hosts...
   Async stats: low: 0, high: 23, avg: 8, total: 3356
   ===============================================================================
   [2012-03-14 17:25:55] Checking replication times on 384 hosts...
   [Replication Times] shortest: 1.49303831657, longest: 39.6982825994, avg: 4.2418222066
   ===============================================================================
   [2012-03-14 17:25:56] Checking load avg's on 384 hosts...
   [5m load average] lowest: 2.35, highest: 8.88, avg: 4.45911458333
   [15m load average] lowest: 2.41, highest: 9.11, avg: 4.504765625
   [1m load average] lowest: 1.95, highest: 8.56, avg: 4.40588541667
   ===============================================================================

Why? Running recon again with ``-av`` (not shown here) tells us that
the node with the highest async pendings (23) is <redacted>.72.61.
Looking at the log files on <redacted>.72.61 we see:

.. code::

   souzab@<redacted>:~$ sudo tail -f /var/log/swift/background.log | grep -i ERROR
   Mar 14 17:28:06 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6201}
   Mar 14 17:28:06 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6201}
   Mar 14 17:28:09 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}
   Mar 14 17:28:11 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}
   Mar 14 17:28:13 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6201}
   Mar 14 17:28:13 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6201}
   Mar 14 17:28:15 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}
   Mar 14 17:28:15 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}
   Mar 14 17:28:19 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}
   Mar 14 17:28:19 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}
   Mar 14 17:28:20 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6201}
   Mar 14 17:28:21 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}
   Mar 14 17:28:21 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}
   Mar 14 17:28:22 <redacted> container-replicator ERROR Remote drive not mounted
   {'zone': 5, 'weight': 1952.0, 'ip': '<redacted>.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6201}

That is why this node has a lot of async pendings: several disks on
<redacted> and <redacted> are not mounted. There may be other issues,
but clearing this up will likely drop the async pendings a fair bit,
since other nodes are probably hitting the same problem.
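
One way to confirm the cause is to check the mount table on the remote
node named in the error. This is only a sketch: ``/srv/node`` is the
conventional Swift mount point, and ``disk6`` is the device named in
the errors above.

.. code::

   # On the remote node (<redacted>.204.119 in the example above),
   # check whether the device is actually mounted:
   mount | grep disk6

   # Swift devices conventionally live under /srv/node; an empty or
   # missing directory here is consistent with the replicator errors:
   ls /srv/node/disk6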

Assessing the availability risk when multiple storage servers are down
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

   This procedure will tell you if you have a problem; in practice,
   however, you will not need to use it frequently.

If three storage nodes (or, more precisely, three disks on three
different storage nodes) are down, there is a small but nonzero
probability that user objects, containers, or accounts will not be
available.

Procedure
---------

.. note::

   Swift has three rings: one each for objects, containers, and
   accounts. This procedure should be run three times, each time
   specifying the appropriate ``*.builder`` file.
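
For example, when you reach the ``list_parts`` step below, the three
runs differ only in the builder file (assuming the default
``/etc/swift`` paths; ``<node1>``-``<node3>`` are placeholders for the
storage nodes under consideration):

.. code::

   % sudo swift-ring-builder /etc/swift/object.builder list_parts <node1> <node2> <node3>
   % sudo swift-ring-builder /etc/swift/container.builder list_parts <node1> <node2> <node3>
   % sudo swift-ring-builder /etc/swift/account.builder list_parts <node1> <node2> <node3>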

#. Determine whether all three nodes are in different Swift zones by
   running the ring builder on a proxy node. For example:

   .. code::

      % sudo swift-ring-builder /etc/swift/object.builder
      /etc/swift/object.builder, build version 1467
      2097152 partitions, 3 replicas, 5 zones, 1320 devices, 0.02 balance
      The minimum number of hours before a partition can be reassigned is 24
      Devices:  id  zone      ip address  port   name  weight partitions balance meta
                 0     1    <redacted>.4  6200  disk0 1708.00       4259   -0.00
                 1     1    <redacted>.4  6200  disk1 1708.00       4260    0.02
                 2     1    <redacted>.4  6200  disk2 1952.00       4868    0.01
                 3     1    <redacted>.4  6200  disk3 1952.00       4868    0.01
                 4     1    <redacted>.4  6200  disk4 1952.00       4867   -0.01

#. Here, node <redacted>.4 is in zone 1. If two or more of the three
   nodes under consideration are in the same Swift zone, they do not
   have any ring partitions in common; there is little/no data
   availability risk if all three nodes are down.

#. If the nodes are in three distinct Swift zones it is necessary to
   determine whether the nodes have ring partitions in common. Run
   ``swift-ring-builder`` again, this time with the ``list_parts``
   option, and specify the nodes under consideration. For example:

   .. code::

      % sudo swift-ring-builder /etc/swift/object.builder list_parts <redacted>.8 <redacted>.15 <redacted>.72.2
      Partition   Matches
             91   2
            729   2
           3754   2
           3769   2
           3947   2
           5818   2
           7918   2
           8733   2
           9509   2
          10233   2

#. The ``list_parts`` option to the ring builder indicates how many
   ring partitions the nodes have in common. If, as in this case, the
   first entry in the list has a 'Matches' column of 2 or less, there
   is no data availability risk if all three nodes are down.

#. If the 'Matches' column has entries equal to 3, there is some data
   availability risk if all three nodes are down. The risk is generally
   small, and is proportional to the number of entries that have a 3 in
   the Matches column. For example:

   .. code::

      Partition   Matches
          26865   3
         362367   3
         745940   3
         778715   3
         797559   3
         820295   3
         822118   3
         839603   3
         852332   3
         855965   3
         858016   3

#. A quick way to count the number of rows with 3 matches is:

   .. code::

      % sudo swift-ring-builder /etc/swift/object.builder list_parts <redacted>.8 <redacted>.15 <redacted>.72.2 | grep "3$" | wc -l

      30

#. In this case the nodes have 30 out of a total of 2097152 partitions
   in common; about 0.001%. The risk is therefore small, but nonzero.
   Recall that a partition is simply a portion of the ring mapping
   space, not actual data. So having partitions in common is a
   necessary but not sufficient condition for data unavailability.
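
As a quick sanity check of that percentage (a sketch; ``bc`` is just
one way to do the arithmetic):

.. code::

   % echo "scale=6; 30 / 2097152 * 100" | bc
   .001400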

.. note::

   We should not bring down a node for repair if it shows Matches
   entries of 3 with other nodes that are also down.

   If three nodes that have 3 partitions in common are all down, there
   is a nonzero probability that data are unavailable and we should
   work to bring some or all of the nodes up ASAP.

Swift startup/shutdown
~~~~~~~~~~~~~~~~~~~~~~

- Use ``reload`` - not stop/start/restart (see the sketch below).

- Try to roll sets of servers (especially proxy) in groups of fewer
  than 20% of your servers.
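
As a minimal sketch, a rolling reload of the proxy tier might look
like the following, assuming ``swift-init`` manages the services on
your nodes; the host names are illustrative.

.. code::

   # Reload - do not stop/start/restart - one proxy server at a time,
   # staying well under 20% of the tier at any moment:
   for host in proxy1 proxy2 proxy3; do
       ssh "$host" sudo swift-init proxy-server reload
   done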