27 Commits

Author SHA1 Message Date
Tim Burke
3420921a33 Clean up HASH_PATH_* patching
Previously, we'd sometimes shove strings into HASH_PATH_PREFIX or
HASH_PATH_SUFFIX, which would blow up on py3. Now, always use bytes.

Change-Id: Icab9981e8920da505c2395eb040f8261f2da6d2e
2018-11-01 20:52:33 +00:00
Samuel Merritt
c28004deb0 Multiprocess object replicator
Add a multiprocess mode to the object replicator. Setting the
"replicator_workers" setting to a positive value N will result in the
replicator using up to N worker processes to perform replication
tasks.

At most one worker per disk will be spawned, so one can set
replicator_workers=99999999 to always get one worker per disk
regardless of the number of disks in each node. This is the same
behavior that the object reconstructor has.
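
As a rough illustration of the cap described above, and not the replicator's actual code, the worker count can be thought of as the smaller of the setting and the number of disks. The helper name `effective_workers` and its arguments are hypothetical, and treating a non-positive setting as "no workers" is an assumption.

```python
def effective_workers(replicator_workers, disks):
    """Return how many worker processes would be spawned (sketch).

    Hypothetical helper: at most one worker per disk, and no workers
    at all when the setting is not positive (assumed behavior).
    """
    if replicator_workers <= 0:
        return 0
    return min(replicator_workers, len(disks))
```

With three disks, a setting of 99999999 yields three workers, while a setting of 2 yields two.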

Worker process logs will have a bit of information prepended so
operators can tell which messages came from which worker. It looks
like this:

  [worker 1/2 pid=16529] 154/154 (100.00%) partitions replicated in 1.02s (150.87/sec, 0s remaining)

The prefix is "[worker M/N pid=P] ", where M is the worker's index, N
is the total number of workers, and P is the process ID. Every message
from the replicator's logger will have the prefix; this includes
messages from down in diskfile, but does not include things printed to
stdout or stderr.
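
One way to get a prefix like this is a standard-library `logging.LoggerAdapter`. The sketch below only illustrates the idea and is not how the replicator implements it; `WorkerPrefixAdapter` and `make_worker_logger` are hypothetical names.

```python
import logging
import os


class WorkerPrefixAdapter(logging.LoggerAdapter):
    """Prepend "[worker M/N pid=P] " to every message (hypothetical class)."""

    def process(self, msg, kwargs):
        prefix = "[worker %d/%d pid=%d] " % (
            self.extra["index"], self.extra["total"], self.extra["pid"])
        return "%s%s" % (prefix, msg), kwargs


def make_worker_logger(logger, index, total, pid=None):
    """Wrap a logger so everything logged through it carries the prefix."""
    extra = {
        "index": index,
        "total": total,
        # Default to the current process ID when none is given.
        "pid": os.getpid() if pid is None else pid,
    }
    return WorkerPrefixAdapter(logger, extra)
```

Only messages sent through the wrapped logger get the prefix, which matches the note above that output written directly to stdout or stderr is not affected.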

Drive-by fix: don't dump recon stats when replicating only certain
policies. When running the object replicator with replicator_workers >
0 and "--policies=X,Y,Z", the replicator would update recon stats
after running. Since it only ran on a subset of objects, it should not
update recon, much like it doesn't update recon when run with
--devices or --partitions.

Change-Id: I6802a9ad9f1f9b9dafb99d8b095af0fdbf174dc5
2018-04-24 04:05:08 +00:00
Samuel Merritt
dc8da5bb19 Use "poll" or "selects" Eventlet hub for all Swift daemons.
Previously, Swift's WSGI servers, the object replicator, and the
object reconstructor were setting Eventlet's hub to either "poll" or
"selects", depending on availability. Other daemons were letting
Eventlet use its default hub, which is "epoll".

In any daemons that fork, we really don't want to use epoll. Epoll
instances end up shared between the parent and all children, and you
get some awful messes when file descriptors are shared.

Here's an example where two processes are trying to wait on the same
file descriptor using the same epoll instance, and everything goes
wrong:

[proc A] epoll_ctl(6, EPOLL_CTL_ADD, 3, ...) = 0

[proc B] epoll_ctl(6, EPOLL_CTL_ADD, 3, ...) = -1 EEXIST (File exists)
[proc B] epoll_wait(6, ...) = 1
[proc B] epoll_ctl(6, EPOLL_CTL_DEL, 3, ...) = 0

[proc A] epoll_wait(6, ...)

This primarily affects the container updater and object updater since
they fork. I've decided to change the hub for all Swift daemons so
that we don't add multiprocessing support to some other daemon someday
and suffer through this same bug again.
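
A minimal sketch of the hub choice described above, assuming the decision is based only on whether `select.poll` is available. The function name `get_hub` is used here for illustration; the returned string is what one would pass to `eventlet.hubs.use_hub()`.

```python
import select


def get_hub():
    """Pick an Eventlet hub name that avoids epoll (sketch)."""
    # select.poll is missing on some platforms, so fall back to "selects".
    if hasattr(select, "poll"):
        return "poll"
    return "selects"
```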

This problem was made more apparent by commit 6d16079, which made our
logging mutex use file descriptors. However, it could have struck on
any shared file descriptor on which a read or write returned EAGAIN.

Change-Id: Ic2c1178ac918c88b0b901e581eb4fab3b2666cfe
Closes-Bug: 1722951
2017-10-12 10:45:12 -07:00
Tim Burke
cc17c99e73 Stop reloading swift.common.utils in test_daemon
This was causing some headaches over on feature/deep where an __eq__
wasn't working as expected because neither self nor other was an
instance of the class we thought we were using. Apparently, this
also fixes some issues when using fake_syslog = True?

There are two other places that we use reload_module, in
test_db_replicator and test_manager, but the monkey patching isn't
nearly as straightforward.

Change-Id: I94d6578e275219e9687fee2f0c7cc4f99454b77f
Related-Bug: 1704192
2017-09-21 22:27:14 +00:00
Clay Gerrard
701a172afa Add multiple worker processes strategy to reconstructor
This change adds a new Strategy concept to the daemon module similar to
how we manage WSGI workers.  We need to leverage multiple python
processes to get the concurrency properties we need.  More workers will
rebalance much faster on dense chassis with many devices.

Currently the default is still only one process, and no workers.  Set
reconstructor_workers in the [object-reconstructor] section to some
whole number <= the number of devices on a node to get that many
reconstructor workers.

Each worker will operate on a different subset of disks.

Once mode works as before, but tends to want to update recon drops a
little bit more.

If you change the rings, the strategy will shut down workers and spawn
new ones.

You can kill the worker pids and the daemon strategy will respawn them.

New per-disk reconstructor stats are dumped to recon under the
object_reconstruction_per_disk key.  To maintain legacy compatibility
and replication monitoring based on cycle times they are aggregated
every stats_interval (default 5 mins).

Change-Id: I28925a37f3985c9082b5a06e76af4dc3ec813abe
2017-07-26 16:55:10 -07:00
Tim Burke
523bc0ab71 Always set swift processes to use UTC
Previously, we would set the TZ environment variable to the result of

    time.strftime("%z", time.gmtime())

This has a few problems.

 1. The "%z" format does not appear in the table of formatting
    directives for strftime [1]. While it *does* appear in a
    footnote [2] for that section, it is described as "not supported by
    all ANSI C libraries." This may explain the next point.

 2. On the handful of Linux platforms I've tested, the above produces
    "+0000" regardless of the system's timezone. This seems to run
    counter to the intent of the patches that introduced the TZ
    mangling. (See the first two related changes.)

 3. The above does not produce a valid POSIX TZ format, which expects
    (at minimum) a name consisting of three or more alphabetic
    characters followed by the offset to be added to the local time to
    get Coordinated Universal Time (UTC).

Further, while we would change os.environ['TZ'], we would *not* call
time.tzset like it says in the docs [3], which seems like a Bad Thing.

Some combination of the above has the net effect of changing some of the
functions in the time module to use UTC. (Maybe all of them? At the very
least, time.localtime and time.mktime.) However, it does *not* change
the offset stored in time.timezone, which causes bad behavior when
dealing with local timestamps [4].

Now, set TZ to "UTC+0" and call tzset. Apparently we don't have a good
way of getting local timezone info, we were (unintentionally?) using UTC
before, and you should probably be running your servers in UTC anyway.
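
The new behavior amounts to the following sketch. The wrapper name `force_utc` is hypothetical; `time.tzset` is only available on Unix.

```python
import os
import time


def force_utc():
    """Set the process timezone to UTC, as described above (sketch)."""
    os.environ["TZ"] = "UTC+0"
    # Without tzset(), the time module keeps the offsets it computed earlier.
    time.tzset()
```

After the call, `time.timezone` is 0 and `time.localtime` agrees with `time.gmtime`.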

[1] https://docs.python.org/2/library/time.html#time.strftime
[2] https://docs.python.org/2/library/time.html#id2
[3] https://docs.python.org/2/library/time.html#time.tzset
[4] Like in email.utils.mktime_tz, prior to being fixed in
    https://hg.python.org/cpython/rev/a283563c8cc4

Change-Id: I007425301914144e228b9cfece5533443e851b6e
Related-Change: Ifc78236a99ed193a42389e383d062b38f57a5a31
Related-Change: I8ec80202789707f723abfe93ccc9cf1e677e4dc6
Related-Change: Iee7488d03ab404072d3d0c1a262f004bb0f2da26
2016-12-19 16:23:13 -08:00
Christian Hugo
ffd5194a3b Raise ValueError if a config section does not exist
Instead of printing the error message and
calling sys.exit() when a section does not exist
or reading the file fails, raise an exception
from readconfig. Depending on whether a ValueError or an IOError
is raised, the caller can decide whether to exit or continue.
If an exception reaches the wsgi utilities,
it bubbles all the way up.
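
The pattern can be sketched with the standard-library `configparser`. This is an illustration under stated assumptions, not Swift's own function: `read_section` and its error messages are hypothetical.

```python
import configparser


def read_section(conf_path, section_name):
    """Return one config section as a dict, raising instead of exiting.

    Hypothetical helper used only to illustrate the pattern.
    """
    parser = configparser.ConfigParser()
    # read() skips unreadable files and returns the list of files it parsed.
    if not parser.read(conf_path):
        raise IOError("Unable to read config from %s" % conf_path)
    if not parser.has_section(section_name):
        raise ValueError("Unable to find %s config section in %s"
                         % (section_name, conf_path))
    return dict(parser.items(section_name))
```

A caller that wants the old behavior can catch both exceptions and call sys.exit() itself; a caller that can recover simply continues.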

Change-Id: Ieb444f8c34e37f49bea21c3caf1c6c2d7bee5fb4
Closes-Bug: 1578321
2016-12-15 19:49:57 +00:00
Clay Gerrard
c2ce92acd6 Fix signal handling for daemons with InternalClient
The intentional use of "bare except" handling in catch_errors and some
daemons to prevent propagation of unexpected errors that do not
inherit from Exception (like eventlet.Timeout) or even BaseException
(like old-style classes) has the side effect of spuriously "handling"
*expected* errors like when a signal handler raises SystemExit.

The signal handler installed in our Daemon is intended to ensure first
that the entire process group and any forked processes (like rsync's)
receive the SIGTERM signal and also that the process itself
terminates.

The use of sys.exit was not a conscious grandiose plan for graceful
shutdown (like the running[0] = False trick that wsgi server parent
processes do) - the desired behavior for SIGTERM is to stop - hard.

This change ensures the original goals and intentions of our signal
handler are fulfilled without the undesirable side effect that can
cause our daemons to confusingly log an expected message to stop as an
unexpected error and start ignoring additional SIGTERM messages,
forcing our kind operators to resort to brutal process murder.
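
The side effect described above can be reproduced in a few lines. This sketch only demonstrates the problem and does not show the fix made in this change; all three function names are hypothetical, and `fake_sigterm_handler` merely stands in for a signal handler that calls sys.exit().

```python
import sys


def run_with_bare_except(task):
    """Run task the way a daemon with a bare except would."""
    try:
        task()
    except:  # noqa: E722 - deliberately bare, to show the side effect
        # SystemExit lands here too, so the request to stop is swallowed.
        return "logged as unexpected error"
    return "finished"


def run_with_exception_only(task):
    """Same, but only catching Exception subclasses."""
    try:
        task()
    except Exception:
        return "logged as unexpected error"
    return "finished"


def fake_sigterm_handler():
    # Stand-in for a signal handler that calls sys.exit().
    sys.exit(0)
```

With the bare except the SystemExit is caught and the caller keeps running; with `except Exception` it propagates, but then errors that do not inherit from Exception are no longer caught either, which is the trade-off the message describes.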

Closes-Bug: #1489209
Change-Id: I9d2886611f6db2498cd6a8f81a58f2a611f40905
2016-11-04 20:00:00 -07:00
Victor Stinner
e6776306b7 Python 3: fix usage of reload()
Replace reload() builtin function with six.moves.reload_module() to
make the code compatible with Python 2 and Python 3.

Change-Id: I7572d613fef700b392d412501facc3bd5ee72a66
2016-07-25 14:56:21 +02:00
janonymous
f5f9d791b0 pep8 fix: assertEquals -> assertEqual
assertEquals is deprecated in py3; replace it.

Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
2015-10-11 12:57:25 +02:00
Jenkins
260e976e50 Merge "Get StringIO and cStringIO from six.moves"
2015-07-24 06:52:36 +00:00
janonymous
cd7b2db550 unit tests: Replace "self.assert_" by "self.assertTrue"
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.

Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
2015-07-21 19:23:00 +05:30
Victor Stinner
6e70f3fa32 Get StringIO and cStringIO from six.moves
* replace "from cStringIO import StringIO"
  with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
  with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
  with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
  with "import six" and "six.StringIO()"

This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer

Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
2015-07-15 16:56:33 +02:00
Peter Portante
9411a24ba7 Revert "Refactor common/utils methods to common/ondisk"
This reverts commit 7760f41c3ce436cb23b4b8425db3749a3da33d32

Change-Id: I95e57a2563784a8cd5e995cc826afeac0eadbe62
Signed-off-by: Peter Portante <peter.portante@redhat.com>
2013-10-07 17:18:09 -04:00
ZhiQiang Fan
f72704fc82 Change OpenStack LLC to Foundation
Change-Id: I7c3df47c31759dbeb3105f8883e2688ada848d58
Closes-bug: #1214176
2013-09-20 01:02:31 +08:00
Peter Portante
7760f41c3c Refactor common/utils methods to common/ondisk
Place all the methods related to on-disk layout and / or configuration
into a new common module that can be shared by the various modules
using the same on-disk layout.

Change-Id: I27ffd4665d5115ffdde649c48a4d18e12017e6a9
Signed-off-by: Peter Portante <peter.portante@redhat.com>
2013-09-17 17:32:04 -04:00
Peter Portante
c067abd21e Pep8 unit test modules for hacking and one liners (4 of 12)
Address all the "hacking" lines that are flagged, and all the modules
that just have one item flagged.

Change-Id: I372a4bdf9c7748f73e38c4fd55e5954f1afade5b
Signed-off-by: Peter Portante <peter.portante@redhat.com>
2013-09-01 15:12:39 -04:00
Peter Portante
eb658a1034 Add unit tests to ensure TZ variable remains set
See review https://review.openstack.org/29836.

Change-Id: I8ec80202789707f723abfe93ccc9cf1e677e4dc6
Signed-off-by: Peter Portante <peter.portante@redhat.com>
2013-05-20 22:31:03 -04:00
David Hadas
a979c8007b Add support for Hash Prefix
A new configuration parameter is added to /etc/swift/swift.conf
[swift-hash]
swift_hash_path_prefix = 'random unique string'

New installations are advised to set this parameter to a random secret,
which would not be disclosed outside the organization.
The same secret needs to be used by all swift servers of the same cluster.

Existing installations should set this parameter to an empty string
(the default)
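
To see why an empty prefix is the safe choice for existing installations, consider a simplified sketch of hashing a path with a secret prefix and suffix. It assumes the prefix and suffix are simply concatenated around the path before hashing with MD5; the function signature is illustrative and not Swift's.

```python
import hashlib


def hash_path(path, prefix=b"", suffix=b""):
    """Hash a path with a cluster-wide secret prefix and suffix (sketch)."""
    # All three values are bytes; mixing in str would fail on Python 3.
    return hashlib.md5(prefix + path + suffix).hexdigest()
```

With an empty prefix the result is identical to hashing with the suffix alone, so existing on-disk locations stay valid. Any non-empty prefix changes every hash, which is why it should only be set on new installations.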

DocImpact

Fixes: Bug #1157454

Change-Id: I63b10d0b7d6dd3f74e0f10bb41b5f240fa03578a
2013-03-22 19:41:55 +02:00
John Dickinson
1ecf5ebba1 updated copyright date for all files
Change-Id: Ifd909d3561c2647770a7e0caa3cd91acd1b4f298
2012-03-19 13:45:34 -05:00
gholt
cb58430321 logging: use routes to separate logging configurations
2011-02-02 13:39:08 -08:00
gholt
fdf20184e4 Fix duplicate logging
2011-02-02 09:38:17 -08:00
Anne Gentle
8823427161 Changed copyright notices on py files and the single rst file with a copyright notice
2011-01-04 17:34:43 -06:00
Michael Barton
d7dd3ec065 gettext updates
2010-12-20 21:47:50 +00:00
Clay Gerrard
c007d0296e removed unneeded daemonize function from utils, pulled get_socket out of run_wsgi, reworked test_utils and test_wsgi
2010-11-19 12:15:41 -06:00
Clay Gerrard
57a35f0d7c added helper/util to parse command line args; removed some duplicated code in
server/daemon bin scripts; more standardized python/linux daemonization
procedures; fixed lp:666957 "devauth server creates auth.db with the wrong
privileges"; new run_daemon helper based on run_wsgi simplifies daemon
launching/testing; new - all servers/daemons support verbose option when
started interactively which will log to the console; fixed lp:667839 "can't
start servers with relative paths to configs"; added tests
2010-11-11 16:41:07 -06:00
gholt
e6e354c483 Added some missing test stubs so we can better see coverage (and get a little syntax-level "testing").
2010-10-07 08:23:17 -07:00