When a provider manager is replaced due to a configuration change,
raise exceptions for any tasks submitted after it has stopped.
This will probably cause many errors, but they should already be
handled and servers will eventually be deleted.
We can minimize the disruption by making the provider managers more
adaptable to changes, but this stopgap measure should at least fix
the current problem we are observing with threads that get stuck
and never complete.
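A minimal sketch of the intended behavior, assuming a hypothetical
ManagerStoppedException and submitTask method (the real names may
differ):

    import queue  # six.moves.queue on Python 2

    class ManagerStoppedException(Exception):
        pass

    class ProviderManager(object):
        def __init__(self, name):
            self.name = name
            self._stopped = False
            self._queue = queue.Queue()

        def stop(self):
            self._stopped = True

        def submitTask(self, task):
            # Reject work submitted after shutdown instead of queuing
            # it forever; callers already handle launch errors.
            if self._stopped:
                raise ManagerStoppedException(
                    "Manager %s is no longer running" % self.name)
            self._queue.put(task)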
Change-Id: I3d190881ede30480d7c4ae970a0cb2dd07c3e160
Previously, only changes to non-image provider config would force
nodepool to create a new provider manager. There are cases where it is
useful to have nodepool create a new manager when the image config
changes, for example when adding an image or changing the flavor of an
image.
Add checks to see whether the list of image names has changed and, if
the names all remain the same, whether any of their properties have
changed. If either has changed, create a new provider manager.
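A sketch of the comparison, with hypothetical attribute names:

    def _imagesChanged(old_provider, new_provider):
        # New or removed image names force a new manager.
        if set(old_provider.images) != set(new_provider.images):
            return True
        # Same names: check whether any per-image properties changed.
        for name, old_image in old_provider.images.items():
            new_image = new_provider.images[name]
            if (old_image.flavor != new_image.flavor or
                    old_image.base != new_image.base):
                return True
        return False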
Change-Id: Ic0732453e12069d5d50a56dea0037817e978dad4
RHEL and CentOS cloud images use 'cloud-user' as the default user for
login. This adds support for trying that user if root does not work.
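A sketch of the fallback, assuming a hypothetical ssh_connect helper
that returns None when login as a given user fails:

    def connect_with_fallback(ip, timeout=60):
        # Try root first; RHEL/CentOS cloud images only allow
        # cloud-user.
        for username in ('root', 'cloud-user'):
            client = ssh_connect(ip, username, timeout=timeout)
            if client:
                return client, username
        return None, None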
Change-Id: Ie9e4276800408dee5941971d49da35252f9a74ee
Break down launch errors into one of four categories. Report
stats on those independently for providers, images, and targets.
Report elapsed time as well. And do the same for successful
launches.
Also, log the provider and specific launch error in the case of
failure.
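A sketch of the per-dimension reporting, with hypothetical statsd key
names and error categories:

    def report_launch(statsd, provider, image, target, result, dt_ms):
        # result is one of the categories, e.g. 'ready', 'error.ssh',
        # 'error.timeout', 'error.unknown' (names illustrative).
        for key in ('nodepool.launch.provider.%s.%s' % (provider, result),
                    'nodepool.launch.image.%s.%s' % (image, result),
                    'nodepool.launch.target.%s.%s' % (target, result)):
            statsd.timing(key, dt_ms)  # elapsed time
            statsd.incr(key)           # count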
Change-Id: Ib33f0908add0d9b5160ee6b20cbccc30b56e6d57
We now have the ability to override the default timeout (3600) when
launching a snapshot image.
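A sketch of how the override might be consumed; the attribute name is
hypothetical, not necessarily the real option:

    DEFAULT_TIMEOUT = 3600  # the previous hard-coded value

    def snapshot_timeout(image_config):
        # Use the per-image override when given, else the old default.
        return getattr(image_config, 'timeout', None) or DEFAULT_TIMEOUT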
Change-Id: I7696138cfc29ef876c3fef104b70098708990b2b
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
    AttributeError: 'Provider' object has no attribute 'nics'
Just make nics always defined on the provider object. Code that
uses it is wrapped with 'if use_neutron' anyway.
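A sketch of the fix in config loading, with hypothetical names:

    def load_provider(provider, provider_config):
        # Always define nics so later attribute access cannot raise
        # AttributeError; it is only consulted when use_neutron is set.
        provider.nics = provider_config.get('networks', [])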
Change-Id: I4de69d7aaa7e262076ec0d1dc96d1dd365f9efd4
When a floating IP fails to attach to a node, nodepool leaks that
floating IP. Correct this by immediately removing the floating IP: we
have the IP's id right after creation, but not during cleanupServer
(because the floating IP was never associated with any server).
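A sketch of the immediate cleanup, assuming hypothetical manager
methods:

    def attach_floating_ip(manager, server_id, pool):
        ip = manager.createFloatingIP(pool)
        try:
            manager.addFloatingIP(server_id, ip['ip'])
        except Exception:
            # The IP was never associated with any server, so
            # cleanupServer would miss it; delete it immediately
            # using the id we received at creation time.
            manager.deleteFloatingIP(ip['id'])
            raise
        return ip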
Closes-Bug: #1301111
Change-Id: Ib6460797a8d9b31b1ad723e056dbfe6d57438bf7
To transition from images without /etc/nodepool to images with it,
check whether /etc/nodepool exists before writing anything there.
This change can be removed later, once we can assume the directory
always exists.
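A sketch of the transitional guard (the file name is illustrative):

    import os

    def write_node_info(ip):
        # Older images lack /etc/nodepool; skip writing on those.
        if not os.path.exists('/etc/nodepool'):
            return
        with open('/etc/nodepool/node', 'w') as f:
            f.write(ip + '\n')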
Change-Id: Ia39653cbbcea68cad618351bb61a39190a67e8b6
Override tox's install command value so that prereleases (alphas,
betas, broken things) aren't installed by default; only install
release versions unless a prerelease version is explicitly requested.
This fixes doc builds that use `tox -evenv python setup.py
build_sphinx`.
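A sketch of the tox.ini override; the exact flags are illustrative,
the point being that pip without --pre skips prereleases:

    [testenv]
    install_command = pip install -U {opts} {packages}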
Also, remove the gearman servers from the test configuration (since
there is not yet any gearman support).
Change-Id: I31913420adcb48866d3996f2dd3b605c55acce2e
Currently, if min-ready is 0, a snapshot will not be created (nodepool
considers the image to be disabled). Now, if min-ready is greater than
or equal to 0, nodepool will create the snapshot. The reason for the
change is to allow Jenkins slaves to be offline while waiting for a new
job to be submitted. Once a new job is submitted, nodepool will
properly launch a slave node from the snapshot.
Additionally, min-ready is now optional and defaults to 2. If
min-ready is -1, the snapshot will not be created (the label becomes
disabled).
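A sketch of the new semantics, with hypothetical config access:

    def min_ready(label_config):
        # min-ready is now optional and defaults to 2; -1 disables
        # the label (no snapshot is built), while 0 still builds the
        # snapshot so nodes can be launched on demand.
        return label_config.get('min-ready', 2)

    def snapshot_enabled(label_config):
        return min_ready(label_config) >= 0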
Closes-Bug: #1299172
Change-Id: I7094a76b09266c00c0290d84ae0a39b6c2d16215
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
If statsd was enabled, we would hit an undefined attribute error
because subnodes have no target. Instead, pass through the target
name of the parent node and use that for statsd reporting.
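A sketch of the pass-through, with hypothetical names:

    def subnode_stats_key(node):
        # Subnodes have no target of their own; report them under
        # the parent node's target.
        return 'nodepool.launch.target.%s.ready' % node.target_name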
Change-Id: Ic7a04a85775a23f954ea565e8c82976b52b218c7
Some misc changes related to running this:
* Set log/stdout/err capture vars as in ZUUL
* Give the main loop a configurable sleep value so tests can run faster
* Fix a confusing typo in the node.yaml config
Additionally, a better method for waiting for test completion is
added, which permits us to use assert statements in the tests.
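A sketch of the waiting method, with hypothetical names:

    import time

    def wait_for_nodes(pool, timeout=60):
        # Poll until the pool has no outstanding work, then let the
        # test make assertions about the resulting state.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if not pool.hasPendingWork():  # hypothetical predicate
                return
            time.sleep(0.1)
        raise AssertionError("Timed out waiting for nodes")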
Change-Id: Icddd2afcd816dbd5ab955fa4ab5011ac8def8faf
The new test starts the daemon with a simple config file and ensures
that it spins up a node.
A timeout is added to the zmq listener so that its run loop can
be stopped by the 'stopped' flag. And the shutdown procedure
for nodepool is altered so that it sets those flags and waits
for those threads to join before proceeding. The previous method
could occasionally cause assertion errors (from C, therefore
core dumps) due to zmq concurrency issues.
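A sketch of the timeout-based run loop using a zmq poller; the
listener class and handleEvent are hypothetical stand-ins:

    import zmq

    class Listener(object):
        def __init__(self, socket):
            self.socket = socket
            self._stopped = False

        def stop(self):
            self._stopped = True

        def run(self):
            poller = zmq.Poller()
            poller.register(self.socket, zmq.POLLIN)
            while not self._stopped:
                # Wake at least once a second to check the stopped
                # flag instead of blocking forever in recv().
                ready = dict(poller.poll(1000))
                if self.socket in ready:
                    self.handleEvent(self.socket.recv())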
Change-Id: I7019a80c9dbf0396c8ddc874a3f4f0c2e977dcfa
Finish the initial sections defined in the documentation index.
Add sphinxcontrib-programoutput to document command line utils.
Add py27 to the list of default tox targets.
Change-Id: I254534032e0706e410647b023249fe3af4f3a35f
Also fix a bug in the allocator that caused too-small allocations in
some circumstances.
circumstances.
The main demonstration of this failure was:
nodepool.tests.test_allocator.TwoLabels.test_allocator(two_nodes)
which allocated 1,2 instead of 2,2. The following tests also
failed:
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes_over_quota)
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes)
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(three_nodes)
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes_at_quota)
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(one_node)
Change-Id: Idba0e52b2775132f52386785b3d5f0974c5e0f8e
Write information about the node group to /etc/nodepool, along
with an ssh key generated specifically for the node group.
Add an optional script that is run on each node (and sub-node) for
a label right before a node is placed in the ready state. This
script can use the data in /etc/nodepool to setup access between
the nodes in the group.
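The layout written under /etc/nodepool might look like this (file
names illustrative, not the exact set):

    /etc/nodepool/role          # 'primary' or 'sub'
    /etc/nodepool/node          # private IP of this node
    /etc/nodepool/primary_node  # private IP of the group's primary node
    /etc/nodepool/sub_nodes     # private IPs of the sub-nodes
    /etc/nodepool/id_rsa        # ssh key generated for the group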
Change-Id: Id0771c62095cccf383229780d1c4ddcf0ab42c1b
The new simpler method for calculating the weight of the targets
is a little too simple and can miss allocating nodes. Make the
weight change as the algorithm walks through the target list
to ensure that everything is allocated somewhere.
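A sketch of the idea, with hypothetical names: recompute each target's
weight from what is still unallocated, so rounding can never strand
nodes:

    def spread(requested, capacities):
        # capacities: nodes each target can still accept, in order.
        allocations = []
        remaining = requested
        remaining_capacity = sum(capacities)
        for capacity in capacities:
            if remaining_capacity <= 0:
                allocations.append(0)
                continue
            # The weight is recomputed at each step rather than fixed
            # up front, so earlier rounding is corrected later.
            weight = float(capacity) / remaining_capacity
            grant = min(capacity, int(round(remaining * weight)))
            allocations.append(grant)
            remaining -= grant
            remaining_capacity -= capacity
        return allocations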
Change-Id: I98f72c69cf2793aa012f330219cd850a5d4ceab2
Labels replace images as the basic identity for nodes. Rather than
having nodes of a particular image, we now have nodes of a particular
label. A label describes how a node is created from an image, which
providers can supply such nodes, and how many should be kept ready.
This makes configuration simpler (by not specifying which images
are associated with which targets and simply assuming an even
distribution, the target section is _much_ smaller and _much_ less
repetitive). It also facilitates describing how nodes of
potentially different configurations (e.g., number of subnodes) can
be created from the same image.
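A hypothetical example of the new config shape (option names
illustrative):

    labels:
      - name: precise
        image: precise
        min-ready: 2
        providers:
          - name: provider1
          - name: provider2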
Change-Id: I35b80d6420676d405439cbeca49f4b0a6f8d3822
As soon as a resource changes to the ERROR state, stop waiting for
it. Return it to the caller, where it will be deleted.
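A sketch of the wait loop change, using a novaclient-style client and
hypothetical surrounding names:

    import time

    def wait_for_resource(client, server_id, deadline, poll=2):
        while time.time() < deadline:
            server = client.servers.get(server_id)
            # ERROR is now terminal: return immediately so the caller
            # can delete the resource instead of waiting out a timeout.
            if server.status in ('ACTIVE', 'ERROR'):
                return server
            time.sleep(poll)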
Change-Id: I128bc4344b238b96e5696cce87f608fb2cdffa6e
An image can specify that it should be created with a number of
subnodes. That number of nodes of the same image type will also
be created and associated with each primary node of that image
type.
Adjust the allocator to accommodate the expected loss of capacity
associated with subnodes.
If a node has subnodes, wait until they are all in the ready state
before declaring a node ready.
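A sketch of the readiness gate, with hypothetical names:

    READY = 'ready'  # placeholder for the real state constant

    def node_ready(node):
        # A node with subnodes is only ready once every subnode is.
        return (node.state == READY and
                all(sn.state == READY for sn in node.subnodes))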
Change-Id: Ia4b315b1ed2999da96aab60c5c02ea2ce7667494
There's no way to create subnodes yet. But this change introduces
the class/table and a method to cleanly delete them if they exist.
The actual server deletion and the waiting for that to complete
are separated out in the provider manager, so that we can kick
off a bunch of subnode deletes and then wait for them all to
complete in one thread.
All existing calls to cleanupServer are augmented with a new
waitForServerDeletion call to handle the separation.
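A sketch of the split, using the method names from the message (the
subnode fields are hypothetical):

    def delete_subnodes(manager, node):
        # Kick off all the deletes first...
        for subnode in node.subnodes:
            manager.cleanupServer(subnode.external_id)
        # ...then wait for them all to complete in this one thread.
        for subnode in node.subnodes:
            manager.waitForServerDeletion(subnode.external_id)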
Change-Id: Iba9d5a0a61cccc07d914e60a24777c6451dca7ea
flake8 does not pin its pep8 and pyflakes dependencies, which makes it
possible for new sets of rules to suddenly apply whenever either of
those projects pushes a new release. Fix this by depending on hacking
instead of flake8, pep8, and pyflakes directly. This will keep nodepool
in sync with the rest of OpenStack even if it doesn't use the hacking
rules (H*).
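A sketch of the test-requirements.txt change (the version bounds are
illustrative, not taken from the change):

    # Instead of flake8, pep8, and pyflakes individually:
    hacking>=0.5.6,<0.8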
Change-Id: Ice9198e9439ebcac15e76832835e78f72344425c
The Ubuntu 12.04 package version of paramiko is 1.7.7.1, which lacks
the additional arguments for exec_command:

    TypeError: exec_command() got an unexpected keyword argument 'get_pty'

1.10.0 was the first version to add the get_pty flag.
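One way to tolerate both versions (not necessarily what this change
does):

    def exec_command_compat(client, cmd):
        try:
            return client.exec_command(cmd, get_pty=True)
        except TypeError:
            # paramiko < 1.10 does not accept get_pty.
            return client.exec_command(cmd)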
Change-Id: I3b4d8a6d8a1d10ab002a79824feab8937d160244
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
If the server fails to boot (perhaps due to quota issues) during
an image update and nodepool created a key for that server, the key
won't be deleted, because the existing keypair delete is done as part
of deleting the server. Handle the special case of never having
actually booted a server by deleting the keypair explicitly.
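A sketch of the special case, with hypothetical names:

    def boot_for_image_update(manager, name, image, key_name):
        try:
            return manager.createServer(name, image, key_name=key_name)
        except Exception:
            # No server was ever created, so the usual
            # delete-keypair-with-server path will not run.
            if key_name:
                manager.deleteKeypair(key_name)
            raise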
Change-Id: I0607b77ef2d52cbb8a81feb5e9c502b080a51dbe
Use six.moves.urllib instead of urllib and
six.moves.urllib.request instead of urllib2.
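The resulting py2/py3-compatible form:

    from six.moves import urllib

    # six.moves.urllib.request maps to urllib2 on Python 2 and to
    # urllib.request on Python 3.
    response = urllib.request.urlopen('http://localhost/')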
Partial-Bug: #1280105
Change-Id: Id122f7be5aa3e0dd213bfa86f9be86d10d72b4a6
As the number of providers and targets grows, so does the number of
stats that graphite has to sum to produce the summary graphs that
we use. Instead of asking graphite to summarize something like
400 metrics (which takes about 3 seconds), have nodepool directly
produce the metrics that we are going to use.
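A sketch, with hypothetical key names: emit the pre-summed key
alongside the detailed ones so graphite never has to aggregate them:

    def report(statsd, provider, result):
        # Summary key graphite can chart directly...
        statsd.incr('nodepool.launch.%s' % result)
        # ...plus the detailed per-provider key as before.
        statsd.incr('nodepool.launch.provider.%s.%s' % (provider, result))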
Change-Id: I2a7403af2512ace0cbe795f2ec17ebcd9b90dd09