2497 Commits

Author SHA1 Message Date
Jenkins
f03b6d6aa7 Merge "Add cloud-user to non-root user list" 2014-05-28 20:43:30 +00:00
James E. Blair
e829bcb6b1 Don't accept tasks for stopped managers
In case a provider manager is being replaced due to a configuration
change, raise exceptions for any tasks submitted after it is stopped.

This will probably cause many errors, but they should already be
handled and servers will eventually be deleted.

We can minimize the disruption by making the provider managers more
adaptable to changes, but this stopgap measure should at least fix
the current problem we are observing with threads that get stuck
and never complete.

Change-Id: I3d190881ede30480d7c4ae970a0cb2dd07c3e160
2014-05-22 15:57:49 -07:00
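The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not nodepool's actual code: `TaskManager`, `ManagerStoppedException`, `submit_task`, and `stop` are hypothetical names chosen for the example.

```python
import queue
import threading


class ManagerStoppedException(Exception):
    """Raised when a task is submitted to a manager that has been stopped."""


class TaskManager(threading.Thread):
    """Hypothetical, simplified stand-in for a provider manager."""

    def __init__(self):
        super().__init__(daemon=True)
        self._queue = queue.Queue()
        self._running = True

    def stop(self):
        # Mark the manager as stopped, then wake the run loop with a sentinel.
        self._running = False
        self._queue.put(None)

    def run(self):
        while True:
            task = self._queue.get()
            if task is None:
                break
            task()

    def submit_task(self, task):
        # Refuse new work once stopped, so callers fail fast instead of
        # waiting forever on a task that will never run.
        if not self._running:
            raise ManagerStoppedException("manager is no longer running")
        self._queue.put(task)
```

Raising at submission time moves the failure to the caller, which is assumed to already handle errors, rather than leaving a thread blocked on a queue nobody reads.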
Jenkins
18567ba04f Merge "Create new provider managers on image data changes" 2014-05-22 17:51:37 +00:00
Clark Boylan
941c62aabc Create new provider managers on image data changes
Previously only non-image provider config changes would force nodepool to
create a new provider manager. There are cases when it is useful to have
nodepool create a new manager when the image config changes, for
example when you want to add an image or change the flavor of an image.

Add checks to see if the list of image names has changed and, if all
the image names remain the same, whether any of their properties have
changed. If either of these things has changed, create a new provider
manager.

Change-Id: Ic0732453e12069d5d50a56dea0037817e978dad4
2014-05-22 10:33:45 -07:00
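The two-stage check described above might look roughly like this. It is a sketch under assumptions: `images_changed` is a hypothetical helper, and the dict-of-dicts shape for image config is chosen only for illustration.

```python
def images_changed(old_images, new_images):
    """Return True if the image names or any image's properties differ.

    Both arguments map an image name to a dict of its properties
    (a hypothetical shape chosen for this sketch).
    """
    # First check: has the set of image names changed?
    if set(old_images) != set(new_images):
        return True
    # Names are identical, so compare each image's properties.
    return any(old_images[name] != new_images[name] for name in old_images)
```

A caller would create a new provider manager whenever this returns True.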
Jenkins
5ef2df9d21 Merge "Correct update-image to image-update in samples" 2014-05-22 17:23:21 +00:00
Brad P. Crochet
62deb91500 Add cloud-user to non-root user list
RHEL and CentOS cloud images use 'cloud-user' as the default user for
login. This adds support for trying that user if root does not work.

Change-Id: Ie9e4276800408dee5941971d49da35252f9a74ee
2014-05-21 13:05:15 -04:00
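A fallback over a list of login users could be sketched as below. The helper name `connect_with_fallback`, the `connect` callable, and the two-entry user list are assumptions made for the example; the real list in nodepool may contain other users.

```python
def connect_with_fallback(connect, host, usernames=('root', 'cloud-user')):
    """Try each username in order and return the first working connection.

    `connect` is a hypothetical callable taking (host, username) that
    raises an exception when the login fails.
    """
    last_error = None
    for username in usernames:
        try:
            return connect(host, username)
        except Exception as error:
            # Remember the failure and move on to the next candidate user.
            last_error = error
    raise RuntimeError('no usable login user for %s' % host) from last_error
```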
Jenkins
90f5596f52 Merge "Create snapshots when min-ready is >= 0" 2014-05-08 14:26:08 +00:00
Jenkins
0911a74e9d Merge "Fix typo in launch stats" 2014-05-05 17:36:32 +00:00
James E. Blair
34a620f1ae Fix typo in launch stats
This was preventing the new launch stats from reporting.

Change-Id: Iec7dd8cb2b90b77c6650f27182dae79ad05e541a
2014-05-02 15:12:14 -07:00
Jenkins
7747eb461c Merge "Improve logging/stats around launch errors" 2014-05-01 14:20:13 +00:00
Jenkins
4282802a26 Merge "Fix race in tests" 2014-05-01 14:18:41 +00:00
James E. Blair
c664e5e93b Improve logging/stats around launch errors
Break down launch errors into one of four categories.  Report
stats on those independently for providers, images, and targets.
Report elapsed time as well.  And do the same for successful
launches.

Also, log the provider and specific launch error in the case of
failure.

Change-Id: Ib33f0908add0d9b5160ee6b20cbccc30b56e6d57
2014-04-30 17:47:02 -07:00
James E. Blair
00671dd164 Fix race in tests
It was possible for a test to run before nodepool loaded the config.
Explicitly wait for that.

Change-Id: I9a358ac6534c43a000d2b3ec0210a63c57805731
2014-04-30 17:46:22 -07:00
Jeremy Stanley
7f05bbd0fd Correct update-image to image-update in samples
Change-Id: Ie68346e092a8d94852d51e02425df43befa5df08
2014-04-04 18:08:43 +00:00
Paul Belanger
ca12e64901 Create launch-timeout setting for providers
We now have the ability to override the default timeout (3600) when
launching a snapshot image.

Change-Id: I7696138cfc29ef876c3fef104b70098708990b2b
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
2014-04-03 22:22:33 -04:00
James E. Blair
7b3df8acd5 Fix neutron configuration error
AttributeError: 'Provider' object has no attribute 'nics'

Just make nics always defined on the provider object.  Code that
uses it is wrapped with 'if use_neutron' anyway.

Change-Id: I4de69d7aaa7e262076ec0d1dc96d1dd365f9efd4
2014-04-02 15:10:53 -07:00
Clark Boylan
d1711aab80 Immediately delete a floating IP if it doesn't attach
When a floating IP fails to attach to a node, nodepool leaks that
floating IP. Correct this by immediately removing the floating IP, since
we have the IP's id after creation but not during cleanupServer (because
the floating IP was never associated with any server).

Closes-Bug: #1301111
Change-Id: Ib6460797a8d9b31b1ad723e056dbfe6d57438bf7
2014-04-01 20:42:01 -07:00
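The cleanup pattern described above can be sketched as follows. The `client` object and its `create_floating_ip`, `attach_floating_ip`, and `delete_floating_ip` methods are hypothetical names for this example and are not a real cloud API.

```python
def add_public_ip(client, server_id):
    """Create a floating IP and attach it, deleting it if the attach fails.

    `client` is a hypothetical object; its method names are assumptions
    made for this sketch.
    """
    ip = client.create_floating_ip()
    try:
        client.attach_floating_ip(server_id, ip['id'])
    except Exception:
        # The IP was never associated with the server, so a later
        # server cleanup could not find it. Delete it now, while the
        # id is still known, then let the original error propagate.
        client.delete_floating_ip(ip['id'])
        raise
    return ip
```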
Jenkins
418d1cee95 Merge "Detect neutron net-changes on reconfigure." 2014-04-02 00:35:34 +00:00
James E. Blair
5be35f1346 Protect against /etc/nodepool not existing
To transition from images without /etc/nodepool to images with,
check if /etc/nodepool exists before writing anything there.
Later, this change can be removed once we can assume it always
exists.

Change-Id: Ia39653cbbcea68cad618351bb61a39190a67e8b6
2014-04-01 11:34:26 -07:00
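The guard described above amounts to an existence check before writing. The sketch below shows the idea against a local filesystem; `write_if_dir_exists` is a hypothetical helper, and nodepool itself would perform the check on the remote node rather than locally.

```python
import os


def write_if_dir_exists(directory, filename, contents):
    """Write `contents` to directory/filename only if directory exists.

    Returns True if the file was written and False if the directory was
    missing (for example, on an older image without /etc/nodepool).
    """
    if not os.path.isdir(directory):
        return False
    with open(os.path.join(directory, filename), 'w') as handle:
        handle.write(contents)
    return True
```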
Clark Boylan
a93bb48cf5 Fix tox's insane pip install command.
Override tox's pip install command so that prereleases (alphas, betas,
broken things) aren't installed by default. Instead only install release
versions unless a prerelease version is specifically provided. Do this
by overriding the tox install command value.

This fixes doc builds that use `tox -evenv python setup.py
build_sphinx`.

Also, remove the gearman servers from the test configuration (since there
is not yet any support for it).

Change-Id: I31913420adcb48866d3996f2dd3b605c55acce2e
2014-04-01 11:27:58 -07:00
Robert Collins
303d62ff79 Detect neutron net-changes on reconfigure.
Change-Id: If142c10a4ada373e245e5f0d29026b0e943aaa1b
2014-04-01 11:21:09 +13:00
Paul Belanger
1b904ec697 Create snapshots when min-ready is >= 0
Currently, if min-ready is 0 a snapshot will not be created (nodepool
considers the image to be disabled). Now if min-ready is greater than or
equal to 0, nodepool will create the snapshot.  The reason for the change is
to allow jenkins slaves to be offline waiting for a new job to be submitted.
Once a new job is submitted, nodepool will properly launch a slave node from
the snapshot.

Additionally, min-ready is now optional and defaults to 2. If min-ready is -1
the snapshot will not be created (label becomes disabled).

Closes-Bug: #1299172

Change-Id: I7094a76b09266c00c0290d84ae0a39b6c2d16215
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
2014-03-31 15:23:58 -04:00
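The min-ready rules described above can be summarised in a few lines. This is a sketch, not nodepool's code: `snapshot_enabled` and the plain-dict config shape are assumptions, while the default of 2 and the meaning of 0 and -1 come from the commit message.

```python
DEFAULT_MIN_READY = 2  # default stated in the commit message


def snapshot_enabled(config):
    """Return True if a snapshot should be created for this config.

    `config` is a hypothetical dict that may contain a 'min-ready' key.
    """
    min_ready = config.get('min-ready', DEFAULT_MIN_READY)
    # Zero or more ready nodes keeps the snapshot enabled; a negative
    # value such as -1 disables it.
    return min_ready >= 0
```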
Jenkins
933a7e80dc Merge "Fix update-image command" 2014-03-31 18:06:57 +00:00
James E. Blair
4a60b2846e Fix update-image command
The updateImage method signature changed to require name strings
instead of objects.

Change-Id: Ic8aefe86e59d2e36db903fe3735b4907a6c2bf2a
2014-03-31 10:53:02 -07:00
James E. Blair
ddcf1543fe Fix missing attribute error in subnodes
If statsd was enabled, we would hit an undefined attribute error
because subnodes have no target.  Instead, pass through the target
name of the parent node and use that for statsd reporting.

Change-Id: Ic7a04a85775a23f954ea565e8c82976b52b218c7
2014-03-31 09:22:00 -07:00
James E. Blair
92b9842951 Add a test for subnodes
Some misc changes related to running this:
  * Set log/stdout/err capture vars as in ZUUL
  * Give the main loop a configurable sleep value so tests can run faster
  * Fix a confusing typo in the node.yaml config

Additionally, a better method for waiting for test completion is added
which permits us to use assert statements in the tests.

Change-Id: Icddd2afcd816dbd5ab955fa4ab5011ac8def8faf
2014-03-31 09:22:00 -07:00
James E. Blair
fca89ee0a0 Add a very basic functional test
It starts the daemon with a simple config file and ensures that
it spins up a node.

A timeout is added to the zmq listener so that its run loop can
be stopped by the 'stopped' flag.  And the shutdown procedure
for nodepool is altered so that it sets those flags and waits
for those threads to join before proceeding.  The previous method
could occasionally cause assertion errors (from C, therefore
core dumps) due to zmq concurrency issues.

Change-Id: I7019a80c9dbf0396c8ddc874a3f4f0c2e977dcfa
2014-03-31 09:22:00 -07:00
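The stoppable run loop described above can be illustrated without zmq. In this sketch a standard-library `queue.Queue` stands in for the socket, and `Listener`, `inbox`, and `poll_timeout` are hypothetical names; the point is only that a receive with a timeout lets the loop notice the stop flag.

```python
import queue
import threading


class Listener(threading.Thread):
    """Hypothetical listener whose run loop can be stopped cleanly."""

    def __init__(self, inbox, handler, poll_timeout=0.1):
        super().__init__(daemon=True)
        self.inbox = inbox
        self.handler = handler
        self.poll_timeout = poll_timeout
        self.stopped = threading.Event()

    def run(self):
        while not self.stopped.is_set():
            try:
                # Block for at most poll_timeout, so the stop flag is
                # re-checked even when no messages arrive.
                message = self.inbox.get(timeout=self.poll_timeout)
            except queue.Empty:
                continue
            self.handler(message)

    def stop(self):
        self.stopped.set()
```

On shutdown, the caller sets the flag with `stop()` and then calls `join()` on the thread before proceeding, which mirrors the ordering the commit describes.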
James E. Blair
852c6b0b96 Add per-test database fixture
And test it.

Change-Id: I49fb5f58127ed2a1c80282b55e30336da725b75c
2014-03-31 09:22:00 -07:00
James E. Blair
faef2431a7 Finish initial docs
Finish the initial sections defined in the documentation index.
Add sphinxcontrib-programoutput to document command line utils.
Add py27 to the list of default tox targets.

Change-Id: I254534032e0706e410647b023249fe3af4f3a35f
2014-03-31 09:21:56 -07:00
James E. Blair
22961c5ba5 Add tests for the allocator
And fix a bug in it that caused too-small allocations in some
circumstances.

The main demonstration of this failure was:
  nodepool.tests.test_allocator.TwoLabels.test_allocator(two_nodes)

Which allocated 1,2 instead of 2,2.  But the following tests also
failed:

  nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes_over_quota)
  nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes)
  nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(three_nodes)
  nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes_at_quota)
  nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(one_node)

Change-Id: Idba0e52b2775132f52386785b3d5f0974c5e0f8e
2014-03-31 09:20:16 -07:00
James E. Blair
db5602a91e Add ready-script and multi-node support
Write information about the node group to /etc/nodepool, along
with an ssh key generated specifically for the node group.

Add an optional script that is run on each node (and sub-node) for
a label right before a node is placed in the ready state.  This
script can use the data in /etc/nodepool to set up access between
the nodes in the group.

Change-Id: Id0771c62095cccf383229780d1c4ddcf0ab42c1b
2014-03-31 09:20:15 -07:00
Jenkins
ad7b9a849b Merge "Fix the allocation distribution" 2014-03-28 23:22:34 +00:00
James E. Blair
da96ca9ff4 Fix the allocation distribution
The new simpler method for calculating the weight of the targets
is a little too simple and can miss allocating nodes.  Make the
weight change as the algorithm walks through the target list
to ensure that everything is allocated somewhere.

Change-Id: I98f72c69cf2793aa012f330219cd850a5d4ceab2
2014-03-28 15:51:59 -07:00
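One way to make the weight change while walking the target list is to recompute each target's share from what is still unallocated. The sketch below illustrates that idea only; `distribute` is a hypothetical function and is not the allocator nodepool actually uses.

```python
def distribute(amount, targets):
    """Split `amount` nodes across `targets` so that nothing is lost.

    Each target's share is recomputed from the amount remaining and the
    number of targets left, so the final target absorbs any remainder.
    """
    remaining = amount
    allocations = {}
    for index, target in enumerate(targets):
        targets_left = len(targets) - index
        share = remaining // targets_left
        allocations[target] = share
        remaining -= share
    return allocations
```

With a fixed weight of one third, seven nodes over three targets would floor to 2, 2, 2 and lose one node; recomputing the share at each step gives 2, 2, 3.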
James E. Blair
7bfda82c6b Fix image/label name typo in stats
Change-Id: Ibd75b0ef169d6a3cc2a1ebfbea54139dfa28dedc
2014-03-28 15:12:11 -07:00
James E. Blair
9d4e56ff57 Add 'labels' as a configuration primitive
Labels replace images as the basic identity for nodes.  Rather than
having nodes of a particular image, we now have nodes of a particular
label.  A label describes how a node is created from an image, which
providers can supply such nodes, and how many should be kept ready.

This makes configuration simpler (by not specifying which images
are associated with which targets and simply assuming an even
distribution, the target section is _much_ smaller and _much_ less
repetitive).  It also facilitates describing how nodes of
potentially different configurations (e.g., number of subnodes) can
be created from the same image.

Change-Id: I35b80d6420676d405439cbeca49f4b0a6f8d3822
2014-03-28 09:12:27 -07:00
Jenkins
9dd3ced2b1 Merge "Raise min_demand due to slow node boot times" 2014-03-28 15:53:18 +00:00
Jenkins
72f7333765 Merge "Stop waiting for resources in ERROR state" 2014-03-28 01:05:56 +00:00
James E. Blair
71e1419f61 Stop waiting for resources in ERROR state
As soon as a resource changes to the ERROR state, stop waiting for
it.  Return it to the caller, where it will be deleted.

Change-Id: I128bc4344b238b96e5696cce87f608fb2cdffa6e
2014-03-27 14:02:30 -07:00
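A polling loop that returns early on ERROR might look like the sketch below. The `fetch` callable, the dict-shaped resource, and the 'ACTIVE' success status are assumptions for the example; only the ERROR state comes from the commit message.

```python
import time


def wait_for_resource(fetch, timeout=60.0, interval=1.0):
    """Poll `fetch` until the resource is ready or has failed.

    `fetch` is a hypothetical callable returning a dict with a 'status'
    key. A resource in ERROR is returned to the caller straight away
    instead of being polled until the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resource = fetch()
        if resource['status'] in ('ACTIVE', 'ERROR'):
            return resource
        time.sleep(interval)
    raise TimeoutError('timed out waiting for resource')
```

The caller inspects the returned status and deletes the resource if it is in ERROR.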
James E. Blair
e206893f27 Add the ability to create subnodes
An image can specify that it should be created with a number of
subnodes.  That number of nodes of the same image type will also
be created and associated with each primary node of that image
type.

Adjust the allocator to accommodate the expected loss of capacity
associated with subnodes.

If a node has subnodes, wait until they are all in the ready state
before declaring a node ready.

Change-Id: Ia4b315b1ed2999da96aab60c5c02ea2ce7667494
2014-03-27 12:57:40 -07:00
James E. Blair
30bf0ecb87 Add SubNodes and the ability to delete them
There's no way to create subnodes yet.  But this change introduces
the class/table and a method to cleanly delete them if they exist.

The actual server deletion and the waiting for that to complete
are separated out in the provider manager, so that we can kick
off a bunch of subnode deletes and then wait for them all to
complete in one thread.

All existing calls to cleanupServer are augmented with a new
waitForServerDeletion call to handle the separation.

Change-Id: Iba9d5a0a61cccc07d914e60a24777c6451dca7ea
2014-03-27 12:57:40 -07:00
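Separating the delete request from the wait allows a pattern like the one sketched below. The method names `cleanupServer` and `waitForServerDeletion` come from the commit message, but their signatures here, the `manager` object, and the `delete_servers` helper are assumptions.

```python
def delete_servers(manager, server_ids):
    """Request deletion of every server, then wait for all of them.

    `manager` is a hypothetical object exposing cleanupServer and
    waitForServerDeletion, each assumed to take a single server id.
    """
    # Phase 1: kick off all deletions without blocking on any of them.
    for server_id in server_ids:
        manager.cleanupServer(server_id)
    # Phase 2: wait for each deletion to finish, all in this one thread.
    for server_id in server_ids:
        manager.waitForServerDeletion(server_id)
```

Because all deletions are already in flight before the first wait begins, the total wait is closer to the slowest single deletion than to the sum of all of them.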
Jenkins
14b308b20b Merge "Include provider names in timeout messages" 2014-03-26 20:45:37 +00:00
Clark Boylan
841750fdea Depend on hacking for its dependencies.
flake8 does not pin its pep8 and pyflakes dependencies which makes it
possible for new sets of rules to suddenly apply whenever either of
those projects pushes a new release. Fix this by depending on hacking
instead of flake8, pep8, and pyflakes directly. This will keep nodepool
in sync with the rest of openstack even if it doesn't use the hacking
rules (H*).

Change-Id: Ice9198e9439ebcac15e76832835e78f72344425c
2014-03-26 12:39:41 -07:00
James E. Blair
fdc3616927 Include provider names in timeout messages
Also, remove the extra "waiting for".

Change-Id: I5842daecb6b193eb5d6d2d2662dfab89ac8f7344
2014-03-25 17:56:29 -07:00
Paul Belanger
e2fff8cd15 Set paramiko version > 1.9.0
Ubuntu 12.04 package version for paramiko is 1.7.7.1, which lacks
the additional arguments for exec_command:

  TypeError: exec_command() got an unexpected keyword argument 'get_pty'

1.10.0 was the first version to add the get_pty flag.

Change-Id: I3b4d8a6d8a1d10ab002a79824feab8937d160244
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
2014-03-23 19:36:41 -04:00
James E. Blair
6ab771728e Delete created keypairs if nova boot fails
If the server fails to boot (perhaps due to quota issues) during
an image update, any key nodepool created for that server won't
be deleted, because the existing keypair delete is done as part of
deleting the server.  Handle the special case of never having
actually booted a server and delete the keypair explicitly in that
case.

Change-Id: I0607b77ef2d52cbb8a81feb5e9c502b080a51dbe
2014-03-20 10:24:33 -07:00
Joe Gordon
4cae69fd01 Raise min_demand due to slow node boot times
Booting a node takes up to 16 minutes, so keep more nodes in ready
state.

Change-Id: I0ae647c658feffabc499c96b0a9ed11855202c4b
2014-03-17 14:28:04 -07:00
Jenkins
bd6f5cdf54 Merge "Roll up node stats" 2014-03-14 21:59:03 +00:00
Fengqian Gao
366746aeb0 Keep py3.X compatibility for urllib/urllib2
Use six.moves.urllib instead of urllib and
six.moves.urllib.request instead of urllib2.

Partial-Bug: #1280105

Change-Id: Id122f7be5aa3e0dd213bfa86f9be86d10d72b4a6
2014-02-25 16:51:03 +08:00
James E. Blair
7e0c42d035 Roll up node stats
As the number of providers and targets grows, the number of stats
that graphite has to sum in order to produce the summary graphs that
we use grows.  Instead of asking graphite to summarize something like
400 metrics (takes about 3 seconds), have nodepool directly produce
the metrics that we are going to use.

Change-Id: I2a7403af2512ace0cbe795f2ec17ebcd9b90dd09
2014-02-24 17:29:19 -08:00
Jenkins
839646ecbe Merge "Add fedora support" 2014-02-24 21:49:37 +00:00