2497 Commits

Author SHA1 Message Date
Jenkins
f43330c6c0 Merge "Fix potential floating ip leakage issue" 2014-08-14 16:24:30 +00:00
Jenkins
aa29c6729f Merge "Add timestamps to nodepool logging" 2014-08-14 16:20:23 +00:00
Yolanda Robla
beab513bc9 Build images using diskimage-builder
Create images locally using diskimage-builder, and then upload
to glance.

Co-Authored-By: Monty Taylor <mordred@inaugust.com>

Change-Id: I8e96e9ea5b74ca640f483c9e1ad04a584b5660ed
2014-08-14 11:31:57 +02:00
Jonathan Harker
b8da6b7bff Add timestamps to nodepool logging
Change-Id: If3a8abcc3c35616dacacf7d9532cd1862b3c32a5
2014-08-13 11:40:04 -07:00
Aaron Rosen
f0068be114 Fix potential floating ip leakage issue
Previously, when cleanupServer() was called it would first query nova
to get the server and then release the floatingip associated with it
(if there was one) and then delete the floatingip. The potential floating
ip leak would occur if we called removeFloatingIP and the call to
deleteFloatingIP failed. The leak would occur the next time we tried to
clean up the server as we no longer find the floatingip to delete as
it had been disassociated from the server. To fix this issue we just need
to drop the call to removeFloatingIP and call deleteFloatingIP as we don't
really need to disassociate the floatingip first.

Unfortunately, there is a potential bug in nova-api when using neutron in which
this call can disassociate the floatingip and then fail to delete it. In this
case we'll end up leaking the floatingip in the same way. The fix for this
nova issue is here: I53b0c9d949404288e8687b304361a74b53d69ef3
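
The leak described above can be reproduced with a small sketch. `FakeCloud`
and its methods are hypothetical stand-ins written for this illustration; they
are not the novaclient API or nodepool's own code. The old cleanup
disassociates first, so a failed delete leaves an address that the next pass
can no longer find; the new cleanup only deletes, so a retry still sees it.

```python
class FakeCloud:
    """Hypothetical in-memory cloud; not the novaclient API."""

    def __init__(self, fail_delete_once=False):
        # Maps floating IP id -> id of the server it is attached to (or None).
        self.floating_ips = {"fip-1": "server-1"}
        self.fail_delete_once = fail_delete_once

    def ips_for_server(self, server_id):
        return [ip for ip, srv in self.floating_ips.items() if srv == server_id]

    def remove_floating_ip(self, ip_id):
        self.floating_ips[ip_id] = None  # disassociate only

    def delete_floating_ip(self, ip_id):
        if self.fail_delete_once:
            self.fail_delete_once = False
            raise RuntimeError("delete failed")
        del self.floating_ips[ip_id]


def cleanup_old(cloud, server_id):
    # Previous behaviour: disassociate, then delete.
    for ip_id in cloud.ips_for_server(server_id):
        cloud.remove_floating_ip(ip_id)
        cloud.delete_floating_ip(ip_id)


def cleanup_new(cloud, server_id):
    # Fixed behaviour: delete directly, no disassociation step.
    for ip_id in cloud.ips_for_server(server_id):
        cloud.delete_floating_ip(ip_id)


def run_twice(cleanup, cloud):
    """Run cleanup twice, ignoring failures, and return the IPs left over."""
    for _ in range(2):
        try:
            cleanup(cloud, "server-1")
        except RuntimeError:
            pass
    return sorted(cloud.floating_ips)
```

With a delete that fails once, `run_twice(cleanup_old, ...)` leaves the
address behind, while `run_twice(cleanup_new, ...)` removes it on the retry.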

closes-bug: 1356161

Change-Id: I0c78823198fac0d31235d93505a4251edbf9e612
2014-08-12 19:26:09 -07:00
Mike Heald
bc40b65b47 Add stdout and stderr to exception when ready script fails
When a command run through ssh fails, nodepool raises an exception
saying that it failed, but doesn't include any other information.
This patch adds in the output from stderr and stdout if output=True,
as well as the exact command run to help debug problems.

Also, added output parameter to the fake ssh client for testing.
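
A minimal sketch of the idea, assuming a local subprocess in place of the ssh
client. The names `CommandError`, `format_failure`, and `run_command` are
hypothetical and chosen for this illustration; only the `output` flag mirrors
the message above.

```python
import subprocess
import sys


class CommandError(Exception):
    """Hypothetical exception type for a failed remote command."""


def format_failure(command, returncode, stdout, stderr, output=False):
    # Always include the exact command; add captured streams on request.
    message = "Command %r failed with exit code %d" % (command, returncode)
    if output:
        message += "\nstdout: %s\nstderr: %s" % (stdout, stderr)
    return message


def run_command(command, output=False):
    # A local subprocess stands in for a command run through ssh.
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode != 0:
        raise CommandError(
            format_failure(command, proc.returncode, proc.stdout, proc.stderr, output)
        )
    return proc.stdout


if __name__ == "__main__":
    print(run_command([sys.executable, "-c", "print('ready')"]))
```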

Change-Id: If55643aef91c90b7c27fe4f532d51d9ef72b1ab4
2014-08-12 12:18:30 +01:00
Jenkins
cdb1e87c23 Merge "Record provider AZ info in graphite." 2014-08-05 19:26:14 +00:00
Jenkins
f69d62bb62 Merge "Add support for network labels" 2014-08-01 23:53:40 +00:00
Clark Boylan
35192322c6 Record provider AZ info in graphite.
When constructing launchStats() subkeys for graphite send the existing
key data but also add a key for provider.az info if we have an az.
This rolls up the az data into non az prefixed key but also provides az
specific information. This will track launch stats for individual AZs.
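
As a rough sketch of the key layout described above: the function name and the
exact key format below are assumptions made for illustration, not necessarily
what launchStats() emits.

```python
def launch_stat_subkeys(image, provider, target, az=None):
    """Return graphite subkeys for one launch (hypothetical key format)."""
    # Existing, non-AZ keys are always sent, so AZ data rolls up into them.
    subkeys = [
        "image.%s" % image,
        "provider.%s" % provider,
        "target.%s" % target,
    ]
    # Add an AZ-specific key only when the node has an availability zone.
    if az:
        subkeys.append("provider.%s.%s" % (provider, az))
    return subkeys
```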

Change-Id: Ie67238950e9bd927f942f21fadb7f3e894de118d
2014-08-01 16:51:45 -07:00
Jenkins
8e7c5a1620 Merge "Drop voluptuous from requirements" 2014-08-01 23:51:20 +00:00
Jenkins
6f300e739c Merge "Use correct provider in test-case" 2014-07-29 12:08:27 +00:00
Jenkins
0e0273fc60 Merge "Cleaning up index.rst file" 2014-07-28 13:09:50 +00:00
Ian Wienand
9c26406d95 Use correct provider in test-case
A typo here had the second provider using the available value
from the first provider.

Change-Id: Iba85aeba6beaa80f8a02eff3f3fe6ebd394d36b0
2014-07-28 11:15:39 +10:00
James E. Blair
fb27c84bbf Revert "Track last allocations to ensure forward-progress"
This reverts commit 9f553c9a9752071129f6e8a31535829c5e9a0d91.

We have observed a problem with this patch where nodepool may
attempt to allocate more than the configured quota.

Steps to reproduce:

* Nodepool has a very high (>2x quota) request load
* Restart nodepool
* First pass through allocator appears normal and allocates up to quota
* Second pass allocates an additional $quota worth of nodes, all from
  the last-defined provider
* Repeats until request load is satisfied

This caused us to request thousands of nodes from our providers at
once.

Change-Id: I08e5fd2de668cc2fc2d68bb1bf09f2d725f82c7f
2014-07-23 14:57:09 -07:00
Christian Berendt
45163917ed Cleaning up index.rst file
Removed notes about the generation of the file.

Change-Id: I5d46f4b52dba3b07c360bcaf872ed2c6b0579555
2014-07-21 08:26:39 +02:00
Jenkins
dc518014a0 Merge "Enable debugging output" 2014-07-14 12:51:15 +00:00
Jenkins
46f8c4537a Merge "Fix Configuration link in docs" 2014-07-14 12:49:45 +00:00
Antoine Musso
84d2c7a156 Drop voluptuous from requirements
voluptuous is a library that can be used to validate a YAML schema. It
is unused, and if it is ever used, it should be bumped to 0.7+ anyway.

Change-Id: If26d7326714206f3736aaea0e2d6ecc57f995692
2014-07-10 12:02:37 +02:00
K Jonathan Harker
4b6b74a17f Enable debugging output
Nodepool currently writes to the DEBUG logging level with no way to view
the logging at that level. This creates a --debug argument to show those
logging events.
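
One way such a flag could be wired up, using only the standard library. The
function name and the INFO default are assumptions for this sketch; the
message above only states that `--debug` exposes DEBUG-level events.

```python
import argparse
import logging


def parse_log_level(argv):
    """Map a --debug flag to a logging level (INFO default is assumed)."""
    parser = argparse.ArgumentParser(description="logging flag sketch")
    parser.add_argument("--debug", action="store_true",
                        help="show DEBUG level logging")
    args = parser.parse_args(argv)
    return logging.DEBUG if args.debug else logging.INFO


if __name__ == "__main__":
    import sys
    logging.basicConfig(level=parse_log_level(sys.argv[1:]))
    logging.getLogger("nodepool").debug("visible only with --debug")
```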

Change-Id: I6d4541a587b5ecd654287fda6da0e138998c200a
2014-07-03 15:29:25 -07:00
Dan Prince
e471cea178 Add support for network labels
This patch adds the ability to specify a net-label
instead of using a net-id (network UUID).

Rather than use network UUID's in our nodepool.yaml
config files it would be nice to use the more meaningful
network labels instead. This should make the config file
more readable across the various cloud providers and
matches how we use image names (instead of image
UUIDs) as well.

The current implementation relies on the os-tenant-networks
extension in Nova to provide the network label lookup.
Given that nodepool is currently focused around using
novaclient this made the most sense. We may at some point
in the future want to use the Neutron API directly for
this information or perhaps use a combination of both
approaches to accommodate a variety of provider API
deployment choices.

Tested locally on my TripleO overcloud using two
Neutron networks.

Change-Id: I9bdd35adf2d85659cf1b992ccd2fcf98fb124528
2014-07-03 15:32:57 -04:00
Jenkins
fc335e41be Merge "Track last allocations to ensure forward-progress" 2014-07-02 22:41:18 +00:00
Jenkins
01a7270ad9 Merge "Update pbr version" 2014-07-02 07:39:47 +00:00
Ian Wienand
9f553c9a97 Track last allocations to ensure forward-progress
Track the last round of allocations to ensure that we don't starve a
particular label.  If a label made a request but didn't get any nodes
the last time round, it is given nodes preferentially on the next
calculation.

This results in a round-robin allocation during heavy contention.
Note that in the much more usual no-contention case, when everyone is
getting some of their allocations, this makes no change over the
status quo.

AllocationHistory() is added as a new object to track the request and
grant of allocations.  It is instantiated and passed along with an
AllocationRequest().  When all requests are finished, grantsDone() is
called on the object to store the history for that round.

By keeping the AllocationHistory object, one could imagine much
fancier algorithms where preference is given proportionally based on
how many prior allocations have failed, etc.  This is intended to
provide infrastructure for such a change, but mostly to be a simple
first-pass at the problem with minimal changes to the status quo.
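
A heavily simplified sketch of the bookkeeping described above. This is not
the allocator from the patch: the method names and the `order_requests` helper
are hypothetical, and only the idea (remember who asked and got nothing, then
serve them first) is taken from the message.

```python
class AllocationHistory:
    """Track requests and grants for one allocation round (simplified)."""

    def __init__(self):
        self.starved = set()      # labels that got nothing last round
        self._requested = set()
        self._granted = set()

    def record_request(self, label):
        self._requested.add(label)

    def record_grant(self, label, amount):
        if amount > 0:
            self._granted.add(label)

    def grants_done(self):
        # Store the outcome of this round and reset for the next one.
        self.starved = self._requested - self._granted
        self._requested = set()
        self._granted = set()


def order_requests(labels, history):
    # Stable sort: previously starved labels first, others keep their order.
    return sorted(labels, key=lambda label: label not in history.starved)
```

When every label receives some nodes, `starved` is empty and the ordering is
unchanged, which matches the no-contention case described above.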

Change-Id: I0ff9aa74fef807bd84bf51e7ba1ed176c22f5365
Closes-Bug: #1308407
2014-07-01 19:14:56 +00:00
Elizabeth K. Joseph
9c4f91a9f3 Fix Configuration link in docs
The configuration link on the installation page was trying to
link to a local #configuration, we instead want it to reference
the configuration.html page.

Change-Id: Ie1d3bb7300902185e07aef91e58e403ca67981ba
2014-07-01 11:44:20 -07:00
Joshua Hesketh
f41385d146 Make template and node hostnames configurable
Instead of assuming nodes are for openstack.org make the hostnames
of all nodes and templates configurable on a provider level.

Change-Id: I5d5650fe6b22ecb25b994767e48e7742d7238a18
2014-06-30 18:03:46 +10:00
Longgeek
7a2721e50c Update pbr version
Change-Id: I1aa8a8ceb39e88aeb5989b33484237fcb778480f
2014-06-29 00:20:36 +08:00
Jenkins
93618a2f70 Merge "Handle task manager shutdown more correctly" 2014-06-26 19:14:32 +00:00
Jenkins
653955efda Merge "Show expected output in test-case error" 2014-06-22 16:02:27 +00:00
Jenkins
dc12f678ad Merge "Add @localhost to openstack_citest user example" 2014-06-22 16:00:43 +00:00
Jenkins
2430fe0b1a Merge "Pass in hostname as a script parameter" 2014-06-22 15:59:34 +00:00
Ian Wienand
1d5397ee8a Show expected output in test-case error
It can get a bit confusing as to what test is looking for what result.
Add a message to the failure case to clear things up.  Looks like:

---
 raise mismatch_error
MismatchError: 1 != 2: Error at pos 1, expected [1, 2, 3, 1] and got [1, 1, 1, 1]
======================================================================
---

TrivialFix

Change-Id: I40b57394b270f419b032301f490d4ba791c66396
2014-06-19 11:23:27 +10:00
Ian Wienand
b87d94df9d Add @localhost to openstack_citest user example
Without this, mysql can match the anonymous user first and then
reject the openstack_citest password [1] leading to some confusing
tox output.

[1] http://bugs.mysql.com/bug.php?id=36576

TrivialFix
Change-Id: Ic9a753307960634f0e5c40abf06ec5bac92d9897
2014-06-18 08:41:24 +10:00
James E. Blair
8ccc227b2d Handle task manager shutdown more correctly
If a task manager was stopped with tasks in queue, they would
not be processed.  This could result in deadlocked threads if
one of the long-running threads was waiting on a task while
the main thread shut down a manager.
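
The message does not say how the fix was implemented. One common way to avoid
dropping queued work on shutdown is a sentinel placed at the end of the queue,
sketched here with a hypothetical `TaskManager` class; tasks submitted before
`stop()` still run, so threads waiting on them are not left hanging.

```python
import queue
import threading


class TaskManager(threading.Thread):
    """Hypothetical task manager that drains its queue before exiting."""

    def __init__(self):
        super().__init__(daemon=True)
        self._queue = queue.Queue()

    def submit(self, task):
        self._queue.put(task)

    def stop(self):
        # The sentinel sits behind any tasks already queued, so those
        # tasks are processed before the thread exits.
        self._queue.put(None)

    def run(self):
        while True:
            task = self._queue.get()
            if task is None:
                break
            task()
```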

Change-Id: I3f58c599d472d134984e63b41e9d493be9e9d70b
2014-06-17 08:26:33 -07:00
Jenkins
0dc4b59b7d Merge "Check for stale PID lock when starting" 2014-06-14 18:23:55 +00:00
Christian Berendt
8bbfde2199 Use import from six.moves to import the queue module
The name of the synchronized queue class is queue instead of
Queue in Python3.

Change-Id: I508268561f95c9fed2d39fb45731aab5d9d74111
2014-06-07 21:07:02 +02:00
K Jonathan Harker
8e254b4994 Pass in hostname as a script parameter
Pass the hostname in as the first parameter to both the setup script and
the ready script.

Change-Id: I0de51156b56ae750dd519da0da68b85ac5d41267
2014-06-06 16:23:19 -07:00
Bob Ball
02a45bf2d9 Check for stale PID lock when starting
The PID lock file does not actually test whether the lock is valid during
acquisition.  This leads to a failure to acquire the lock (essentially just
a file-based lock) when starting the process after a failure / kill.
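
A sketch of a staleness check, assuming a POSIX system and a lock file that
contains the owner's PID as decimal text. Both function names are
hypothetical, and the actual patch may detect stale locks differently.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False  # never signal process groups by accident
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_is_stale(path):
    """Return True if the lock file exists but its owner is gone."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return False  # no lock at all
    except ValueError:
        return True   # unreadable contents: treat as stale
    return not pid_is_running(pid)
```

A starting process could call `lock_is_stale()` and remove the file before
trying to acquire the lock.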

Change-Id: Iebf0e077377278eb28b3280a8abfc605ac68a759
2014-06-06 09:44:33 +01:00
Jenkins
174a5d7f6d Merge "Add warnings about the installation of libzmq1" 2014-06-06 07:30:47 +00:00
Jenkins
13bcfa5664 Merge "Remove libzmq-dev from dependency list" 2014-06-06 07:22:37 +00:00
Xinyu Zhao
bd0a5cce05 Remove libzmq-dev from dependency list
The system-installed libzmq-dev package pulls in libzmq1, which
doesn't support RCVTIMEO on some Linux distros, e.g. Ubuntu 12.04.
pip will install libzmq and compile a supporting version anyway,
so remove libzmq-dev from the apt-get dependency installation
in README.rst.

Change-Id: Ifd67c7ded5db7dbb82624d2aa08843278f2de72c
2014-06-05 19:04:56 +00:00
Clark Boylan
37034efc30 Increase watermark sleep in tests for reliability.
The nodepool tests depend on watermark sleep being long enough that all
of the fake nodes are marked ready before the next deficit check. If it
is not long enough then the tests may boot extra servers causing asserts
in tests to fail.

Increase watermark sleep to one second from half a second and let each
test sleep for three seconds before checking node states.

Change-Id: Ia94527b46bad26b184af8fa02b3a1e2a1f7a3430
2014-06-05 10:25:03 -07:00
Bob Ball
8b0ba7bcf2 Add warnings about the installation of libzmq1
With the incorrect compile-time options for libzmq1, pyzmq
will give an error such as the following:

    AttributeError: Socket has no such option: RCVTIMEO

Change-Id: I719a8de89b26dba974d7af8d631b7cdd729a074b
2014-06-04 15:50:47 +00:00
James E. Blair
b6539f9cdd Log task durations
Change-Id: I87c6f870ccb806d3484b38ac123ac89201a854cd
2014-06-03 14:31:15 -07:00
James E. Blair
734435b772 Log task manager queue length
A long queue isn't necessarily bad, but more information could be
helpful.

Change-Id: I34d6bb5c1627af1fbc700458d8950add296bc1bc
2014-06-03 14:31:15 -07:00
James E. Blair
bbd0c0ba7d Prevent listserver tasks from piling up
If a bunch of threads waiting for servers all decided that the
server list was expired at around the same time, they could all
end up submitting server list tasks which defeats the caching.

If listing all servers was slow, that would only exacerbate the
situation.

Instead, wrap the actual list server API call inside of a
non-blocking lock to make sure that it only happens once per
period.
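
The non-blocking lock pattern can be sketched as follows. `CachedServerList`
and the `list_servers` callable are hypothetical names for this illustration;
the real code wraps the provider's list-servers API call.

```python
import threading
import time


class CachedServerList:
    """Cache a server listing; refresh at most once per period."""

    def __init__(self, list_servers, period=5.0):
        self._list_servers = list_servers  # hypothetical API call
        self._period = period
        self._lock = threading.Lock()
        self._servers = []
        self._fetched_at = None

    def _expired(self):
        if self._fetched_at is None:
            return True
        return time.monotonic() - self._fetched_at >= self._period

    def get(self):
        # Only the thread that wins the lock refreshes; the others do not
        # wait and simply return whatever is currently cached.
        if self._expired() and self._lock.acquire(blocking=False):
            try:
                if self._expired():  # re-check after winning the lock
                    self._servers = self._list_servers()
                    self._fetched_at = time.monotonic()
            finally:
                self._lock.release()
        return self._servers
```

The trade-off in this sketch is that a caller losing the race may see a
slightly stale (or initially empty) list instead of queueing another slow API
call.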

Change-Id: I2a09ab3a226001d9de4268d366f65ef3e69cdd0d
2014-06-03 14:31:15 -07:00
James E. Blair
1a0b20ef2c Check the returned image status
We did not check the status of a completed image build; if it
went into the ERROR state, we assumed it worked.  Check the return
value.

Change-Id: I2acf8ac4c5641aa69932230d3414e92620f6e735
2014-06-03 14:31:15 -07:00
Jenkins
1b3b85e76d Merge "Use except x as y instead of except x, y" 2014-06-03 09:20:45 +00:00
Clark Boylan
7a90ee061d Display node AZ in nodepool list output
Add the availability zone data for a node to the `nodepool list` output.
This should aid in debugging of AZs.

Change-Id: If861e666c5d9eec4f4f1ddf1bb431fb06436b6e6
2014-06-02 14:40:23 -07:00
Clark Boylan
a297ee63ec Support provider AZ lists
Apparently not all clouds will schedule AZs as expected by nova. It may
be the case that an AZ is hard set on the provider side when no specific
AZ is requested. Add AZ support to nodepool so that it can request AZs
and better load balance across AZs provided.

Do this by randomly selecting an AZ from the list of AZs provided in the
config. This should give us a good distribution across all AZs. If a
differently weighted distribution is required a nodepool provider object
can be created per AZ with a single item AZ list.
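
The selection step amounts to a uniform random choice. In this sketch the
function name and the `rng` parameter are assumptions; returning None stands
for "let the cloud pick" when no AZ list is configured.

```python
import random


def choose_az(azs, rng=random):
    """Pick one availability zone uniformly at random, or None if unset."""
    if not azs:
        return None  # no AZ requested; the provider decides
    return rng.choice(azs)
```

A provider configured with a single-item list always gets that AZ, which is
how the per-AZ provider approach above produces a weighted distribution.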

Note this requires an update to the nodepool database.

Change-Id: I428336ad817a8eb7d311a68767849aab0bcf015f
2014-06-02 12:06:42 -07:00
Christian Berendt
e3dd94d65c Use except x as y instead of except x, y
According to https://docs.python.org/3/howto/pyporting.html the
syntax changed in Python 3.x. The new syntax is usable with
Python >= 2.6 and should be preferred to be compatible with Python3.
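
For illustration, the preferred form looks like this (the function itself is a
made-up example); the old form is shown only in a comment because it is a
syntax error on Python 3.

```python
def parse_port(value):
    """Convert a string to a port number, reporting bad input."""
    try:
        return int(value)
    # Old Python 2-only spelling: "except ValueError, exc:"
    except ValueError as exc:
        return "invalid port: %s" % exc
```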

Enabled hacking check H231.

Change-Id: Ide60f971493440311f1dcc594e33d536beb925e5
2014-05-29 23:57:48 +02:00