Create images locally using diskimage-builder, and then upload them
to glance.
Co-Authored-By: Monty Taylor <mordred@inaugust.com>
Change-Id: I8e96e9ea5b74ca640f483c9e1ad04a584b5660ed
Previously, when cleanupServer() was called it would first query nova
to get the server, then release the floating IP associated with it
(if there was one), and then delete the floating IP. A floating IP
could leak if the call to removeFloatingIP succeeded but the
subsequent call to deleteFloatingIP failed. The leak happened the
next time we tried to clean up the server: we would no longer find
the floating IP to delete because it had already been disassociated
from the server. To fix this, drop the call to removeFloatingIP and
call deleteFloatingIP directly, as we don't really need to
disassociate the floating IP first.
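A minimal sketch of the simplified cleanup, assuming a novaclient-style
client and a hypothetical helper name (not the exact nodepool code):

    def cleanup_floating_ip(client, server):
        # Delete any floating IP attached to the server directly;
        # no removeFloatingIP/disassociate step, so there is no
        # window in which a disassociated-but-undeleted IP can leak.
        for fip in client.floating_ips.list():
            if fip.instance_id == server.id:
                client.floating_ips.delete(fip.id)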
Unfortunately, there is a potential bug in nova-api when using
neutron where this call can disassociate the floating IP and then
fail to delete it. In that case we'll end up leaking the floating IP
in the same way. The fix for that nova issue is here:
I53b0c9d949404288e8687b304361a74b53d69ef3
Closes-Bug: #1356161
Change-Id: I0c78823198fac0d31235d93505a4251edbf9e612
When a command run through ssh fails, nodepool raises an exception
saying that it failed, but doesn't include any other information.
This patch adds the output from stderr and stdout when output=True,
as well as the exact command that was run, to help debug problems.
Also, add an output parameter to the fake ssh client for testing.
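Roughly what the richer failure looks like; a sketch with
illustrative names, not the exact nodepool ssh client code:

    def check_ssh_result(command, ret, out, err, output=False):
        # Raise with the exact command and, when output=True, the
        # captured stdout/stderr to make failures debuggable.
        if ret:
            msg = "SSH command failed: %s (exit status %d)" % (command, ret)
            if output:
                msg += "\nstdout:\n%s\nstderr:\n%s" % (out, err)
            raise Exception(msg)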
Change-Id: If55643aef91c90b7c27fe4f532d51d9ef72b1ab4
When constructing launchStats() subkeys for graphite, send the
existing key data but also add a key with provider.az info if we
have an AZ. This rolls the AZ data up into the non-AZ-prefixed key
while also providing AZ-specific information, so launch stats are
tracked for individual AZs.
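For illustration only, the key construction is along these lines
(statsd/graphite-style keys; the helper and key layout are
assumptions):

    def launch_stat_keys(subkey, provider, az=None):
        # Always emit the existing provider-level key; add an
        # AZ-qualified key only when an AZ is known.
        keys = ['nodepool.launch.%s' % subkey,
                'nodepool.launch.provider.%s.%s' % (provider, subkey)]
        if az:
            keys.append('nodepool.launch.provider.%s.%s.%s'
                        % (provider, az, subkey))
        return keys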
Change-Id: Ie67238950e9bd927f942f21fadb7f3e894de118d
This reverts commit 9f553c9a9752071129f6e8a31535829c5e9a0d91.
We have observed a problem with this patch where nodepool may
attempt to allocate more than the configured quota.
Steps to reproduce:
* Nodepool has a very high (>2x quota) request load
* Restart nodepool
* First pass through allocator appears normal and allocates up to quota
* Second pass allocates an additional $quota worth of nodes, all from
the last-defined provider
* Repeats until request load is satisfied
This caused us to request thousands of nodes from our providers at
once.
Change-Id: I08e5fd2de668cc2fc2d68bb1bf09f2d725f82c7f
voluptuous is a library that can be used to validate a YAML schema.
It is unused, and if it ever is used it should be bumped to 0.7+
anyway.
Change-Id: If26d7326714206f3736aaea0e2d6ecc57f995692
Nodepool currently logs at the DEBUG level, but there is no way to
view the logging at that level from the command line. This adds a
--debug argument to show those log messages.
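A minimal sketch of the idea using argparse and the standard logging
module (not the exact nodepool option handling):

    import argparse
    import logging

    parser = argparse.ArgumentParser()
    parser.add_argument('--debug', action='store_true',
                        help='show DEBUG level logging on the console')
    args = parser.parse_args()

    # Default to INFO; only drop to DEBUG when explicitly requested.
    logging.basicConfig(
        level=logging.DEBUG if args.debug else logging.INFO)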
Change-Id: I6d4541a587b5ecd654287fda6da0e138998c200a
This patch adds the ability to specify a net-label instead of a
net-id (network UUID).
Rather than use network UUIDs in our nodepool.yaml config files, it
would be nice to use the more meaningful network labels instead.
This should make the config file more readable across the various
cloud providers and matches how we use image names (instead of image
UUIDs) as well.
The current implementation relies on the os-tenant-networks
extension in Nova to provide the network label lookup. Given that
nodepool is currently focused on using novaclient, this made the
most sense. We may at some point in the future want to use the
Neutron API directly for this information, or perhaps use a
combination of both approaches to accommodate a variety of provider
API deployment choices.
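A sketch of the label-to-UUID lookup via novaclient's networks
manager (assumes the os-tenant-networks extension is available; the
helper name is illustrative):

    def find_network_id(client, label):
        # Resolve a human-readable network label to the UUID that
        # the server create call actually needs.
        for net in client.networks.list():
            if net.label == label:
                return net.id
        raise Exception('Network label %s not found' % label)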
Tested locally on my TripleO overcloud using two
Neutron networks.
Change-Id: I9bdd35adf2d85659cf1b992ccd2fcf98fb124528
Track the last round of allocations to ensure that we don't starve a
particular label. If a label made a request but didn't get any nodes
the last time round, it is given nodes preferentially on the next
calculation.
This results in a round-robin allocation during heavy contention.
Note that in the much more usual no-contention case, when everyone
is getting some of their allocations, this makes no change over the
status quo.
AllocationHistory() is added as a new object to track the request
and grant of allocations. It is instantiated and passed along with
an AllocationRequest(). When all requests are finished, grantsDone()
is called on the object to store the history for that round.
By keeping the AllocationHistory object, one could imagine much
fancier algorithms where preference is given proportionally based on
how many prior allocations have failed, etc. This is intended to
provide infrastructure for such a change, but mostly to be a simple
first pass at the problem with minimal changes to the status quo.
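Conceptually, the history amounts to something like the following
simplified sketch (the real AllocationHistory interface differs):

    class AllocationHistory(object):
        def __init__(self):
            self.current = {}     # label -> nodes granted this round
            self.starved = set()  # labels granted nothing last round

        def recordRequest(self, label):
            self.current.setdefault(label, 0)

        def recordGrant(self, label, count):
            self.current[label] = self.current.get(label, 0) + count

        def grantsDone(self):
            # Close out the round: anything that asked and got zero
            # is preferred on the next allocation pass.
            self.starved = set(
                l for l, g in self.current.items() if g == 0)
            self.current = {}

        def isStarved(self, label):
            return label in self.starved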
Change-Id: I0ff9aa74fef807bd84bf51e7ba1ed176c22f5365
Closes-Bug: #1308407
The configuration link on the installation page was trying to link
to a local #configuration anchor; we instead want it to reference
the configuration.html page.
Change-Id: Ie1d3bb7300902185e07aef91e58e403ca67981ba
Instead of assuming nodes are for openstack.org, make the hostnames
of all nodes and templates configurable at the provider level.
Change-Id: I5d5650fe6b22ecb25b994767e48e7742d7238a18
It can get a bit confusing as to which test is looking for which
result. Add a message to the failure case to clear things up. It
looks like:
---
raise mismatch_error
MismatchError: 1 != 2: Error at pos 1, expected [1, 2, 3, 1] and got [1, 1, 1, 1]
======================================================================
---
TrivialFix
Change-Id: I40b57394b270f419b032301f490d4ba791c66396
Without this, MySQL can match the anonymous user first and then
reject the openstack_citest password [1], leading to some confusing
tox output.
[1] http://bugs.mysql.com/bug.php?id=36576
TrivialFix
Change-Id: Ic9a753307960634f0e5c40abf06ec5bac92d9897
If a task manager was stopped with tasks in its queue, they would
not be processed. This could result in deadlocked threads if one of
the long-running threads was waiting on a task while the main thread
shut down a manager.
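The fix is conceptually along these lines (a sketch, not the actual
TaskManager code; one possible way to unblock waiters):

    import Queue  # 'queue' on Python 3

    def drain(task_queue, client):
        # Run any tasks still queued when the manager stops so that
        # threads blocked waiting on them are not deadlocked.
        while True:
            try:
                task = task_queue.get(block=False)
            except Queue.Empty:
                break
            task.run(client)
            task_queue.task_done()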
Change-Id: I3f58c599d472d134984e63b41e9d493be9e9d70b
The PID lock file does not actually test whether the lock is still
valid during acquisition. Since it is essentially just a file-based
lock, a stale PID file left behind by a failure or kill prevents the
process from acquiring the lock when it is restarted.
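The usual fix is to check whether the PID recorded in the file still
refers to a live process; a rough sketch (not the exact
implementation):

    import errno
    import os

    def pid_is_alive(pidfile):
        # A stale PID file (process gone) should not block lock
        # acquisition after a crash or kill.
        try:
            with open(pidfile) as f:
                pid = int(f.read().strip())
            os.kill(pid, 0)  # signal 0: existence check only
        except (IOError, ValueError):
            return False
        except OSError as e:
            return e.errno == errno.EPERM
        return True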
Change-Id: Iebf0e077377278eb28b3280a8abfc605ac68a759
The system-installed libzmq-dev package provides libzmq1, which
doesn't support RCVTIMEO on some Linux distros, e.g. Ubuntu 12.04.
pip will build and install a supporting version of libzmq anyway, so
remove libzmq-dev from the apt-get dependency installation in
README.rst.
Change-Id: Ifd67c7ded5db7dbb82624d2aa08843278f2de72c
The nodepool tests depend on the watermark sleep being long enough
that all of the fake nodes are marked ready before the next deficit
check. If it is not long enough, the tests may boot extra servers,
causing asserts in the tests to fail.
Increase the watermark sleep from half a second to one second and
let each test sleep for three seconds before checking node states.
Change-Id: Ia94527b46bad26b184af8fa02b3a1e2a1f7a3430
With the incorrect compile-time options for libzmq1, pyzmq
will give an error such as the following:
AttributeError: Socket has no such option: RCVTIMEO
Change-Id: I719a8de89b26dba974d7af8d631b7cdd729a074b
If a bunch of threads waiting for servers all decided that the
server list was expired at around the same time, they could all end
up submitting server list tasks, which defeats the caching. If
listing all servers was slow, that would only exacerbate the
situation.
Instead, wrap the actual list-servers API call inside a non-blocking
lock to make sure that it only happens once per period.
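The pattern is roughly the following simplified sketch (attribute
and method names are illustrative, and the real code goes through
the task manager rather than calling novaclient directly):

    import threading
    import time

    class ServerListCache(object):
        def __init__(self, client, period=5):
            self._client = client
            self._period = period
            self._servers = []
            self._ts = 0
            self._lock = threading.Lock()

        def list(self):
            # Refresh at most once per period, and let only one
            # thread do the refresh; everyone else gets the cache.
            if time.time() - self._ts > self._period:
                if self._lock.acquire(False):
                    try:
                        self._servers = self._client.servers.list()
                        self._ts = time.time()
                    finally:
                        self._lock.release()
            return self._servers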
Change-Id: I2a09ab3a226001d9de4268d366f65ef3e69cdd0d
We did not check the status of a completed image build; if it
went into the ERROR state, we assumed it worked. Check the return
value.
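In effect, the wait loop has to treat ERROR as a failure instead of
only waiting for the build to finish; a rough sketch with assumed
names:

    import time

    def wait_for_image(client, image_id, timeout=3600):
        for _ in range(timeout):
            image = client.images.get(image_id)
            status = image.status.lower()
            if status == 'error':
                # Fail fast rather than assuming a finished build
                # succeeded.
                raise Exception('Image %s failed to build' % image_id)
            if status == 'active':
                return image
            time.sleep(1)
        raise Exception('Timeout waiting for image %s' % image_id)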
Change-Id: I2acf8ac4c5641aa69932230d3414e92620f6e735
Add the availability zone data for a node to the `nodepool list` output.
This should aid in debugging AZ-related issues.
Change-Id: If861e666c5d9eec4f4f1ddf1bb431fb06436b6e6
Apparently not all clouds will schedule AZs as expected by nova. It
may be the case that an AZ is hard set on the provider side when no
specific AZ is requested. Add AZ support to nodepool so that it can
request AZs and better load balance across the AZs provided.
Do this by randomly selecting an AZ from the list of AZs provided in
the config. This should give us a good distribution across all AZs.
If a differently weighted distribution is required, a nodepool
provider object can be created per AZ with a single-item AZ list.
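The selection itself is simple; roughly (config handling omitted):

    import random

    def pick_az(provider_azs):
        # Choose uniformly from the configured AZs; return None to
        # let the provider schedule the AZ when none are configured.
        if not provider_azs:
            return None
        return random.choice(provider_azs)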
Note this requires an update to the nodepool database.
Change-Id: I428336ad817a8eb7d311a68767849aab0bcf015f
According to https://docs.python.org/3/howto/pyporting.html the
except clause syntax changed in Python 3.x. The new
"except Exception as e" form is usable with Python >= 2.6 and should
be preferred for compatibility with Python 3.
Enabled hacking check H231.
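For reference, the change in question looks like:

    # Old, Python 2 only (flagged by H231):
    #     except ValueError, e:
    #         ...
    # New, works on Python >= 2.6 and Python 3:
    try:
        int('x')
    except ValueError as e:
        print(e)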
Change-Id: Ide60f971493440311f1dcc594e33d536beb925e5