2497 Commits

Author SHA1 Message Date
Jenkins
e88bff13a6 Merge "Pass correct server_id when adding public IP" 2013-10-10 23:03:58 +00:00
Paul Belanger
521a690414 Pass correct server_id when adding public IP
Change-Id: I0e287aa14f3f3f792757de19340d831484d59523
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
2013-10-10 14:48:56 -07:00
James E. Blair
5695884bb3 Make image updates independent
If we're missing an image, only rebuild that one image.

Change-Id: Id27b1351469e65ebe0fc29f2e962ddf04aded5d9
2013-10-10 14:44:42 -07:00
James E. Blair
b2367d88a6 Add the ability to ignore offline targets
If a jenkins is in shutdown mode or is offline, ignore that jenkins
for the purposes of launching nodes.  Node updates (used/complete)
for that jenkins will still be processed.

This should allow another jenkins to gracefully accept the increased
load if one goes offline.

Also, log the IP address when spinning up a node.

Change-Id: I3a8720dd5aaf154ca91cdc36136decad52eb6afa
2013-10-10 13:04:30 -07:00
James E. Blair
d9ad4e91eb Fix stats after min_ready change
The movement of the min_ready value broke stats reporting; this
corrects it.

Change-Id: I4cf052e7357263de992233802c8947d638a1227f
2013-10-09 14:28:57 -07:00
James E. Blair
10dcff2682 Fix parsing gearman status
The current code has some parse errors with more complex gearman
function names (which can show up due to the way Jenkins constructs
maven jobs).

Also, switch the calculation to examine only queued jobs (total -
running) instead of trying to calculate a worker shortage (total -
workers).  The latter doesn't deal well with multiple jobs that
require workers of the same image (it incorrectly behaves as if
they are independent).  By only examining queued jobs, the actual
relationship between multiple jobs that require workers from the
same image is manifested by the fact that if, together, all such
jobs exceed the demand, we will see jobs sitting in the queue.

In other words, the overall picture is now that nodepool should
have at least enough ready+building nodes to accommodate the number
of jobs for a given worker/image that gearman is waiting to run.
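
The queued-job calculation described above (total - running) can be
sketched as follows. This is a minimal illustration, not the code from
this change: the function name queued_jobs is hypothetical, and it
assumes the Gearman admin "status" response, one tab-separated line per
function (name, total, running, available workers) ending with a line
containing a single ".". Splitting on tabs rather than on other
punctuation keeps complex function names intact.

```python
def queued_jobs(status_text):
    """Return {function_name: queued_count} from Gearman 'status' output.

    Hypothetical helper for illustration only.
    """
    queued = {}
    for line in status_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "." terminator line.
        if not line or line == '.':
            continue
        parts = line.split('\t')
        if len(parts) != 4:
            # Not a well-formed status line; ignore it.
            continue
        name, total, running, _workers = parts
        # Queued jobs are those submitted but not yet running.
        queued[name] = max(int(total) - int(running), 0)
    return queued


if __name__ == '__main__':
    sample = "build:maven-job:com.example:artifact\t5\t2\t3\n.\n"
    print(queued_jobs(sample))
```

The worker count is read but deliberately unused, which mirrors the
message's point that demand is judged from jobs waiting in the queue
rather than from a worker shortage.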

Change-Id: Ibc2990ed2c7aea37bd4c94e5387c80ef840afa83
2013-10-08 09:21:22 -07:00
James E. Blair
1d4c1483aa Inspect the Gearman queue for immediate demand
Use information from Gearman to determine the immediate load
requirements of the system and spin up as many nodes as required
to meet the demand.  Use the existing information about the
min and max servers to determine the ratio of servers to spin
up from each provider.

Replaces the several fake server scripts with one script that
implements statsd, zmq, and gearman to ease testing.

Change-Id: Ic0dedc7ef2760ff664912f771377e02967ad5633
2013-10-03 07:38:53 -07:00
Jenkins
87e238e034 Merge "Make node SSH timeout configurable." 2013-09-16 22:29:12 +00:00
Clark Boylan
5aec7e9ced Make node SSH timeout configurable.
* nodepool/nodepool.py: Make the initial ssh timeout configurable
(retain default of 60 seconds). The logs show a very high
occurrence of SSH timeouts from our providers. Making this timeout
configurable will allow us to adjust it if necessary.
Additionally, fix the comparison between old and new provider configs;
some items were not compared when they should have been.

Change-Id: I51df708cb24e93e87c2fedf36d1f9de2131c76bd
2013-09-13 12:58:54 -07:00
Monty Taylor
6f2168ffca Ignore hacking warnings
Some of us have hacking installed globally, which means nodepool flake8
produces a lot of spurious warnings. Suppress them.

Change-Id: Ie869a92fa423dc022c5c37c102f5a9071ccaf1b0
2013-09-13 11:09:15 -05:00
Jenkins
c21d5b4328 Merge "Fix HOLD state" 2013-09-10 22:25:24 +00:00
Jenkins
19c0ac0509 Merge "Add default location for config file" 2013-09-10 22:25:24 +00:00
James E. Blair
40c3812837 Add default location for config file
Change-Id: Ib8ae5a70c9e260302ff16f20c676aa8707acd58c
2013-09-10 15:11:50 -07:00
James E. Blair
77d65e502f Fix fake provider
We now expect a metadata attribute on objects returned by
novaclient.

Change-Id: Id3bf754e2e745cc3fdee35bda532928e7d7b347a
2013-09-10 15:08:41 -07:00
James E. Blair
828bd98b47 Fix HOLD state
When responding to a build complete event, don't do anything to
the node if it is in the HOLD state.

Change-Id: I37e458198bfcd08472d07ca9c206c1b4551f3341
2013-09-10 15:07:43 -07:00
James E. Blair
0e96791cd8 Add image-delete command
To delete an image.

Change-Id: Ied9ac09073780c63201aaf2fcd6fa1e995c1c6ab
2013-09-06 08:51:08 -07:00
James E. Blair
3bf6a19b5b Add a delete command
To delete a node.

Change-Id: Ifd2639b1bfd069807c64d95717822f469a6601fb
2013-09-06 08:51:08 -07:00
James E. Blair
44c96bee74 Add a hold command
To put a node into the HOLD state.

Change-Id: I2f173ce928f4ff8399c6fdc00d03a09fa3b20136
2013-09-06 08:51:08 -07:00
James E. Blair
6d495ce3de Add alien-image-list command
To list images about which nodepool has no knowledge.

Change-Id: I6c861abaf6b53a8e50d730590eaa3509ca864f61
2013-09-06 08:51:07 -07:00
James E. Blair
3b3a32c55c Add alien-list command
It lists servers that exist in the provider accounts about which
nodepool has no knowledge.  Useful for identifying resource leaks.

Change-Id: Iaf71d6320d6ec7691f301208e09974cad2177ad5
2013-09-06 08:50:25 -07:00
James E. Blair
4065ad6694 Add image-update command
Change-Id: I8052bcfb3bc2d03c5e50b601d9dde7f3e7d2c43b
2013-09-03 10:21:59 -07:00
James E. Blair
7a1fe1891f Add a nodepool command
Moves the daemon command to nodepoold.

Refactor config handling a bit in NodePool to make the config
objects just contain information by default (though things
such as database handles and managers may get added to them
later as needed).

Start with the list and image-list commands.

Change-Id: If2ba7bca7ab4ef922787176af87ad5de31ae4b3e
2013-09-03 09:27:04 -07:00
James E. Blair
b1b8a569ef Add image logging
Log stdout/stderr from the image build process.  Use the provider
and image name in the log selector so that admins can route
appropriately (or at least grep).

Change-Id: I7bc74ebfca3184340b51b083695b3441f0924e83
2013-08-29 16:20:40 -07:00
Monty Taylor
1e190f5d57 Change use of error numbers to errno
The errno constants are more readable in the code.
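
As a generic illustration of the readability point (not taken from this
change), the sketch below compares an OSError against errno.ENOENT
instead of the bare number 2. The helper name remove_if_present is
hypothetical.

```python
import errno
import os


def remove_if_present(path):
    """Delete path; return False if it did not exist (hypothetical helper)."""
    try:
        os.remove(path)
    except OSError as e:
        # errno.ENOENT reads more clearly than the literal error number.
        if e.errno != errno.ENOENT:
            raise
        return False
    return True
```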

Change-Id: I6cb4b61f4cf59f50969a7fc27cad35d9c90755f8
2013-08-28 14:56:06 -07:00
James E. Blair
5ebc2faae8 Fix image delete logic
The logic around when to delete an image was just completely wrong.

Also, rackspace sometimes returns a deleted image when we request
it, while hp returns a 404.  Deal with both of those situations.

Change-Id: I4b6d620a750bd39a1d3b89e6eb51baf37694f8a7
2013-08-25 08:53:42 -07:00
James E. Blair
6b508398f9 Fix typo in node check method
Copy/paste error.

Change-Id: Ie3824cc6f40aa4fdc3cd4a8575aad935dfeb884c
2013-08-24 15:42:09 -07:00
James E. Blair
8436625994 Fix typo in deleteImage
remote_image is a dict, not an object now.

Change-Id: I272b5c9742317af5424a7f5b44e02ec7f27cf213
2013-08-24 14:42:02 -07:00
James E. Blair
861a5dbc91 Add option to test jenkins node before use
Add a new node state, TEST, and if a test job name is supplied
put the node in the TEST state, and run that job with the node
name as a parameter.  If the job succeeds, move it into READY
and relabel it with the appropriate label (from the image name).

If it fails, immediately delete the node.

If it never runs, it will eventually be cleaned up by the
periodic cleanup task.

Change-Id: I5ba1ea8cdc832b13a760edaee841487afe7d7ce4
2013-08-22 16:17:36 -07:00
James E. Blair
cffb319ee9 Make jenkins username and private key path configurable
This mostly assists local dev, but is a good idea anyway.

Move the zmq port in the test script so it does not conflict
with the default port in the jenkins zmq plugin (more local
dev assist).

Change-Id: I68f7fc31fe7e2a819568a2f40626641dee240387
2013-08-22 16:16:28 -07:00
James E. Blair
b045d00509 Fix error with stats for de-configured resources
If a target, provider, or image did not exist in the config but
was still in the db, the stats function would encounter a KeyError.
This makes sure we can still report stats for lingering resources.

Change-Id: Iade002917dbcb2931bb4f9ff009516d24c47e743
2013-08-22 13:15:49 -07:00
James E. Blair
a684768fa4 Move setup scripts destination
Move them to /opt/nodepool-scripts so they are in a nice
world-readable location so that they can be run as any user.

Change-Id: I007e341fbe17067c164d3712fcfb7e744bdd80e9
2013-08-22 10:44:43 -07:00
James E. Blair
5935e3c747 Reduce timeout when waiting for server deletion
From 1 hour to 10 minutes.  If it isn't deleted, it will get
deleted by the next pass of the cleanup process.

Change-Id: I6dd1693d14fd215117ddbed8440ff4abe02c374c
2013-08-22 10:44:09 -07:00
James E. Blair
648816feee Change credentials-id parameter in config file
To use a dash to be consistent.

Change-Id: Ifee3cfe9ad18989d09ef896408e7bb3f78e54f2c
2013-08-22 10:44:03 -07:00
James E. Blair
4208746997 Add an ssh check periodic task
If we can't ssh into a node, delete it.

Change-Id: Ie53187ff5c37941709a2cb708b0bf76116138093
2013-08-22 10:44:00 -07:00
James E. Blair
0ec2246514 Add JenkinsManager
Same idea as a ProviderManager: serialize changes to each jenkins
server (with a rate limit).

Change-Id: I631d50dcfd13c29d2802c192d6e1ac7889256a90
2013-08-22 10:43:33 -07:00
James E. Blair
8dc6c870f2 Add ProviderManager
This is used to serialize all access to an individual provider
(nova client).  One ProviderManager is created for every provider
defined in the configuration.  Any actions that require interaction
with nova submit a task to the manager which processes them serially
with an appropriate delay to ensure that rate limits are not hit.

This solves not only rate-limit problems, but also ends multi-threaded
access to a single novaclient Client object.
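
The serialization idea described above can be sketched roughly as
follows. This is an assumption-laden illustration rather than the
ProviderManager from this change: the class name SerialTaskManager, its
methods, and the rate_delay parameter are all hypothetical, and it is
written for Python 3. A single worker thread pulls tasks from a queue
and sleeps between them, so callers on other threads never touch the
client object concurrently.

```python
import queue
import threading
import time


class SerialTaskManager(threading.Thread):
    """Run submitted callables one at a time with a delay between them.

    Hypothetical sketch; not the actual ProviderManager.
    """

    def __init__(self, rate_delay=1.0):
        super().__init__(daemon=True)
        self.rate_delay = rate_delay
        self._tasks = queue.Queue()

    def submit(self, func, *args, **kwargs):
        """Queue func and block until the worker thread has run it."""
        done = threading.Event()
        result = {}
        self._tasks.put((func, args, kwargs, done, result))
        done.wait()
        if 'error' in result:
            raise result['error']
        return result['value']

    def stop(self):
        """Ask the worker thread to exit after the tasks already queued."""
        self._tasks.put(None)

    def run(self):
        while True:
            item = self._tasks.get()
            if item is None:
                return
            func, args, kwargs, done, result = item
            try:
                result['value'] = func(*args, **kwargs)
            except Exception as e:
                # Hand the exception back to the submitting thread.
                result['error'] = e
            finally:
                done.set()
            # Pause between tasks to stay under the provider's rate limit.
            time.sleep(self.rate_delay)
```

One such manager per provider would be started once, and any thread
needing the provider would call submit() with the function to run.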

Change-Id: I0cdaa747dac08cdbe4719cb6c9c220678b7a0320
2013-08-20 15:34:14 -07:00
James E. Blair
d3386fb24a Delay 1 min before deleting node
This should allow the background Jenkins console scp to complete.

Change-Id: I15a91a93c48dc4f22602837bed9df2ac93f24069
2013-08-18 12:39:59 -07:00
James E. Blair
70e68a4fe2 Cache novaclient objects
Novaclient instances (via their internal requests.Session object)
do not correctly clean up after themselves.  This visibly manifests
in the file descriptors for sockets not being closed.

A simple solution to this problem that also gains some efficiency
is to cache the novaclient objects for each provider.  Based on
limited examination and research, I believe they are thread-safe.
The underlying requests library certainly is expected to be.

Change-Id: I541a0783fabef368449ef6dc8c3cf766d3560bfa
2013-08-18 09:09:52 -07:00
James E. Blair
fd53ecc88d Tune SQLAlchemy pool parameters
We can burst and create a lot of threads, each of which will checkout
a SQLAlchemy connection from the pool.  This accommodates that.

We have a natural limit on the number of db connections -- we will
never use more than the total number of servers managed.  So in that
case, just don't set an overflow limit for the db connection pool.
This means that it will stabilize on 5 open connections and burst
to as many as needed.

Also, ensure that the connection is returned to the pool in the
context manager exit method, as well as setting it to None so that
the session cannot be reused (this is an easy way to make
sure that it can't be used except as a context manager).

Change-Id: Ie4628326b6b84fb0979e4eceed546404c4e30637
2013-08-17 15:30:10 -07:00
James E. Blair
a5a78ef441 Use a sensible SQLAlchemy session model
The existing db session strategy was inherited from a bunch of
shell scripts that ran once in a single thread and exited.

The surprising thing is that it even worked at all.  This change
replaces that "strategy" with one where each thread clearly
begins a new session as a context manager and passes that around
to functions that need the DB.  A thread-local session is used
for convenience and extra safety.

This also adds a fake provider that will produce fake images and
servers quickly without needing a real nova or jenkins.  This was
used to develop the database change.

Also some minor logging changes and very brief developer docs.

Change-Id: I45e6564cb061f81d79c47a31e17f5d85cd1d9306
2013-08-16 20:21:33 -07:00
James E. Blair
35d66f0d77 Make the target name required in the schema
Change-Id: Icccdc1cc545391fa17544f555bc0537540a53bd8
2013-08-15 17:52:25 -07:00
James E. Blair
a7144ff7d1 Require a target name when instantiating a node
This is effectively a required db field; without it, the watermark
calculation can be wrong until it's filled in, so make sure it's
there to start.

Also some minor logging changes.

Change-Id: Idc5a9cd40fe330f7a1aea4a5513267ee3c254f60
2013-08-15 17:49:44 -07:00
James E. Blair
33346b37ac Use MySQL
And some other minor changes gleaned from production testing.

Remove the scripts dir because it is no longer needed.

Change-Id: I7ffe3ed8d2a1be294637ac18bc3eaefede97d401
2013-08-15 17:49:41 -07:00
James E. Blair
c409795e57 Make the local script directory configurable
Change-Id: Ia446f2e25748725ce4b64a3a654b4e50da672944
2013-08-15 17:47:03 -07:00
James E. Blair
a641ac52a9 Handle paramiko and daemonization
Change-Id: I6e58cb24b6594dc5ee6c7ba34bf9bfed9d303480
2013-08-15 13:32:39 -07:00
James E. Blair
5866f10601 Initial commit
Much of this comes from devstack-gate.

Change-Id: I7af197743cdf9523318605b6e85d2cc747a356c7
2013-08-15 09:47:23 -07:00
OpenStack Project Creator
a3db12fae9 Added .gitreview 2013-08-13 17:10:06 +00:00