If a Jenkins is in shutdown mode or offline, ignore it for the purposes
of launching nodes. Node updates (used/complete) for that Jenkins will
still be processed.
This should allow another Jenkins to gracefully accept the increased
load if one goes offline.
Also, log the IP address when spinning up a node.
Change-Id: I3a8720dd5aaf154ca91cdc36136decad52eb6afa
The current code has parse errors with more complex Gearman function
names (which can show up due to the way Jenkins constructs Maven
jobs).
Also, switch the calculation to examine only queued jobs (total -
running) instead of trying to calculate a worker shortage (total -
workers). The latter doesn't deal well with multiple jobs that
require workers of the same image (it incorrectly treats them as
independent). By examining only queued jobs, the actual relationship
between multiple jobs that require workers from the same image is
captured: if, together, such jobs exceed the available workers, we
will see jobs sitting in the queue.
In other words, the overall picture is now that nodepool should
have at least enough ready+building nodes to accommodate the number
of jobs for a given worker/image that Gearman is waiting to run.
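A minimal sketch of that calculation, assuming Gearman status tuples of
(name, total, running, workers) as returned by the text-protocol
"status" command; the label extraction below is a simplification, not
nodepool's actual parser:

    def queued_jobs_by_label(status_rows):
        # status_rows: iterable of (name, total, running, workers) tuples.
        demand = {}
        for name, total, running, _workers in status_rows:
            if not name.startswith('build:'):
                continue
            label = name.split(':')[-1]   # simplified label extraction
            queued = total - running      # jobs waiting, regardless of workers
            if queued > 0:
                demand[label] = demand.get(label, 0) + queued
        return demand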
Change-Id: Ibc2990ed2c7aea37bd4c94e5387c80ef840afa83
Use information from Gearman to determine the immediate load
requirements of the system and spin up as many nodes as required
to meet the demand. Use the existing information about the
min and max servers to determine the ratio of servers to spin
up from each provider.
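A hedged sketch of the ratio idea: spread the demand for a label across
providers in proportion to their configured max-servers. Plain dicts
stand in for nodepool's real config objects, and rounding is handled
only crudely here:

    def allocate(demand, max_servers_by_provider):
        # demand: number of nodes needed for one label.
        # max_servers_by_provider: {provider_name: max_servers}
        capacity = sum(max_servers_by_provider.values()) or 1
        allocation = {}
        remaining = demand
        for name, max_servers in sorted(max_servers_by_provider.items()):
            share = int(round(demand * max_servers / float(capacity)))
            share = min(share, remaining)
            allocation[name] = share
            remaining -= share
        # Any rounding remainder is simply left unallocated in this sketch.
        return allocation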
Replaces the several fake server scripts with one script that
implements statsd, zmq, and gearman to ease testing.
Change-Id: Ic0dedc7ef2760ff664912f771377e02967ad5633
* nodepool/nodepool.py: Make the initial ssh timeout configurable
(retain the default of 60 seconds). Looking at the logs, there is a very
high occurrence of SSH timeouts from our providers. Making this timeout
configurable will allow us to adjust it if necessary.
Additionally, fix the comparison between old and new provider configs;
some items were not compared when they should have been.
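In spirit, the comparison fix is about making sure every relevant field
takes part in the equality check. A hypothetical sketch (the field
names are examples, not nodepool's exact attribute list):

    PROVIDER_FIELDS = ('username', 'password', 'auth_url', 'project_id',
                       'max_servers', 'boot_timeout')

    def provider_changed(old, new):
        # True if any field differs between the old and new provider config.
        return any(getattr(old, f) != getattr(new, f)
                   for f in PROVIDER_FIELDS)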
Change-Id: I51df708cb24e93e87c2fedf36d1f9de2131c76bd
Some of us have hacking installed globally, which means nodepool's
flake8 run produces a lot of spurious warnings. Suppress them.
Change-Id: Ie869a92fa423dc022c5c37c102f5a9071ccaf1b0
When responding to a build complete event, don't do anything to
the node if it is in the HOLD state.
Change-Id: I37e458198bfcd08472d07ca9c206c1b4551f3341
It lists servers that exist in the provider accounts but are unknown
to nodepool. Useful for identifying resource leaks.
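The idea, roughly sketched (inputs are plain collections rather than
nodepool's real objects):

    def find_aliens(provider_server_ids, db_server_ids):
        # Servers the provider reports but the nodepool db does not know about.
        return sorted(set(provider_server_ids) - set(db_server_ids))

    # e.g. find_aliens(['a1', 'a2', 'a3'], ['a1', 'a3']) == ['a2']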
Change-Id: Iaf71d6320d6ec7691f301208e09974cad2177ad5
Move the daemon command to nodepoold.
Refactor config handling a bit in NodePool to make the config
objects just contain information by default (though things
such as database handles and managers may get added to them
later as needed).
Start with the list and image-list commands.
Change-Id: If2ba7bca7ab4ef922787176af87ad5de31ae4b3e
Log stdout/stderr from the image build process. Use the provider
and image name in the log selector so that admins can route
appropriately (or at least grep).
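A sketch of the naming scheme, assuming the standard logging module;
the exact logger prefix is illustrative:

    import logging

    def image_build_logger(provider_name, image_name):
        # e.g. "nodepool.image.build.rackspace.precise"
        return logging.getLogger(
            'nodepool.image.build.%s.%s' % (provider_name, image_name))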
Change-Id: I7bc74ebfca3184340b51b083695b3441f0924e83
The logic around when to delete an image was just completely wrong.
Also, Rackspace sometimes returns a deleted image when we request
it, while HP returns a 404. Handle both of those situations.
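A hedged sketch of handling both behaviours; novaclient's NotFound
exception is real, but the helper and the status check are
illustrative:

    from novaclient import exceptions

    def image_is_gone(client, image_id):
        try:
            image = client.images.get(image_id)
        except exceptions.NotFound:
            return True                   # 404 once deleted (HP-style)
        return image.status == 'DELETED'  # returned but deleted (Rackspace)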
Change-Id: I4b6d620a750bd39a1d3b89e6eb51baf37694f8a7
Add a new node state, TEST, and if a test job name is supplied
put the node in the TEST state, and run that job with the node
name as a parameter. If the job succeeds, move it into READY
and relabel it with the appropriate label (from the image name).
If it fails, immediately delete the node.
If it never runs, it will eventually be cleaned up by the
periodic cleanup task.
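A rough, self-contained sketch of that lifecycle; the Node class and
state names here are stand-ins mirroring the description above, not
nodepool's real model:

    TEST, READY, DELETE = 'test', 'ready', 'delete'

    class Node(object):
        def __init__(self, name):
            self.name = name
            self.state = TEST
            self.label = None

    def handle_test_result(node, image_name, succeeded):
        if succeeded:
            node.state = READY
            node.label = image_name   # relabel based on the image name
        else:
            node.state = DELETE       # failed test: delete immediately
        return node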
Change-Id: I5ba1ea8cdc832b13a760edaee841487afe7d7ce4
This mostly assists local dev, but is a good idea anyway.
Move the ZMQ port in the test script so it does not conflict
with the default port of the Jenkins ZMQ plugin (another local
dev convenience).
Change-Id: I68f7fc31fe7e2a819568a2f40626641dee240387
If a target, provider, or image did not exist in the config but
was still in the db, the stats function would encounter a KeyError.
This makes sure we can still report stats for lingering resources.
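The fix amounts to looking up config entries defensively; a tiny
hypothetical example:

    def max_servers_for(config_providers, provider_name):
        # config_providers: {name: provider_config}; the provider may have
        # been removed from the config while servers still exist in the db.
        provider = config_providers.get(provider_name)
        return provider.max_servers if provider else 0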
Change-Id: Iade002917dbcb2931bb4f9ff009516d24c47e743
Move them to /opt/nodepool-scripts, a world-readable location, so
that they can be run as any user.
Change-Id: I007e341fbe17067c164d3712fcfb7e744bdd80e9
From 1 hour to 10 minutes. If it isn't deleted by then, it will
be removed by the next pass of the cleanup process.
Change-Id: I6dd1693d14fd215117ddbed8440ff4abe02c374c
This is used to serialize all access to an individual provider
(nova client). One ProviderManager is created for every provider
defined in the configuration. Any actions that require interaction
with nova submit a task to the manager which processes them serially
with an appropriate delay to ensure that rate limits are not hit.
This solves not only rate-limit problems, but also ends multi-threaded
access to a single novaclient Client object.
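A condensed sketch of that pattern: one worker thread per provider
pulls tasks off a queue, runs them, and sleeps between calls. Class
and attribute names are illustrative, not nodepool's exact
implementation:

    import queue
    import threading
    import time

    class Task(object):
        def __init__(self, func):
            self.func = func
            self.done = threading.Event()
            self.result = None

        def run(self):
            self.result = self.func()
            self.done.set()

        def wait(self):
            self.done.wait()
            return self.result

    class ProviderManager(threading.Thread):
        def __init__(self, rate=1.0):
            super(ProviderManager, self).__init__()
            self.daemon = True
            self.queue = queue.Queue()
            self.rate = rate   # minimum delay between provider calls

        def submitTask(self, func):
            task = Task(func)
            self.queue.put(task)
            return task

        def run(self):
            while True:
                task = self.queue.get()
                task.run()             # all provider calls happen in this thread
                time.sleep(self.rate)  # crude rate limiting

Callers submit a callable and block on task.wait() for the result, so
access to the underlying client is serialized through a single thread.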
Change-Id: I0cdaa747dac08cdbe4719cb6c9c220678b7a0320
Novaclient instances (via their internal requests.Session object)
do not correctly clean up after themselves. This visibly manifests
in the file descriptors for sockets not being closed.
A simple solution to this problem that also gains some efficiency
is to cache the novaclient objects for each provider. Based on
limited examination and research, I believe they are thread-safe.
The underlying requests library certainly is expected to be.
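A small sketch of the caching, with the client construction left to a
caller-supplied factory since novaclient constructor signatures vary
by version:

    import threading

    _clients = {}
    _clients_lock = threading.Lock()

    def get_client(provider_name, factory):
        # Return the cached client for this provider, creating it on first
        # use with the supplied factory (e.g. a novaclient Client constructor).
        with _clients_lock:
            if provider_name not in _clients:
                _clients[provider_name] = factory()
            return _clients[provider_name]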
Change-Id: I541a0783fabef368449ef6dc8c3cf766d3560bfa
We can burst and create a lot of threads, each of which will check out
a SQLAlchemy connection from the pool. This accommodates that.
We have a natural limit on the number of db connections -- we will
never use more than the total number of servers managed. So in that
case, just don't set an overflow limit for the db connection pool.
This means that it will stabilize on 5 open connections and burst
to as many as needed.
Also, ensure that the connection is returned to the pool in the
context manager's exit method, and set the session to None so that
it cannot be re-used (an easy way to make sure it can't be used
except as a context manager).
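A minimal sketch of the pool settings described above, assuming
SQLAlchemy; the database URI is a placeholder and assumes the MySQL
driver is installed:

    from sqlalchemy import create_engine

    engine = create_engine(
        'mysql://nodepool@localhost/nodepool',  # placeholder dburi
        pool_size=5,       # stabilizes at five open connections
        max_overflow=-1)   # no cap: burst to as many as the threads need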
Change-Id: Ie4628326b6b84fb0979e4eceed546404c4e30637
The existing db session strategy was inherited from a bunch of
shell scripts that ran once in a single thread and exited.
The surprising thing is that it worked at all. This change
replaces that "strategy" with one where each thread clearly
begins a new session as a context manager and passes that around
to functions that need the DB. A thread-local session is used
for convenience and extra safety.
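A rough sketch of that pattern using SQLAlchemy's scoped_session;
nodepool's real database layer has more to it, and the sqlite URI is
just a placeholder:

    import contextlib
    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker

    engine = create_engine('sqlite://')                  # placeholder dburi
    Session = scoped_session(sessionmaker(bind=engine))

    @contextlib.contextmanager
    def getSession():
        session = Session()   # thread-local: each thread gets its own
        try:
            yield session
            session.commit()
        except Exception:
            session.rollback()
            raise
        finally:
            Session.remove()  # release the connection back to the pool

    # Each thread does its work inside "with getSession() as session:"
    # and passes the session to functions that need the DB.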
This also adds a fake provider that will produce fake images and
servers quickly without needing a real nova or jenkins. This was
used to develop the database change.
Also some minor logging changes and very brief developer docs.
Change-Id: I45e6564cb061f81d79c47a31e17f5d85cd1d9306
This is effectively a required db field; without it, the watermark
calculation can be wrong until it's filled in, so make sure it's
there to start.
Also some minor logging changes.
Change-Id: Idc5a9cd40fe330f7a1aea4a5513267ee3c254f60
And some other minor changes gleaned from production testing.
Remove the scripts dir because it is no longer needed.
Change-Id: I7ffe3ed8d2a1be294637ac18bc3eaefede97d401