When a provider manager is replaced due to a configuration change,
raise exceptions for any tasks submitted after it has stopped.
This will probably cause many errors, but they should already be
handled and servers will eventually be deleted.
We can minimize the disruption by making the provider managers more
adaptable to changes, but this stopgap measure should at least fix
the current problem we are observing with threads that get stuck
and never complete.
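A minimal sketch of the intended behavior, assuming a hypothetical
ManagerStoppedException and submitTask method (the real names may
differ):

    import queue  # six.moves.queue on Python 2

    class ManagerStoppedException(Exception):
        pass

    class ProviderManager(object):
        def __init__(self, name):
            self.name = name
            self._stopped = False
            self._queue = queue.Queue()

        def stop(self):
            self._stopped = True

        def submitTask(self, task):
            # Reject work submitted after shutdown instead of queuing
            # it forever; callers already handle launch errors.
            if self._stopped:
                raise ManagerStoppedException(
                    "Manager %s is no longer running" % self.name)
            self._queue.put(task)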
Change-Id: I3d190881ede30480d7c4ae970a0cb2dd07c3e160
Previously, only changes to non-image provider config would force
nodepool to create a new provider manager. There are cases where it is
useful to have nodepool create a new manager when the image config
changes, for example when adding an image or changing the flavor of an
image.
Add checks to see whether the list of image names has changed and, if
the names all remain the same, whether any of their properties have
changed. If either has changed, create a new provider manager.
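A sketch of the comparison, with hypothetical attribute names:

    def _imagesChanged(old_provider, new_provider):
        # New or removed image names force a new manager.
        if set(old_provider.images) != set(new_provider.images):
            return True
        # Same names: check whether any per-image properties changed.
        for name, old_image in old_provider.images.items():
            new_image = new_provider.images[name]
            if (old_image.flavor != new_image.flavor or
                    old_image.base != new_image.base):
                return True
        return False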
Change-Id: Ic0732453e12069d5d50a56dea0037817e978dad4
RHEL and CentOS cloud images use 'cloud-user' as the default user for
login. This adds support for trying that user if root does not work.
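A sketch of the fallback, assuming a hypothetical ssh_connect helper
that returns None when login as a given user fails:

    def connect_with_fallback(ip, timeout=60):
        # Try root first; RHEL/CentOS cloud images only allow
        # cloud-user.
        for username in ('root', 'cloud-user'):
            client = ssh_connect(ip, username, timeout=timeout)
            if client:
                return client, username
        return None, None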
Change-Id: Ie9e4276800408dee5941971d49da35252f9a74ee
Break down launch errors into one of four categories. Report
stats on those independently for providers, images, and targets.
Report elapsed time as well. And do the same for successful
launches.
Also, log the provider and specific launch error in the case of
failure.
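A sketch of the per-dimension reporting, with hypothetical statsd key
names and error categories:

    def report_launch(statsd, provider, image, target, result, dt_ms):
        # result is one of the categories, e.g. 'ready', 'error.ssh',
        # 'error.timeout', 'error.unknown' (names illustrative).
        for key in ('nodepool.launch.provider.%s.%s' % (provider, result),
                    'nodepool.launch.image.%s.%s' % (image, result),
                    'nodepool.launch.target.%s.%s' % (target, result)):
            statsd.timing(key, dt_ms)  # elapsed time
            statsd.incr(key)           # count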
Change-Id: Ib33f0908add0d9b5160ee6b20cbccc30b56e6d57
We now have the ability to override the default timeout (3600) when
launching a snapshot image.
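A sketch of how the override might be consumed; the attribute name is
hypothetical, not necessarily the real option:

    DEFAULT_TIMEOUT = 3600  # the previous hard-coded value

    def snapshot_timeout(image_config):
        # Use the per-image override when given, else the old default.
        return getattr(image_config, 'timeout', None) or DEFAULT_TIMEOUT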
Change-Id: I7696138cfc29ef876c3fef104b70098708990b2b
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
    AttributeError: 'Provider' object has no attribute 'nics'
Just make nics always defined on the provider object. Code that
uses it is wrapped with 'if use_neutron' anyway.
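A sketch of the fix in config loading, with hypothetical names:

    def load_provider(provider, provider_config):
        # Always define nics so later attribute access cannot raise
        # AttributeError; it is only consulted when use_neutron is set.
        provider.nics = provider_config.get('networks', [])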
Change-Id: I4de69d7aaa7e262076ec0d1dc96d1dd365f9efd4
When a floating IP fails to attach to a node, nodepool leaks that
floating IP. Correct this by immediately removing the floating IP: we
have the IP's id right after creation, but not during cleanupServer
(because the floating IP was never associated with any server).
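A sketch of the immediate cleanup, assuming hypothetical manager
methods:

    def attach_floating_ip(manager, server_id, pool):
        ip = manager.createFloatingIP(pool)
        try:
            manager.addFloatingIP(server_id, ip['ip'])
        except Exception:
            # The IP was never associated with any server, so
            # cleanupServer would miss it; delete it immediately
            # using the id we received at creation time.
            manager.deleteFloatingIP(ip['id'])
            raise
        return ip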
Closes-Bug: #1301111
Change-Id: Ib6460797a8d9b31b1ad723e056dbfe6d57438bf7
To transition from images without /etc/nodepool to images with it,
check whether /etc/nodepool exists before writing anything there.
This change can be removed later, once we can assume the directory
always exists.
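A sketch of the transitional guard (the file name is illustrative):

    import os

    def write_node_info(ip):
        # Older images lack /etc/nodepool; skip writing on those.
        if not os.path.exists('/etc/nodepool'):
            return
        with open('/etc/nodepool/node', 'w') as f:
            f.write(ip + '\n')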
Change-Id: Ia39653cbbcea68cad618351bb61a39190a67e8b6
Override tox's install command value so that prereleases (alphas,
betas, broken things) aren't installed by default; only install
release versions unless a prerelease version is explicitly requested.
This fixes doc builds that use `tox -evenv python setup.py
build_sphinx`.
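A sketch of the tox.ini override; the exact flags are illustrative,
the point being that pip without --pre skips prereleases:

    [testenv]
    install_command = pip install -U {opts} {packages}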
Also, remove the gearman servers from the test configuration (since
there is not yet any gearman support).
Change-Id: I31913420adcb48866d3996f2dd3b605c55acce2e
Currently, if min-ready is 0, a snapshot will not be created (nodepool
considers the image to be disabled). Now, if min-ready is greater than
or equal to 0, nodepool will create the snapshot. The reason for the
change is to allow Jenkins slaves to be offline while waiting for a new
job to be submitted. Once a new job is submitted, nodepool will
properly launch a slave node from the snapshot.
Additionally, min-ready is now optional and defaults to 2. If
min-ready is -1, the snapshot will not be created (the label becomes
disabled).
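A sketch of the new semantics, with hypothetical config access:

    def min_ready(label_config):
        # min-ready is now optional and defaults to 2; -1 disables
        # the label (no snapshot is built), while 0 still builds the
        # snapshot so nodes can be launched on demand.
        return label_config.get('min-ready', 2)

    def snapshot_enabled(label_config):
        return min_ready(label_config) >= 0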
Closes-Bug: #1299172
Change-Id: I7094a76b09266c00c0290d84ae0a39b6c2d16215
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
If statsd was enabled, we would hit an undefined attribute error
because subnodes have no target. Instead, pass through the target
name of the parent node and use that for statsd reporting.
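A sketch of the pass-through, with hypothetical names:

    def subnode_stats_key(node):
        # Subnodes have no target of their own; report them under
        # the parent node's target.
        return 'nodepool.launch.target.%s.ready' % node.target_name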
Change-Id: Ic7a04a85775a23f954ea565e8c82976b52b218c7
Some misc changes related to running this:
* Set log/stdout/err capture vars as in ZUUL
* Give the main loop a configurable sleep value so tests can run faster
* Fix a confusing typo in the node.yaml config
Additionally, a better method for waiting for test completion is
added, which permits us to use assert statements in the tests.
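A sketch of the waiting method, with hypothetical names:

    import time

    def wait_for_nodes(pool, timeout=60):
        # Poll until the pool has no outstanding work, then let the
        # test make assertions about the resulting state.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if not pool.hasPendingWork():  # hypothetical predicate
                return
            time.sleep(0.1)
        raise AssertionError("Timed out waiting for nodes")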
Change-Id: Icddd2afcd816dbd5ab955fa4ab5011ac8def8faf
The new test starts the daemon with a simple config file and ensures
that it spins up a node.
A timeout is added to the zmq listener so that its run loop can
be stopped by the 'stopped' flag. And the shutdown procedure
for nodepool is altered so that it sets those flags and waits
for those threads to join before proceeding. The previous method
could occasionally cause assertion errors (from C, therefore
core dumps) due to zmq concurrency issues.
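A sketch of the timeout-based run loop using a zmq poller; the
listener class and handleEvent are hypothetical stand-ins:

    import zmq

    class Listener(object):
        def __init__(self, socket):
            self.socket = socket
            self._stopped = False

        def stop(self):
            self._stopped = True

        def run(self):
            poller = zmq.Poller()
            poller.register(self.socket, zmq.POLLIN)
            while not self._stopped:
                # Wake at least once a second to check the stopped
                # flag instead of blocking forever in recv().
                ready = dict(poller.poll(1000))
                if self.socket in ready:
                    self.handleEvent(self.socket.recv())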
Change-Id: I7019a80c9dbf0396c8ddc874a3f4f0c2e977dcfa
Finish the initial sections defined in the documentation index.
Add sphinxcontrib-programoutput to document command line utils.
Add py27 to the list of default tox targets.
Change-Id: I254534032e0706e410647b023249fe3af4f3a35f
Also fix a bug in the allocator that caused too-small allocations in
some circumstances.
circumstances.
The main demonstration of this failure was:
nodepool.tests.test_allocator.TwoLabels.test_allocator(two_nodes)
which allocated 1,2 instead of 2,2. The following tests also
failed:
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes_over_quota)
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes)
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(three_nodes)
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(four_nodes_at_quota)
nodepool.tests.test_allocator.TwoProvidersTwoLabels.test_allocator(one_node)
Change-Id: Idba0e52b2775132f52386785b3d5f0974c5e0f8e
Write information about the node group to /etc/nodepool, along
with an ssh key generated specifically for the node group.
Add an optional script that is run on each node (and sub-node) for
a label right before a node is placed in the ready state. This
script can use the data in /etc/nodepool to setup access between
the nodes in the group.
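The layout written under /etc/nodepool might look like this (file
names illustrative, not the exact set):

    /etc/nodepool/role          # 'primary' or 'sub'
    /etc/nodepool/node          # private IP of this node
    /etc/nodepool/primary_node  # private IP of the group's primary node
    /etc/nodepool/sub_nodes     # private IPs of the sub-nodes
    /etc/nodepool/id_rsa        # ssh key generated for the group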
Change-Id: Id0771c62095cccf383229780d1c4ddcf0ab42c1b
The new simpler method for calculating the weight of the targets
is a little too simple and can miss allocating nodes. Make the
weight change as the algorithm walks through the target list
to ensure that everything is allocated somewhere.
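A sketch of the idea, with hypothetical names: recompute each target's
weight from what is still unallocated, so rounding can never strand
nodes:

    def spread(requested, capacities):
        # capacities: nodes each target can still accept, in order.
        allocations = []
        remaining = requested
        remaining_capacity = sum(capacities)
        for capacity in capacities:
            if remaining_capacity <= 0:
                allocations.append(0)
                continue
            # The weight is recomputed at each step rather than fixed
            # up front, so earlier rounding is corrected later.
            weight = float(capacity) / remaining_capacity
            grant = min(capacity, int(round(remaining * weight)))
            allocations.append(grant)
            remaining -= grant
            remaining_capacity -= capacity
        return allocations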
Change-Id: I98f72c69cf2793aa012f330219cd850a5d4ceab2
Labels replace images as the basic identity for nodes. Rather than
having nodes of a particular image, we now have nodes of a particular
label. A label describes how a node is created from an image, which
providers can supply such nodes, and how many should be kept ready.
This makes configuration simpler (by not specifying which images
are associated with which targets and simply assuming an even
distribution, the target section is _much_ smaller and _much_ less
repetitive). It also facilitates describing how nodes of
potentially different configurations (e.g., number of subnodes) can
be created from the same image.
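A hypothetical example of the new config shape (option names
illustrative):

    labels:
      - name: precise
        image: precise
        min-ready: 2
        providers:
          - name: provider1
          - name: provider2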
Change-Id: I35b80d6420676d405439cbeca49f4b0a6f8d3822
As soon as a resource changes to the ERROR state, stop waiting for
it. Return it to the caller, where it will be deleted.
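A sketch of the wait loop change, using a novaclient-style client and
hypothetical surrounding names:

    import time

    def wait_for_resource(client, server_id, deadline, poll=2):
        while time.time() < deadline:
            server = client.servers.get(server_id)
            # ERROR is now terminal: return immediately so the caller
            # can delete the resource instead of waiting out a timeout.
            if server.status in ('ACTIVE', 'ERROR'):
                return server
            time.sleep(poll)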
Change-Id: I128bc4344b238b96e5696cce87f608fb2cdffa6e
An image can specify that it should be created with a number of
subnodes. That number of nodes of the same image type will also
be created and associated with each primary node of that image
type.
Adjust the allocator to accommodate the expected loss of capacity
associated with subnodes.
If a node has subnodes, wait until they are all in the ready state
before declaring a node ready.
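A sketch of the readiness gate, with hypothetical names:

    READY = 'ready'  # placeholder for the real state constant

    def node_ready(node):
        # A node with subnodes is only ready once every subnode is.
        return (node.state == READY and
                all(sn.state == READY for sn in node.subnodes))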
Change-Id: Ia4b315b1ed2999da96aab60c5c02ea2ce7667494
There's no way to create subnodes yet. But this change introduces
the class/table and a method to cleanly delete them if they exist.
The actual server deletion and the waiting for that to complete
are separated out in the provider manager, so that we can kick
off a bunch of subnode deletes and then wait for them all to
complete in one thread.
All existing calls to cleanupServer are augmented with a new
waitForServerDeletion call to handle the separation.
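A sketch of the split, using the method names from the message (the
subnode fields are hypothetical):

    def delete_subnodes(manager, node):
        # Kick off all the deletes first...
        for subnode in node.subnodes:
            manager.cleanupServer(subnode.external_id)
        # ...then wait for them all to complete in this one thread.
        for subnode in node.subnodes:
            manager.waitForServerDeletion(subnode.external_id)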
Change-Id: Iba9d5a0a61cccc07d914e60a24777c6451dca7ea
flake8 does not pin its pep8 and pyflakes dependencies, which makes it
possible for new sets of rules to suddenly apply whenever either of
those projects pushes a new release. Fix this by depending on hacking
instead of flake8, pep8, and pyflakes directly. This will keep nodepool
in sync with the rest of OpenStack even if it doesn't use the hacking
rules (H*).
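A sketch of the test-requirements.txt change (the version bounds are
illustrative, not taken from the change):

    # Instead of flake8, pep8, and pyflakes individually:
    hacking>=0.5.6,<0.8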
Change-Id: Ice9198e9439ebcac15e76832835e78f72344425c
The Ubuntu 12.04 package version of paramiko is 1.7.7.1, which lacks
the additional arguments for exec_command:

    TypeError: exec_command() got an unexpected keyword argument 'get_pty'

1.10.0 was the first version to add the get_pty flag.
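One way to tolerate both versions (not necessarily what this change
does):

    def exec_command_compat(client, cmd):
        try:
            return client.exec_command(cmd, get_pty=True)
        except TypeError:
            # paramiko < 1.10 does not accept get_pty.
            return client.exec_command(cmd)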
Change-Id: I3b4d8a6d8a1d10ab002a79824feab8937d160244
Signed-off-by: Paul Belanger <paul.belanger@polybeacon.com>
If the server fails to boot (perhaps due to quota issues) during
an image update and nodepool created a key for that server, the key
won't be deleted, because the existing keypair delete is done as part
of deleting the server. Handle the special case of never having
actually booted a server by deleting the keypair explicitly.
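A sketch of the special case, with hypothetical names:

    def boot_for_image_update(manager, name, image, key_name):
        try:
            return manager.createServer(name, image, key_name=key_name)
        except Exception:
            # No server was ever created, so the usual
            # delete-keypair-with-server path will not run.
            if key_name:
                manager.deleteKeypair(key_name)
            raise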
Change-Id: I0607b77ef2d52cbb8a81feb5e9c502b080a51dbe
Use six.moves.urllib instead of urllib and
six.moves.urllib.request instead of urllib2.
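The resulting py2/py3-compatible form:

    from six.moves import urllib

    # six.moves.urllib.request maps to urllib2 on Python 2 and to
    # urllib.request on Python 3.
    response = urllib.request.urlopen('http://localhost/')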
Partial-Bug: #1280105
Change-Id: Id122f7be5aa3e0dd213bfa86f9be86d10d72b4a6
As the number of providers and targets grows, so does the number of
stats that graphite has to sum to produce the summary graphs that
we use. Instead of asking graphite to summarize something like
400 metrics (which takes about 3 seconds), have nodepool directly
produce the metrics that we are going to use.
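A sketch, with hypothetical key names: emit the pre-summed key
alongside the detailed ones so graphite never has to aggregate them:

    def report(statsd, provider, result):
        # Summary key graphite can chart directly...
        statsd.incr('nodepool.launch.%s' % result)
        # ...plus the detailed per-provider key as before.
        statsd.incr('nodepool.launch.provider.%s.%s' % (provider, result))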
Change-Id: I2a7403af2512ace0cbe795f2ec17ebcd9b90dd09