Merge "Add nodepool-launcher spec"
This commit is contained in:
commit
4107c87dfd
167
specs/nodepool-launch-workers.rst
Normal file
167
specs/nodepool-launch-workers.rst
Normal file
@ -0,0 +1,167 @@
|
|||||||
|
::

  Copyright (c) 2014 Hewlett-Packard Development Company, L.P.

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.
  http://creativecommons.org/licenses/by/3.0/legalcode

..
  This template should be in ReSTructured text. Please do not delete
  any of the sections in this template. If you have nothing to say
  for a whole section, just write: "None". For help with syntax, see
  http://sphinx-doc.org/rest.html To test out your formatting, see
  http://www.tele3.cz/jbar/rest/rest.html

==================================
Nodepool launch and delete workers
==================================

Story: https://storyboard.openstack.org/#!/story/2000075

Split the node launch and delete operations into separate workers for
scalability and flexibility.

Problem Description
===================

When nodepool launches or deletes a node, it creates a thread for the
operation. As nodepool scales up the number of nodes it manages, it
may have a very large number of concurrent threads; launching 1,000
nodes would consume an additional 1,000 threads. Much of this time is
spent waiting (sleeping or performing network I/O outside of the
global interpreter lock), so despite Python's threading limitations,
this is generally not a significant performance problem.

However, recently we have seen that seemingly small amounts of
additional computation can starve important threads in nodepool, such
as the main loop or the gearman I/O threads. It would be better if we
could limit the impact of thread contention on critical paths of the
program while still preserving our ability to launch >1,000 nodes at
once.

Proposed Change
===============

Create a new worker called 'nodepool-launcher' (an independent
process which may run on either the main nodepool host or on one or
more new servers) which performs node launch and delete tasks. All
of the interaction with providers related to launching and deleting
servers (including IP allocation, initial SSH sessions, etc.) will be
done by this worker.

The nodepool-launcher worker would read a configuration file with the
same syntax as the main nodepool server in order to obtain cloud
credentials. The worker should be told which providers it should
handle via command-line arguments (the default should be all
providers).
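
As an illustration, provider selection might be expressed on the
command line as in the following sketch; the flag names and defaults
here are assumptions, not part of this spec::

  # Hypothetical CLI sketch for nodepool-launcher; the --provider
  # flag and its default behavior are illustrative assumptions.
  import argparse

  parser = argparse.ArgumentParser(
      description='Nodepool launch/delete worker')
  parser.add_argument('-c', dest='config',
                      default='/etc/nodepool/nodepool.yaml',
                      help='path to the nodepool config file')
  parser.add_argument('--provider', action='append', dest='providers',
                      help='provider to handle (repeatable; default: '
                           'all providers in the config)')
  args = parser.parse_args()
  # args.providers is None when no --provider was given, meaning
  # "handle every provider found in the config file".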

It will register functions with gearman in the form
"node-launch:<provider>" and "node-delete:<provider>" for each of the
providers it handles. Generally a single worker should expect to have
exclusive control of a given provider, as that allows the rate
limiting performed by the provider manager to be effective. There
should be no technical limitation that enforces this, however; only a
recommendation that the operator avoid having more than one
nodepool-launcher working with any given provider.
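
As a sketch, that registration might look like the following using
the gear library; the gearman server address and provider names are
placeholders::

  # Register one launch and one delete function per handled provider
  # so the main server can direct work at a specific provider.
  import gear

  providers = ['rax', 'hpcloud']  # normally read from the config file

  worker = gear.Worker('nodepool-launcher')
  worker.addServer('nodepool.example.org', 4730)
  worker.waitForServer()

  for provider in providers:
      worker.registerFunction('node-launch:%s' % provider)
      worker.registerFunction('node-delete:%s' % provider)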

The worker will launch threads for each of the jobs in much the same
way that the current nodepool server does. The worker may handle as
many simultaneous jobs as desired. This may be unlimited, as it is
currently, or it could be a configurable limit so that, say, it does
not have more than 100 simultaneous server launches running. It is
not expected that the launcher would suffer the same starvation issues
that we have seen in the main nodepool server (due to its more limited
functionality), but if it does, this control could be used to mitigate
them.
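
One way to implement such a cap is a semaphore around the per-job
threads, as in this sketch; the limit and the handle_job function are
illustrative assumptions::

  # Bound the number of concurrently running job threads.
  import threading

  MAX_JOBS = 100  # hypothetical configurable limit
  job_slots = threading.Semaphore(MAX_JOBS)

  def run_job(job):
      try:
          handle_job(job)  # launch or delete the node for this job
      finally:
          job_slots.release()

  def job_loop(worker):
      while True:
          job = worker.getJob()  # blocks until gearman assigns a job
          job_slots.acquire()    # waits while MAX_JOBS jobs are running
          threading.Thread(target=run_job, args=(job,)).start()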

The main nodepool server will then largely consist of the main loop
and associated actions. Anywhere that it currently spawns a thread to
launch or delete a node should be converted into a gearman function
call to launch or delete. The main loop will still create the
database entry for the initial node launch (so that its calculations
may proceed as they do now) and should simply pass the node id as an
argument to the launch gearman function. Similarly, it should mark
nodes as deleted when the ZMQ message arrives, and then submit a
delete function call with the node id.
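
On the main server side, such a call might look like this sketch
(again using the gear library; the payload encoding and server
address are assumptions)::

  # Submit a launch job instead of spawning a thread.  The node id is
  # the only argument the worker needs; it reads everything else from
  # the shared database.
  import json

  import gear

  client = gear.Client('nodepool-main')
  client.addServer('nodepool.example.org', 4730)
  client.waitForServer()

  def submit_launch(provider_name, node_id):
      job = gear.Job('node-launch:%s' % provider_name,
                     json.dumps({'node_id': node_id}).encode('utf8'))
      client.submitJob(job)
      return job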

The main loop currently keeps track of ongoing delete threads so that
the periodic cleanup task does not launch more than one. Similarly,
with this change it should keep track of delete *jobs* and not launch
more than one simultaneously. It should additionally keep track of
launch jobs, and if a launch is unsuccessful (or the worker
disconnects -- this also returns WORK_FAIL), it should mark the node
for deletion and submit a delete job. This will maintain the current
behavior where, if nodepool is stopped (in this case, if a
nodepool-launcher is stopped), building nodes are deleted rather than
being orphaned.
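
A sketch of that failure handling, where the database helpers are
hypothetical and polling the gear job's completion flags is a
simplification::

  # React to a failed or disconnected launch job so the half-built
  # node is deleted rather than orphaned.
  import time

  def wait_for_launch(job, node_id):
      while not job.complete:  # set by gear when the job finishes
          time.sleep(1)
      if job.failure:          # WORK_FAIL, including worker disconnect
          mark_node_for_deletion(node_id)  # hypothetical DB helper
          submit_delete(node_id)  # submits a node-delete:<provider> job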

Alternatives
------------

Nodepool could be made into a more single-threaded application;
however, we would need to devise a state machine for all of the
points at which we wait for something to complete during the launch
cycle, and those points are numerous and change all the time. That
would be very complex, whereas threading is actually an ideal
paradigm for this problem.

Implementation
==============

Assignee(s)
-----------

Primary assignee: unknown

Work Items
----------

* Create nodepool-launcher class and command
* Change main server to launch gearman jobs instead of threads
* Stress test

Repositories
------------

This affects nodepool and system-config.

Servers
-------

No new servers are required, though they may optionally be added
later. The initial implementation should be colocated on the current
nodepool server (which has underutilized virtual CPUs).

DNS Entries
-----------

None.

Documentation
-------------

The infra/system-config nodepool documentation should be updated to
describe the new system.

Security
--------

The gearman protocol is cleartext and unauthenticated. IP-based
access control is currently used; certificate support with
authentication is planned and work is in progress. No sensitive
information will be sent over the wire (workers will read cloud
provider credentials from a local file).

Testing
-------

This should be testable privately and locally before deployment in
production.

Dependencies
============

None. This change is similar in spirit to, but does not require,
https://review.openstack.org/#/c/127673/