
Properly handle TaskManagerStopped exception

When we lose a task manager, we won't be able to create an instance.
Rather than continue to retry until the retry limit is reached, we raise
an exception early.

In the case below, the retry limit is very high, which results in the
logs being spammed with the following:

  2019-02-12 16:41:15,628 ERROR nodepool.NodeLauncher-0001616109: Request 200-0000443406: Launch attempt 39047/999999999 failed for node 0001616109:
  Traceback (most recent call last):
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/nodepool/driver/openstack/handler.py", line 241, in launch
      self._launchNode()
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/nodepool/driver/openstack/handler.py", line 142, in _launchNode
      instance_properties=self.label.instance_properties)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/nodepool/driver/openstack/provider.py", line 340, in createServer
      return self._client.create_server(wait=False, **create_args)
    File "<decorator-gen-32>", line 2, in create_server
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/cloud/_utils.py", line 377, in func_wrapper
      return func(*args, **kwargs)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/cloud/openstackcloud.py", line 7020, in create_server
      self.compute.post(endpoint, json=server_json))
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/keystoneauth1/adapter.py", line 357, in post
      return self.request(url, 'POST', **kwargs)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/_adapter.py", line 154, in request
      **kwargs)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/task_manager.py", line 219, in submit_function
      return self.submit_task(task)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/task_manager.py", line 185, in submit_task
      name=self.name))
  openstack.exceptions.TaskManagerStopped: TaskManager rdo-cloud-tripleo is no longer running

Change-Id: I5f907d19ec1e637defe90eb944f4e5bd759e8a74
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
tags/3.5.0
Paul Belanger 3 months ago
parent
commit
e8ac13027e
1 changed file with 4 additions and 0 deletions
nodepool/driver/openstack/handler.py

@@ -241,6 +241,10 @@ class OpenStackNodeLauncher(NodeLauncher):
             try:
                 self._launchNode()
                 break
+            except openstack.exceptions.TaskManagerStopped:
+                # If we lost our TaskManager session, we won't be able to
+                # launch an instance, so there's no need to continue.
+                raise
             except kze.SessionExpiredError:
                 # If we lost our ZooKeeper session, we've lost our node lock
                 # so there's no need to continue.
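
The pattern in this change, re-raising a fatal exception from inside a retry loop instead of counting it as one more failed attempt, can be sketched in isolation. This is a minimal illustration and not the nodepool code: the exception classes and the launch_with_retries helper are hypothetical stand-ins, with TaskManagerStopped here standing in for openstack.exceptions.TaskManagerStopped.

```python
class TaskManagerStopped(Exception):
    """Hypothetical stand-in for openstack.exceptions.TaskManagerStopped."""


class TransientLaunchError(Exception):
    """Hypothetical stand-in for a launch failure that is worth retrying."""


def launch_with_retries(launch, retries):
    """Call launch() up to `retries` times; return the attempt that succeeded.

    A TaskManagerStopped error is re-raised immediately, since no later
    attempt can succeed. Any other exception is retried until the limit
    is reached, at which point the last error is re-raised.
    """
    attempts = 0
    while attempts < retries:
        attempts += 1
        try:
            launch()
            return attempts
        except TaskManagerStopped:
            # Fatal: retrying cannot help, so stop on the first occurrence.
            raise
        except Exception:
            # Retryable: give up only once the retry limit is exhausted.
            if attempts == retries:
                raise
    raise ValueError("retries must be at least 1")
```

The fatal handler has to come before the generic one, because Python uses the first matching except clause. With a very high retry limit, as in the log above, the fatal case ends after a single attempt instead of logging a traceback for every remaining one.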
