nodepool/nodepool/tests/fixtures/aws
James E. Blair 242f9cc3e6 Handle early AWS spot instance reclamations
If an AWS spot instance is used as a metastatic backing node, an
unexpected series of events can occur:

* aws driver creates backing node instance
* aws driver scans ssh keys and stores them on backing node
* aws reclaims spot instance
* aws re-uses IP from backing node
* metastatic driver creates node
* metastatic driver scans ssh keys and stores them on node

Zuul would then use the wrong node (whether that succeeds depends
on what else has happened to the node in the interim).

To avoid this situation, we implement this change:
* After scanning the metastatic node ssh keys, we compare them to
  the backing node ssh keys and if they differ, trigger an error
  in the metastatic node and mark the backing node as failed.
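The key comparison described above could be sketched roughly as follows. This is only an illustration and not the actual nodepool code: `KeyMismatchError` and `check_host_keys` are hypothetical names, and the sketch assumes the scanned host keys are available as lists of strings.

```python
class KeyMismatchError(Exception):
    """Hypothetical error: the metastatic node's keys differ from the backing node's."""


def check_host_keys(backing_keys, node_keys):
    """Compare the keys stored on the backing node with a fresh scan.

    Both arguments are assumed to be lists of host key strings.
    The order in which keys were scanned is not significant, so
    they are compared as sets.
    """
    if set(backing_keys) != set(node_keys):
        # In the real drivers this is where the metastatic node would
        # be put into an error state and the backing node marked failed.
        raise KeyMismatchError("host keys differ from backing node")
    return True
```

A caller would treat `KeyMismatchError` as a signal that the IP address now belongs to a different machine.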

In case the node is reclaimed one step early in the above sequence,
we implement this change:
* After completing the nodescan, the aws driver will double check
  that the instance is still running; if not, it will trigger an
  error.
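A minimal sketch of the post-nodescan check, under stated assumptions: `InstanceGoneError` and `verify_instance_running` are hypothetical names, and `cached_instances` stands in for the driver's cached instance list as a plain mapping from instance ID to its EC2 state name (such as "running" or "terminated"). The real driver's data structures may differ.

```python
class InstanceGoneError(Exception):
    """Hypothetical error: the instance is no longer running after the nodescan."""


def verify_instance_running(instance_id, cached_instances):
    """Double check that an instance is still running after the nodescan.

    cached_instances is assumed to map instance IDs to EC2 state names.
    An instance that is missing from the listing is treated the same
    as one that is not running.
    """
    state = cached_instances.get(instance_id)
    if state != "running":
        raise InstanceGoneError(
            "instance %s is not running (state: %s)" % (instance_id, state))
    return state
```

Because the listing is cached, a check like this can only be as fresh as the most recent refresh of that cache.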

The above is still subject to a small race if the nodescan takes
less time than the cache interval of the instance list, and the
node is reclaimed after the nodescan and within the cache interval
(currently 10 seconds).  In the unlikely event that does happen,
then the metastatic key check should still catch the issue as long
as the replacement node also does not boot within those 10 seconds.
(Technically possible, but the combination of all of these things
should be very unlikely in practice.)

Change-Id: I9ce1f6df04e9c49deceda99c8e4024dd98ea88f9
2024-11-05 13:24:07 -08:00