ironic-python-agent/ironic_python_agent
Julia Kreger c5b97eb781 Add timeout operations to try and prevent hang on read()
Socket read operations can be blocking and may not time out as
expected when thinking of timeouts at the beginning of a
socket request. This can occur when streaming file contents
down to the agent and there is a hard connectivity break.

In other words, we could be in a situation like:

- read(fd, len) - Gets data
- Select returns context to the program, we do things with data.
** hard connectivity break for next 90 seconds**
-  read(fd, len) - We drain the in-memory buffer side of the socket.
-  Select returns context, we do things with our remaining data
** Server retransmits **
** Server times out due to no ack **
** Server closes socket and issues a FIN,RST packet to the client **
** Connectivity restored, Client never got FIN,RST **
** Client socket still waiting for more data **
- read(fd, len) - No data returned
- Select returns, yet we have no data to act on as the buffer is
  empty OR the buffered data doesn't meet our required read len value.
  tl;dr noop
- read(fd, len) <-- We continue to try and read until the socket is
                    recognized as dead, which could be a long time.

NOTE: The above read()s are Python's read() on contents being
      streamed. Lower level reads exist, but brains will hurt
      if we try to cover the dynamics at that level.

As such, we need to keep track of when we last received
a packet, and use that to decide whether we have timed out
or not. Requests periodically yields back even when no data
has been received, in order to allow the caller to wall
clock the progress/status and take appropriate action.

When we exceed the timeout time value with our wall clock,
we will fail the download.
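
A minimal sketch of the wall-clock check described above, not the
actual change. The names DownloadStalledError and
read_with_idle_timeout, and the injectable clock parameter, are
hypothetical and exist only for illustration. It assumes the chunk
iterator hands back an empty chunk whenever a read returned without
data.

```python
import time


class DownloadStalledError(Exception):
    """Hypothetical error: no data arrived within the timeout."""


def read_with_idle_timeout(chunks, timeout, clock=time.monotonic):
    """Yield non-empty chunks, failing once the stream has been idle too long.

    chunks  -- iterator that may yield empty bytes when a read returned
               without data (assumption for this sketch)
    timeout -- maximum number of seconds allowed between chunks with data
    clock   -- callable returning the current time in seconds
    """
    last_data_at = clock()
    for chunk in chunks:
        now = clock()
        if chunk:
            # Data arrived: remember when, then hand it to the caller.
            last_data_at = now
            yield chunk
        elif now - last_data_at > timeout:
            # Control came back without data and the wall clock says we
            # have waited too long, so fail instead of reading forever.
            raise DownloadStalledError(
                'no data received for %.1f seconds' % (now - last_data_at))
```

With Requests, chunks could plausibly be the iterator returned by
response.iter_content(chunk_size) on a streamed response. The check
only fires when control returns to the caller, so it relies on the
periodic yield described above.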

Change-Id: I7214fc9dbd903789c9e39ee809f05454aeb5a240
2020-06-23 13:25:09 -07:00
..
api Agent token support 2020-03-12 10:35:17 -07:00
cmd Agent token support 2020-03-12 10:35:17 -07:00
extensions Add timeout operations to try and prevent hang on read() 2020-06-23 13:25:09 -07:00
hardware_managers Fix gate and bump CoreOS version to latest stable. 2018-05-10 15:50:05 -07:00
shell Clear GPT and MBR headers with dd to avoid sgdisk CRC errors 2018-08-08 16:40:22 +00:00
tests Add timeout operations to try and prevent hang on read() 2020-06-23 13:25:09 -07:00
__init__.py Use # instead of """ for copyright blocks 2014-04-10 07:14:06 -07:00
agent.py Add a deploy step for writing an image 2020-06-02 15:23:54 +02:00
config.py Add timeout and retries when connection to an image server 2020-04-24 10:34:40 +02:00
dmi_inspector.py Collect processor, memory and BIOS output of dmidecode - follow-up 2017-07-27 07:30:54 -07:00
encoding.py Create a SerializableComparable class 2015-09-11 13:44:09 -07:00
errors.py Add an ability to run in-band deploy steps 2020-04-06 10:24:08 +02:00
hardware.py Merge "Add a deploy step for writing an image" 2020-06-20 00:00:10 +00:00
inspect.py Add jitter to inspection command reporting 2020-03-31 08:13:13 -07:00
inspector.py Expose collector and hardware manager names via introspection data 2020-01-22 11:15:38 +01:00
ironic_api_client.py Merge "Move minimum ironic version to latest ocata" 2020-04-15 10:22:12 +00:00
netutils.py Get the hostname of the introspected host 2019-06-12 13:00:21 +00:00
numa_inspector.py Skip nic numa_node discovery if it's not assigned to a numa_node 2020-01-17 11:15:35 +01:00
raid_utils.py Split and move logic for partition tables 2020-05-25 08:11:28 +00:00
utils.py Split and move logic for partition tables 2020-05-25 08:11:28 +00:00
version.py Add sphinx build + basic documentation 2015-03-31 16:22:12 -07:00