ironic/ironic/drivers
Julia Kreger d1ffc6a557 Handle agent still doing the prior command
The agent command execution model is based upon an incoming
heartbeat. However, heartbeats are independent, and
commands can take a long time. For example, software RAID
setup in CI can encounter this.

From an IPA log:

[-] Picked root device /dev/md0 for node c6ca0af2-baec-40d6-879d-cbb5c751aafb
    based on root device hints {'name': '/dev/md0'}
[-] Attempting to download image from http://199.204.45.248:3928/agent_images/
    c6ca0af2-baec-40d6-879d-cbb5c751aafb
[-] Executing command: standby.get_partition_uuids with args: {} execute_command
    /usr/local/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py:255
[-] Tried to execute standby.get_partition_uuids, agent is still executing Command name:
    execute_deploy_step, params: {'step': {'interface': 'deploy', 'step': 'write_image',
    'args': {'image_info': {'id': 'cb9e199a-af1b-4a6f-b00e-f284008b8046',
    'urls': ['http://199.204.45.248:3928/agent_images/c6ca0af2-baec-40d6-879d-cbb5c751aafb'],
    'disk_format': 'raw', 'container_format': 'bare', 'stream_raw_images': True, 'os_hash_algo':
    'sha512', 'os_hash_value':<trimmed>

This was with code built on master, using master images.
The conductor log notes that the agent is likely out of date,
because only AgentAPIError is evaluated and any API error is
treated that way. In reality, we need to explicitly flag *when*
an error occurs because we tried too soon, while something is
already being worked upon.

The result is to evaluate the error and return an exception
indicating that work is already in flight.

Update: it looks like the original fix for busy agent recognition
did not detect all cases, since getting steps is a command which
can get skipped by accident with a busy agent, under certain
circumstances.
Change I5d86878b5ed6142ed2630adee78c0867c49b663f in ironic-python-agent
also changed the string that was being checked by the previous
handling, where we really should have just lower-cased the string
we were checking in ironic. Oh well! This should fix things
right up.
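
A minimal sketch of the case-insensitive check described above, not the
actual ironic code. The exception classes are stand-ins defined here,
and BUSY_MARKER and classify_agent_error are hypothetical names; the
marker text is taken from the IPA log excerpt earlier in this message.

```python
class AgentAPIError(Exception):
    """Stand-in for a generic agent API failure (defined here for the sketch)."""


class AgentInProgress(AgentAPIError):
    """Stand-in raised when the agent is still running a prior command."""


# Assumption: a substring of the IPA message seen in the log excerpt,
# stored in lower case so the comparison ignores capitalization changes.
BUSY_MARKER = "is still executing"


def classify_agent_error(message: str) -> AgentAPIError:
    """Return a busy-agent exception if the message says work is in flight."""
    # Lower-casing the incoming message keeps the check working even if
    # the agent changes the capitalization of its error string.
    if BUSY_MARKER in message.lower():
        return AgentInProgress(message)
    return AgentAPIError(message)
```

A caller could then retry on the next heartbeat when it gets the
busy-agent exception, instead of treating the agent as out of date.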

Story: 2008167
Task: 41175
Change-Id: Ia169640b7084d17d26f22e457c7af512db6d21d6
(cherry picked from commit 545dc2106b)
2021-03-02 19:37:05 +00:00
..
modules Handle agent still doing the prior command 2021-03-02 19:37:05 +00:00
__init__.py Remove copyright from empty files 2014-01-07 21:05:01 +08:00
base.py IPMI: Handle vendor set boot device differences 2020-12-17 11:23:09 -05:00
drac.py Add Redfish BIOS interface to idrac HW type 2020-09-23 13:34:50 +00:00
fake_hardware.py BIOS Settings: Add BIOSInterface 2018-05-08 15:16:52 +08:00
generic.py Deprecate the iscsi deploy interface 2020-09-22 15:39:36 +02:00
hardware_type.py Use property plus abstractmethod for abstractproperty 2020-08-06 11:34:23 +02:00
ibmc.py Fix: review from dtantsur of 728123 2020-06-17 17:41:55 +08:00
ilo.py Adds ilo-uefi-https boot interface to ilo5 2020-09-17 13:20:53 +00:00
intel_ipmi.py Add IntelIPMIHardware 2019-06-25 13:46:26 +05:30
ipmi.py Add "noop" management and use it in the "ipmi" hardware type 2018-08-07 13:25:50 +00:00
irmc.py Support iRMC hardware type again 2020-09-29 23:20:21 +09:00
raid_config_schema.json Allow specifying target devices for software RAID 2020-03-17 14:31:38 +01:00
redfish.py Move redfish-virtual-media to the back of supported_boot_interfaces 2020-08-21 14:56:20 +00:00
snmp.py Switch the "snmp" hardware type to "noop" management 2018-08-07 15:40:29 +00:00
utils.py Collect ramdisk logs also during cleaning 2020-05-14 18:38:31 +02:00
xclarity.py Remove the xclarity deprecation 2018-10-24 13:01:17 -07:00