..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License. http://creativecommons.org/licenses/by/3.0/legalcode

=========================================
N3000 FPGA Bitstream Update Orchestration
=========================================

Storyboard:
https://storyboard.openstack.org/#!/story/2006740

The overall scenario is that we have an administrator operating in a central
cloud, with hundreds or thousands of subclouds being managed from the central
cloud. In each subcloud there will be one or more nodes with FPGA devices.
These devices will need to be programmed with a number of types of bitstreams,
but to ensure that service standards are met they can't all be updated at the
same time. Instead, the admin will create policies which govern which subclouds
are updated when, and the orchestration framework will follow those policies to
update the various subclouds.

Problem description
===================

In a distributed-cloud environment there may be hundreds or thousands of
subclouds, each containing one or more hosts, some of which may have hardware
devices on them (like NICs or FPGAs) which require image updates in order to
properly provide service to the applications that ultimately serve the
end-user.

In order to simplify management of these hardware devices, we wish to support
orchestration of device image updates in a distributed-cloud environment,
starting with the Intel N3000 FPGA device (which is expected to be
commonly used for 5G) but designing the framework in such a way that we could
extend it to deal with other types of device images (other FPGAs, or NIC
firmware, for example) as well.

For the case of the N3000 (and likely other FPGAs) there are a number of
different image types that need to be supported: one to set the root
authentication key, one to update the FPGA core (signed with a signing key),
and one to revoke a signing key. For the case of NICs, you would typically
have a single image type. In all cases, an image is only valid for a specific
PCI vendor/device tuple.

Since updating device firmware will necessarily result in a service outage, we
need the ability to control which subclouds (which typically correspond to
geographic areas) can be updated in parallel.

Use Cases
---------

As a cloud admin, I want to push out a hardware device image update to
hardware devices on a single host (possibly for test purposes).

As a cloud admin, I want to push out hardware device image updates to hardware
devices on multiple hosts in a cloud.

As a distributed-cloud admin, I want to push out hardware device image updates
to hardware devices on all hosts in a single subcloud (possibly for test
purposes).

As a distributed-cloud admin, I want to push out hardware device image updates
to hardware devices on multiple hosts in many subclouds. While doing this, I
want to control which hosts and which subclouds can be updated in parallel,
since I want to avoid causing service outages while doing the update.

As a distributed-cloud admin, I want to be able to display whether each
subcloud is using up-to-date device images.

As a distributed-cloud admin, I want to see the status of in-progress device
image updates.

As a distributed-cloud or cloud admin, I want to be able to abort an
orchestrated device image update such that currently-in-progress device writes
will finish but no additional ones will be scheduled.

Proposed change
===============

The overall architecture of the device image orchestration will be modelled
after the existing software-patch handling. In a single-cloud environment we
will support uploading device images, "applying" them (which just means marking
them as something that should get written to the hardware), and then actually
kicking off the write to the hardware.

In a distributed-cloud environment, when using device image orchestration we
will first do the above in the SystemController region, and then use dcmanager
to handle pushing the images down to the subcloud and kicking off the actual
update in the subcloud. The VIM in each subcloud will decide when to update
the device images on each host, and a sysinv agent on each host will handle
writing the actual device images to the hardware.

In a distributed-cloud environment it will also be possible for the admin user
to explicitly issue commands to the sysinv API endpoint for a single subcloud.
This will essentially bypass the orchestration mechanism and behave the same
as in the single-cloud environment.

Hardware Background
-------------------

The initial hardware that we want to support is the Intel N3000 [1]_, an FPGA
that we expect will be used by 5G edge providers. This FPGA is somewhat
unusual in that it takes roughly 40 minutes to write the functional image to
the hardware (service can continue during this time, after which a hardware
reset is required to load the new image). Once the new image is loaded, the
device will provide multiple VFs, which in turn will be exported to Kubernetes
as resources, where they will be consumed by applications running in
Kubernetes containers. Because of the long write times, these devices must be
pre-programmed rather than programmed at Kubernetes pod startup.

Hardware Security Model
-----------------------

By default, the N3000 will accept any valid bitstream that is sent to it
(signed or unsigned). The customer/vendor can generate a public/private root
key pair, then create a *root-entry-hash* bitstream from the public key. If a
*root-entry-hash* bitstream is written to the N3000 it will set the root entry
hash on the device. From that point on, only bitstreams signed by a code
signing key (CSK) which is in turn signed by the private root key will be
accepted by the N3000. Once a *root-entry-hash* bitstream has been written to
the hardware, it cannot be erased or changed without sending the hardware back
to the vendor.

The customer/vendor can generate new *FPGA user* bitstreams. These may be
unsigned or signed with a CSK. Typically each such bitstream would be signed
by a different CSK. Writing a new *user* bitstream will cause the new code to
be loaded on the next bootup of the N3000. Only one *user* bitstream can be
stored in the N3000 at a time.

The customer/vendor can create a *CSK-ID-cancellation* bitstream (generated
from the private root key). When written to the N3000 it will revoke a
previously-used CSK and disallow loading any images signed with it. Multiple
*CSK-ID-cancellation* bitstreams can be processed for each N3000. Most
importantly, StarlingX will not deal directly with CSKs, only bitstreams.

Cloud/Subcloud Sysinv FPGA Agent
--------------------------------

The low-level interactions with the physical FPGA devices will be performed by
a new *sysinv FPGA agent* which will reside on each node with a *worker*
subfunction. The agent will communicate bi-directionally with sysinv-conductor
via RPC. The interactions with the N3000 FPGA will be performed using the OPAE
tools [2]_ in the n3000-opae Docker image running in a container. (This will
require the use of privileged containers due to the need to directly access
hardware devices.)

On startup, the existing *sysinv-agent* will try to create a file under
/run/sysinv. If the file did not already exist, it will send an RPC message to
sysinv-conductor indicating that the host has just rebooted. Sysinv-conductor
will then clear any "reboot needed" DB entry for that host if it was set. If
there are no more "pending firmware update" entries in the DB for any host,
and if no host has the "reboot needed" DB entry set, then the "*firmware
update in progress*" alarm will be cleared.

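The flag-file check works because /run is a tmpfs that is emptied on every
boot, so the file's absence distinguishes the first agent start after a boot
from an ordinary agent restart. A minimal sketch (the flag filename is
hypothetical; the spec only says "a file under /run/sysinv"):

```python
import os

# Hypothetical flag path; the spec only specifies "a file under /run/sysinv".
REBOOT_FLAG = "/run/sysinv/agent_started"

def first_start_since_reboot(flag_path=REBOOT_FLAG):
    """Return True exactly once per boot.

    /run is a tmpfs, so the flag disappears on reboot.  O_CREAT|O_EXCL makes
    the check-and-create atomic even if two processes race on startup.
    """
    os.makedirs(os.path.dirname(flag_path), exist_ok=True)
    try:
        fd = os.open(flag_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False   # agent restart, not a fresh boot
    os.close(fd)
    return True        # would trigger the "host rebooted" RPC to the conductor
```

On a True return the agent would send the "host rebooted" RPC described above.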
On startup, the existing *sysinv-agent* will do an inventory of the PCI
devices on each worker node. The new *sysinv-fpga-agent* will inventory the
FPGA devices as well, including querying additional details from each FPGA
device as per the *host-device-show* command. The FPGA agent will send an RPC
message to *sysinv-conductor* to update the database with up-to-date FPGA
device information.

If there are problems that need to be dealt with immediately (such as the FPGA
booting the factory image when there should be a functional image) then
*sysinv-conductor* will send an RPC message to *sysinv-fpga-agent* to trigger
a *device-image-update* operation to ensure that the FPGA is up-to-date. This
will also cause an alarm to be raised.

If the FPGA has a valid functional image but it is not the currently-active
functional image, then we will raise an alarm but not trigger a
*device-image-update* operation. In the future we may wish to extend this to
check whether the functional image was signed with a cancelled CSK-ID and, if
so, trigger a *device-image-update* operation due to the security risk.

On startup, sysinv-conductor will send out a request to all *sysinv FPGA
agents* to report their hardware status. This is needed to deal with certain
error scenarios.

In certain error scenarios it is possible that the *sysinv FPGA agent* will be
unable to send a message to sysinv-conductor. It will need to handle this
gracefully.

Subcloud Sysinv Operations
--------------------------

At the single-cloud or subcloud level, the commands start out fairly typical.
We plan to extend sysinv to introduce create/list/show/delete commands for the
FPGA images, extend the existing *host device* commands to operate on the FPGA
device, add new commands to *apply* or *remove* a device image, and finally
add new commands to initiate or abort the firmware update.

The concept of *apply* is used because there are different types of bitstreams
and it is possible to have more than one bitstream that needs to be downloaded
to a newly-added FPGA. This will be discussed in more detail in the
activation section below.

system device-image-create
^^^^^^^^^^^^^^^^^^^^^^^^^^

Define a new image, specifying the bitstream file, the bitstream type
(root-key, functional, or key-revocation), an optional
name/version/description, the applicable PCI vendor/device the image is for,
and various bitstream-type-specific metadata such as the bitstream ID (for the
FPGA functional image), the key signature (for the root-key image), the key ID
being revoked, etc. To simplify the dcmanager code, this should allow
specifying the UUID for the image. (Ideally we should be able to issue a GET
in RegionOne and pass the results directly to a PUT to the same location in
the subcloud to create a new image in the subcloud. Alternatively, a POST
could be used, but we would have to add the UUID to the request body.) If not
specified, the system will create a UUID for the image. The bitstream file
will be stored in a replicated filesystem on the controller node, and the
metadata will be stored in the sysinv database.

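As a sketch of the create semantics, the record builder below generates a UUID
only when the caller does not supply one, which is what allows dcmanager to
recreate an image in a subcloud under the same identity it has in the
SystemController region. Field names are illustrative, not the actual sysinv
schema:

```python
import uuid

# Illustrative field names; the real sysinv schema may differ.
BITSTREAM_TYPES = {"root-key", "functional", "key-revocation"}

def build_device_image_record(bitstream_type, pci_vendor, pci_device,
                              image_uuid=None, **metadata):
    """Build the metadata record for a new device image.

    A caller-supplied UUID is kept verbatim; otherwise a new one is created.
    """
    if bitstream_type not in BITSTREAM_TYPES:
        raise ValueError("unknown bitstream type: %s" % bitstream_type)
    return {
        "uuid": image_uuid or str(uuid.uuid4()),
        "bitstream_type": bitstream_type,
        "pci_vendor": pci_vendor,
        "pci_device": pci_device,
        **metadata,
    }
```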
system device-image-list
^^^^^^^^^^^^^^^^^^^^^^^^

Display high-level image data for all known images. This would include the
image type (root-key, functional, key-revocation), UUID, and version.

system device-image-show
^^^^^^^^^^^^^^^^^^^^^^^^

Display detailed image data for a single image (specified via UUID). This
would include the UUID, image type, name, description, key ID, bitstream ID,
signing key signature, any activations (with device label) for the image, etc.

system device-image-delete
^^^^^^^^^^^^^^^^^^^^^^^^^^

Delete an image (specified by UUID). If an FPGA functional image is deleted
due to a security issue, it would be wise to also upload and activate a
key-revocation bitstream to prevent the image from being uploaded again,
either by accident or maliciously.

system device-image-apply
^^^^^^^^^^^^^^^^^^^^^^^^^

Make the specified image *active*, but do not actually initiate writing to the
hardware. This applies to a specific image, and optionally takes a device
label key/value such that only devices with the specified label would be
updated. Initially only *functional*, *root-key*, and *key-revocation*
bitstreams are supported. Only one *root-key* bitstream can ever be written
to an N3000, so having more than one such bitstream active doesn't make sense.
Applying a *functional* bitstream will *remove* all other functional
bitstreams for that FPGA PCI vendor/device. There can be multiple
*key-revocation* bitstreams active.

Note that it would be possible to make multiple images active, then issue a
*host-device-image-update* command to trigger writing them all to the
hardware.

When an image has been applied, a "device firmware update in progress" alarm
will be raised, and will stay raised until all affected devices have had their
firmware updated or until the device image is removed. This implies that a
"pending firmware update" DB entry will be created for each affected device
for each applied image to indicate that the image needs to be written to the
device.

system device-image-remove
^^^^^^^^^^^^^^^^^^^^^^^^^^

Deactivate the specified image, optionally specifying a device label to
deactivate the image only for devices with that label. If you try to
deactivate an image which is currently being written to the hardware, the
removal will succeed but will not abort the write.

When an image is deleted, all of its activation records will also be deleted.
(The implementation of this operation could probably be left to the end, as it
is not critical.)

Removing an image will remove any "pending firmware update" DB entries for
that image. If there are no remaining pending firmware updates, and no
"reboot needed" DB entries for any host, then the "device firmware update in
progress" alarm can be cleared.

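The pending-update bookkeeping described in the apply and remove sections can
be sketched as two small pure functions (DB and alarm interactions elided;
field names are illustrative):

```python
def apply_image(image_uuid, devices, label=None):
    """Return the "pending firmware update" entries an apply would create.

    `devices` is a list of dicts with "uuid" and "labels" keys; when a
    (key, value) label pair is given, only matching devices are affected.
    """
    return [
        {"image": image_uuid, "device": d["uuid"]}
        for d in devices
        if label is None or d["labels"].get(label[0]) == label[1]
    ]

def alarm_should_clear(pending_updates, reboot_needed_hosts):
    # The "device firmware update in progress" alarm clears only once no
    # pending updates remain and no host still needs a reboot.
    return not pending_updates and not reboot_needed_hosts
```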
system host-device-image-update
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Tell sysinv to update the specified device on the specified host with any
active images which have not yet been written to the hardware. In this
scenario, sysinv-conductor will tell the FPGA agent to write each
active-but-not-written image to the device in turn until they have all been
written. We would want to write the root-key bitstream first, then any
key-revocation bitstreams, then the functional bitstream. If we have
successfully written the functional bitstream, the admin user (or the VIM
in the orchestrated update case) will need to lock/unlock the node to cause
the new functional image to be loaded.

While writing an image to the FPGA, we want to block the reboot of the host
in question. We will only allow updating device images on unlocked hosts,
and once the device image update starts, no host-lock commands will be
accepted unless the *force* option is used. While the FPGA agent is writing
the image to the hardware, it will *stop* the watchdog service from running,
since we don't want an unrelated process to trigger a reboot while we're
writing to the hardware. After the image has been written, the FPGA agent
will restart the watchdog service.

After each image is written, the FPGA agent will send an RPC message to
sysinv-conductor to remove the "*pending firmware update*" entry from the DB
and to set a "reboot needed" DB entry for that host.

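The write ordering described above (root-key, then key-revocations, then the
functional bitstream) amounts to a simple sort over the pending images; a
sketch, with illustrative field names:

```python
# Root-key first, key-revocation next, functional last, as described above.
WRITE_ORDER = {"root-key": 0, "key-revocation": 1, "functional": 2}

def order_pending_images(images):
    """Return the active-but-not-written images in safe write order."""
    return sorted(images, key=lambda img: WRITE_ORDER[img["bitstream_type"]])
```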
system host-device-image-update-abort
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Abort any pending image updates for this host. Any in-progress device image
updates will continue until completion or failure.

(The implementation of this operation could be left towards the end, as it is
not necessary for the success path.)

system host-device-list
^^^^^^^^^^^^^^^^^^^^^^^

Add support to the existing command so that FPGA devices are displayed in the
list. Add a new "needs firmware update" column.

system host-device-show
^^^^^^^^^^^^^^^^^^^^^^^

Extend the existing command to add new optional device-specific fields. For
the N3000 this would include the accelerator status, type of booted image
(user/factory), booted image bitstream ID, cancelled CSK IDs, root entry
hash, BMC versions, PCI device ID, onboard NIC devices, etc.

We might want to include device labels (see below) in the output.

system host-device-label-assign
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assign a *key: value* label to a PCI device. This takes as arguments the PCI
device, the host, the key, and the value.

system host-device-label-list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

List all labels for a given PCI device. This takes as arguments the PCI
device and the host, and returns a list of all key/value labels for the
device. (Alternatively it could take the PCI device UUID, but the CLI doesn't
expose that currently.)

system host-device-label-remove
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Remove a label from a PCI device. This takes as arguments the PCI device, the
host, and the key.

system device-label-list
^^^^^^^^^^^^^^^^^^^^^^^^

List all devices and their labels from all hosts in the system. Devices
without any labels are not included. This is intended for use by dcmanager to
determine whether an image should be created in a given subcloud.

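A sketch of the decision dcmanager could make from this output (field names
are illustrative): an image is needed in a subcloud only if some device there
matches the image's PCI vendor/device and, when the image was applied with a
label, also carries that label:

```python
def image_needed_in_subcloud(image, subcloud_devices):
    """Return True if any device in the subcloud would consume the image."""
    for dev in subcloud_devices:
        # The image is only valid for one PCI vendor/device tuple.
        if (dev["pci_vendor"], dev["pci_device"]) != \
                (image["pci_vendor"], image["pci_device"]):
            continue
        label = image.get("applied_label")  # (key, value) or None
        if label is None or dev["labels"].get(label[0]) == label[1]:
            return True
    return False
```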
system host lock/swact
^^^^^^^^^^^^^^^^^^^^^^

The *lock* operation would be blocked by default during a device image update
to prevent accidentally rebooting while in the middle of updating the FPGA
image (since that would result in a service outage while the FPGA gets updated
again). Since we will only start a device image update on an unlocked host,
this should be sufficient.

If the *force* option is specified for this command, the action will
proceed. (This may mean that the device ends up in a bad state if the host
reboots while a device image update was in progress.)

The manual swact operation will be blocked during a device image update to
reduce the chances that it will interfere with the image update. The image
update code in the rest of the system will try to deal with temporary outages
caused by a swact, but we may need to handle it as a failure if the outage
lasts long enough.

Subcloud VIM Operations
-----------------------

All of these operations are analogous to the existing sw-manager
patch-strategy and update-strategy operations. We're using *firmware update*
in the CLI to allow it to be potentially more generic in the future, but
initially these would apply to the FPGA image update only.

The VIM will control the overall firmware update strategy for the subcloud.
It will decide whether a firmware update is currently allowed to be kicked off
(if there are alarms raised it might block the firmware update strategy apply,
depending on the strategy), control how many hosts can do a firmware update in
parallel, trigger each host to begin the firmware update, and aggregate the
status of the firmware update on the various hosts.

When the VIM decides to initiate a firmware update on a given host, it will
issue the HTTP equivalent of the *system host-device-image-update* command to
sysinv on that host to tell that host to write all *applied but not yet
written* device images to the hardware.

sw-manager fw-update-strategy create
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Check the system state and build up a sequence of commands needed to bring
the subcloud into alignment with the desired state of the system. This would
take options such as how many hosts can do a firmware update in parallel,
whether to stop on failure, whether outstanding alarms should prevent the
update, etc.

This step will loop over all hosts, querying sysinv to see whether each host
has any devices that need updating, then generate a series of steps to bring
all relevant hosts in the subcloud up-to-date for their device images.

If there are no firmware updates to be applied, the strategy creation will
fail with the reason "no firmware updates found".

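The step-generation logic can be sketched as follows, with the per-host sysinv
query abstracted into a `needs_update` callable (a deliberate simplification
of the real VIM strategy builder):

```python
def build_fw_update_strategy(hosts, needs_update, max_parallel=1):
    """Return stages (lists of host names), or raise if nothing to do.

    Hosts with devices needing an update are grouped into stages of at most
    `max_parallel` hosts; stages run serially, hosts within a stage run in
    parallel.
    """
    pending = [h for h in hosts if needs_update(h)]
    if not pending:
        raise RuntimeError("no firmware updates found")
    return [pending[i:i + max_parallel]
            for i in range(0, len(pending), max_parallel)]
```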
sw-manager fw-update-strategy apply
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Execute the firmware update strategy. We probably want an option similar to
the *stage-id* supported when applying a patching strategy, to apply the
strategy up to a specific stage ID.

Apply the specified firmware update strategy to each host specified in the
strategy (this would typically be all hosts which have devices that need a
firmware update), following the policies in the strategy around
serial/parallel updates. For each affected host, the VIM will use the sysinv
REST API to trigger a *system host-device-image-update* operation and then
periodically check the status of the update operation.

sw-manager fw-update-strategy show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Display the update strategy, optionally with more details (like the current
status of the overall sequence as stored in the VIM database).

sw-manager fw-update-strategy abort
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Allow existing in-progress FPGA updates to complete, but do not trigger any
additional nodes to begin FPGA updates. Signal sysinv to abort the FPGA
update; this will still allow in-progress FPGA updates to complete, since we
do not want to end up with a half-written image (which would require a new
FPGA update operation to recover).

(The implementation of this may be left till the end, as it is not needed for
the success path.)

sw-manager fw-update-strategy delete
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Delete the strategy once it is no longer needed.

System Controller DC Manager Operations
---------------------------------------

The DC Manager operations in the system controller are strongly related to
the VIM operations in the subcloud, and most of them are equivalent to the
operations for the existing sw-manager patch-strategy and update-strategy
operations.

The DC manager will control when to trigger a firmware update in a given
subcloud, which subclouds can be updated in parallel, and whether or not to
stop on failure.

The DC manager will also handle creating/deleting device images in each
subcloud as needed to keep the subcloud in sync with the SystemController
region, by talking to sysinv-api-proxy in the SystemController region and in
each subcloud. The actual device image files will be stored by the
sysinv-api-proxy in a well-known location where DC manager can access them
when creating device images in the subclouds. DC Manager will only create
device images in the subcloud if there is at least one device in the subcloud
which will be updated with the device image in question (based on any labels
specified via the "*system device-image-apply*" command).

As part of dcmanager, there will be a periodic audit which scans a number of
subclouds in parallel and checks whether the subcloud has all of the
*applied* device images that it should have (based on the labels the images
were applied against and the device labels in the subcloud), and whether all
of the *applied* device images have been written to the devices that they
should be. If either of these is not true, then the subcloud "*firmware
image sync status*" is considered "*out of sync*". This will result in the
subcloud as a whole being considered "*out of sync*".

When a device image is *applied* in sysinv, dcmanager will be notified and
will set the "*firmware image sync status*" to *unknown* for all subclouds,
since it does not know at this point which subcloud(s) the image needs to be
created/applied/updated in. On the next audit, this sync status will be
updated to "*in sync*" or "*out of sync*" as applicable.

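The per-subcloud audit check described above can be sketched as a pure
function over the audit's inputs (data shapes are illustrative): the subcloud
is in sync only if every required image exists there and has been written to
every device it targets:

```python
def firmware_sync_status(required_images, subcloud_images, written):
    """Compute the "firmware image sync status" for one subcloud.

    required_images: dict of image UUID -> set of target device UUIDs
                     (derived from apply labels and subcloud device labels).
    subcloud_images: set of image UUIDs present in the subcloud.
    written: set of (image_uuid, device_uuid) pairs already written.
    """
    for image_uuid, target_devices in required_images.items():
        if image_uuid not in subcloud_images:
            return "out-of-sync"   # image missing from subcloud
        if any((image_uuid, dev) not in written for dev in target_devices):
            return "out-of-sync"   # image not written everywhere it should be
    return "in-sync"
```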
dcmanager subcloud-group create/add-member/remove-member/delete/list/show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These introduce the concept of a "*subcloud group*", which is a way of
grouping subclouds together such that all subclouds in the group can
potentially be upgraded in parallel. A given subcloud can only be a member of
one subcloud group.

This is needed because the customer will likely want to ensure (as much as
possible) that we don't update the functional image (which requires a service
outage) on all subclouds that serve a certain geographic area at once (which
could cause an outage for end-users in that area).

There will be controls over how many subclouds in a group can be updated at
once. Dcmanager will only apply update strategies in one group at a time,
and will update all subclouds in a group before moving on to the next
subcloud group.

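A sketch of this scheduling rule, assuming groups are processed in a fixed
order: groups run serially, and within a group at most `max_parallel`
subclouds update at once:

```python
def schedule_group_updates(groups, max_parallel):
    """Flatten subcloud groups into update batches.

    `groups` is an ordered list of (group_name, [subclouds]).  All batches of
    one group are emitted before the next group starts, and each batch holds
    at most `max_parallel` subclouds.
    """
    batches = []
    for name, subclouds in groups:
        for i in range(0, len(subclouds), max_parallel):
            batches.append((name, subclouds[i:i + max_parallel]))
    return batches
```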
dcmanager fw-update-strategy create
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a new update strategy, with options for the number of subclouds to
update in parallel, whether to stop on failure, etc. (Eventually we may want
to specify a list of which subcloud groups to update, but this will not be
included in the initial version.) This will generate a UUID for the created
strategy, and will generate a step for each subcloud in the specified
subcloud group that dcmanager thinks is out-of-sync. If there are any
subclouds with an "unknown" sync state in the subcloud group, then creation
of a firmware update strategy for that group will be disallowed.

dcmanager fw-update-strategy list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

List the firmware update strategies, with the most important bits of
information for each. This should include the overall update strategy status
(i.e. "in progress" if we've asked for it to be applied).

dcmanager fw-update-strategy show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Show the specified firmware update strategy. This would include all the
metadata specified as part of the "create", the overall update strategy
status, as well as the status (as reported by the subcloud VIM) of the
firmware update strategy application for all the subclouds in the specified
subcloud group.

dcmanager fw-update-strategy apply
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Execute each step (where each step roughly corresponds to a subcloud) of the
specified firmware update strategy. This would look something like this:

* Query sysinv in RegionOne for active FPGA images using the REST API.
* For each strategy step, use the sysinv REST API to:

  * Query the subcloud for device labels.
  * Query the subcloud for FPGA images.
  * Create/update/delete FPGA images in the subcloud as needed to bring it
    into sync with the FPGA images in the SystemController. We don't do this
    via dcorch because we want to ensure the data is up to date when applying
    the update strategy. (This process could take some time on a slow
    subcloud link.)
  * Apply the device image in the subcloud.
  * Create an FPGA update strategy using the VIM REST API.
  * Apply the FPGA update strategy using the VIM REST API.
  * Monitor progress by querying the FPGA update strategy using the VIM REST
    API.

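The steps above can be sketched as a single per-subcloud function, with the
sysinv and VIM REST clients reduced to hypothetical stub interfaces so the
control flow is visible without any network I/O:

```python
def apply_strategy_step(subcloud, central_images, sysinv, vim, poll):
    """Sync images to one subcloud, then drive its VIM update strategy.

    `sysinv` and `vim` are hypothetical client objects with the methods used
    below; `poll` yields the VIM strategy state until it is terminal.
    """
    present = sysinv.list_images(subcloud)
    for image in central_images:
        if image["uuid"] not in present:
            sysinv.create_image(subcloud, image)   # push missing image down
        sysinv.apply_image(subcloud, image["uuid"])
    vim.create_strategy(subcloud)
    vim.apply_strategy(subcloud)
    for state in poll(subcloud):                   # monitor until done
        if state in ("applied", "failed", "aborted"):
            return state
```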
dcmanager fw-update-strategy abort
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pass the abort down to each subcloud, and do not process any more subclouds.

(Maybe leave this till last, as it is not needed for the success path.)

dcmanager fw-update-strategy delete
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Delete the firmware update strategy. This would also delete the firmware
update strategy in the subcloud using the VIM REST API. It is not valid to
delete an in-progress update strategy.

dcmanager strategy-step list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Extend if needed to list the strategy steps and their state for the FPGA
update strategy that is being applied.

dcmanager strategy-step show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Extend if needed to show the step for a specific subcloud.

System Controller Sysinv Operations
|
||||
-----------------------------------
|
||||
The sysinv operations in the system controller essentially duplicate the
|
||||
image-related subset of the sysinv operations in the subcloud. We don't expect
|
||||
the system controller to have any FPGAs, so the device-image-update,
|
||||
host-device-list, and host-device-show commands are not relevant. In all cases
|
||||
the request is intercepted by sysinv-api-proxy in the SystemController region
|
||||
and forwarded to RegionOne. Unlike normal resources, dcorch will not be used
|
||||
to synchronize the FPGA image information.
|
||||
|
||||
system device-image-create
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Define a new image, as described in `Subcloud Sysinv Operations`_. If
|
||||
successful, the sysinv API proxy will also save the image to
|
||||
/opt/device-image-vault, which will be a drbd-replicated filesystem analogous
|
||||
to how /opt/patch-vault is used to store patch files for orchestrated patching.
|
||||
|
||||
system device-image-list
^^^^^^^^^^^^^^^^^^^^^^^^

Display high-level image data for all known images as per
`Subcloud Sysinv Operations`_.


system device-image-show
^^^^^^^^^^^^^^^^^^^^^^^^

Display detailed image data for a single image as per
`Subcloud Sysinv Operations`_.

system device-image-delete
^^^^^^^^^^^^^^^^^^^^^^^^^^

Delete an image as per `Subcloud Sysinv Operations`_. When deleting a
functional image, remind the user about uploading and activating a
key-revocation bitstream if the image is being deleted for security issues.
Deleting an active image will not be allowed.

system device-image-apply
^^^^^^^^^^^^^^^^^^^^^^^^^

Make the specified image *active* as per `Subcloud Sysinv Operations`_.


system device-image-remove
^^^^^^^^^^^^^^^^^^^^^^^^^^

Make the specified image *inactive* as per `Subcloud Sysinv Operations`_.

Fault Handling
--------------

While a device is in the middle of updating its functional image, it's
possible that a fault could occur that would normally result in the host being
rebooted. If we reboot while updating the N3000 functional image, it could
result in a 40-minute outage on host startup while we flash the functional
image again.

Given the above, the desired behavior while a device image update is in
progress is to avoid rebooting on faults (critical process alarm, low memory
alarm, etc.) as long as the fault is not something (like high temperature) that
could actually damage the hardware.

This is less of an issue for AIO-SX since we're already suppressing mtce reboot
actions.

The host watchdog will currently reset the host under certain circumstances.
This is undesirable if we're in the middle of updating device images, so the
sysinv FPGA agent will temporarily shut down the "hostwd" daemon during the
device image update and start it back up again afterwards. (Later on we may
want to modify it to stay running but emit logs instead of actually resetting
the host.)

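The hostwd handling above might be structured as a guard that restarts the
watchdog even if the flash fails. This is only a sketch: the service-control
command shown is a placeholder, not the real mechanism the FPGA agent will
use, and the injectable ``control`` parameter exists purely for illustration:

```python
import subprocess
from contextlib import contextmanager


def _service_ctl(action, name):
    # Placeholder: on a real host this would invoke whatever service
    # management mechanism actually controls hostwd.
    subprocess.check_call(["service", name, action])


@contextmanager
def hostwd_paused(control=_service_ctl):
    """Stop the host watchdog for the duration of a device image update.

    The finally-block guarantees hostwd is restarted even when the
    update raises, so a failed flash never leaves the watchdog down.
    """
    control("stop", "hostwd")
    try:
        yield
    finally:
        control("start", "hostwd")
```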
CLI Clients
-----------

We will extend the existing *system*, *sw-manager*, and *dcmanager* clients to
add the new commands and extend the existing commands where applicable.

Specifically, for the case of system host-device-show the expectation is that
the new FPGA-specific fields will only be returned by the server for FPGA
devices. The client will need to be able to handle the variable set of fields
rather than assuming a constant set of fields.

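A client that handles a variable field set might render whatever the server
returns rather than a fixed schema, along these lines. The field names here
are illustrative only; they are not the actual host-device-show API fields:

```python
# Fields every device is assumed to have, shown first for stable output;
# any extra (e.g. FPGA-specific) fields the server returns follow.
BASE_FIELDS = ["name", "pciaddr", "pvendor", "pdevice"]


def format_device(device):
    """Render a device dict, base fields first, then any extras sorted."""
    extras = sorted(k for k in device if k not in BASE_FIELDS)
    ordered = [f for f in BASE_FIELDS if f in device] + extras
    return "\n".join("%-24s: %s" % (key, device[key]) for key in ordered)
```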
Web GUI
-------

If we want to allow this to be handled entirely through the GUI, we'd need to
add support for all the system controller operations from sysinv and dcmanager.

This will not be implemented in the initial release.

Alternatives
------------

Given our existing infrastructure, there aren't too many alternatives. We
could extend the existing *sysinv-agent* instead of making a new FPGA-specific
agent, but there's going to be a fair bit of hardware-specific code in the new
agent, so that might not make sense.

The VIM and dcmanager changes closely align with how we already support
software patching and software upgrade, so this enables maximum code re-use.

Sysinv already talks to the hardware and deals with PCI devices, as well as
controlling the lock/unlock/reboot operations, so it's the logical place to
handle the interactions between those operations and the device image updates.

Data model impact
-----------------

The dcmanager DB will have a new *subcloud_group* table which maps subclouds
into groups. Subclouds within a group can be updated in parallel, while
subclouds from different groups cannot.

The sysinv DB will have a new *fpga_devices* table which will include new
fields that are specific to the FPGA devices. Each row will be associated
with a row in the *pci_devices* table.

The sysinv DB *pci_devices* table will get a new "needs firmware update"
column.

The sysinv DB will get a new *device_images* table which stores all necessary
information for each device image.

The dcmanager DB will get a number of new tables (analogous to the ones used
for software patching) which will track the strategy data for the device image
update at the distributed-cloud SystemController level.

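The subcloud_group semantics above (parallel within a group, serial across
groups) can be sketched as a small scheduler. This is an illustration of the
intended ordering only; the function and the shape of its inputs are
assumptions, not the actual dcmanager strategy code:

```python
from concurrent.futures import ThreadPoolExecutor


def apply_strategy(subcloud_groups, update_one, max_parallel=10):
    """Update subclouds group by group.

    subcloud_groups is an ordered list of (group_name, [subclouds]).
    Subclouds within a group are submitted concurrently; the next
    group is not started until every update in the current group
    has finished (leaving the with-block waits for the whole batch).
    """
    results = {}
    for _group, subclouds in subcloud_groups:
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            futures = {sc: pool.submit(update_one, sc) for sc in subclouds}
        for sc, fut in futures.items():
            results[sc] = fut.result()
    return results
```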
REST API impact
---------------

TBD

Security impact
---------------

The low-level implementation of the sysinv FPGA agent assumes the use of
privileged containers to handle the actual low-level interaction with the
physical hardware. We currently allow privileged containers, but we may want
to lock things down further in the future. In that case we might need to
install the OPAE tools as part of StarlingX rather than in a container.

This change does not directly deal with sensitive data. It deals with
bitstreams which may represent sensitive data, but the bitstreams have already
been signed before they're provided to StarlingX.

The biggest security impact would be an admin-user impact, since once an N3000
device has had its root key programmed it cannot be changed short of sending
the device back to the factory.

Other end user impact
---------------------

When a device image is being updated, it's very likely that a hardware reset
will be required, either of that specific device or of the whole host. This
will necessarily cause a service outage on the device in question, as well as
for any application containers making use of the device.

Performance Impact
------------------

The new code is not expected to be called frequently. It is expected to be
called more often during the initial phases of a customer network build-out as
FPGA images are reworked to deal with teething issues.

There will be a periodic audit in dcmanager to check whether each subcloud is
up-to-date on its hardware device images. We will generally only trigger
device updates during a maintenance window, so this audit does not need to be
frequent.

The API changes have been designed to minimize the number of calls required
to perform this audit, since they involve communicating between the
SystemController and the subclouds, which may be geographically remote.
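
The call-minimizing audit pattern above amounts to one bulk query per subcloud
rather than one query per device. The sketch below illustrates that shape
only; the fetch function and its payload format are hypothetical, not the
actual dcmanager audit API:

```python
def audit_subclouds(subclouds, fetch_device_summary):
    """Return the subclouds that still have out-of-date device images.

    fetch_device_summary(subcloud) performs a single remote call per
    subcloud and returns a list of entries shaped like
    {"device": ..., "needs_firmware_update": bool}, so the audit cost
    is one round-trip per subcloud regardless of device count.
    """
    stale = []
    for subcloud in subclouds:
        summary = fetch_device_summary(subcloud)
        if any(dev["needs_firmware_update"] for dev in summary):
            stale.append(subcloud)
    return stale
```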

While performing firmware updates, the host in question will not be able to be
locked unless the admin forces the operation.

Other deployer impact
---------------------

None.


Developer impact
----------------

Nothing different than any other development in these areas of the code.


Upgrade impact
--------------

All changes will be made in such a way as to support upgrades from the previous
version.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Chris Friesen

Other contributors:
  Al Bailey
  Eric MacDonald
  Teresa Ho

Repos Impacted
--------------

`<https://opendev.org/starlingx/config>`_

`<https://opendev.org/starlingx/distcloud.git>`_

`<https://opendev.org/starlingx/nfv.git>`_

Work Items
----------

* Sysinv FPGA agent changes.
* Sysinv-api and sysinv-conductor changes for triggering device image updates.
* Sysinv-api and sysinv-conductor changes for managing device images.
* VIM work for orchestrating firmware update at the subcloud level.
* dcmanager work for orchestrating firmware update at the SystemController
  level.
* sysinv-api-proxy work for proxying the device image management API to
  RegionOne and saving the device images in the vault for use by dcmanager.

Dependencies
============

None


Testing
=======

Unit tests will be added for new/modified code and will be executed by tox,
which is already supported for dcmanager, sysinv, and VIM.

The expectation is that there will be 3rd-party testing by Wind River with
actual hardware to ensure that this works as expected.

Documentation Impact
====================

The Cloud Platform Node Management guide, Cloud Platform Administration
Tutorials, and Cloud Platform User Tutorials will likely need to be updated.

The VIM, dcmanager, and sysinv API documentation will be updated with the new
APIs.

References
==========

.. [1] https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/intel-fpga-pac-n3000/overview.html

.. [2] https://opae.github.io/latest/index.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - stx-4.0
     - Introduced