Merge "N3000 FPGA device image update orchestration"

This commit is contained in:
Zuul 2020-04-18 23:04:08 +00:00 committed by Gerrit Code Review
commit 115a568cec
1 changed files with 845 additions and 0 deletions

View File

@ -0,0 +1,845 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License. http://creativecommons.org/licenses/by/3.0/legalcode
=========================================
N3000 FPGA Bitstream Update Orchestration
=========================================
Storyboard:
https://storyboard.openstack.org/#!/story/2006740
The overall scenario is that we have an administrator operating in a central
cloud, with hundreds or thousands of subclouds being managed from the central
cloud. In each subcloud there will be one or more nodes with FPGA devices.
These devices will need to be programmed with a number of types of bitstreams
but to ensure that service standards are met they can't all be updated at the
same time. Instead, the admin will create policies which govern which subclouds
are updated when, and the orchestration framework will follow those policies to
update the various subclouds.
Problem description
===================
In a distributed-cloud environment there may be hundreds or thousands of
subclouds, each containing one or more hosts, some of which may have hardware
devices on them (like NICs or FPGAs) which require image updates in order to
properly provide service to applications which ultimately provide services for
the end-user.
In order to simplify management of these hardware devices, we wish to support
orchestration of device image updates in a distributed-cloud environment,
starting with the Intel N3000 FPGA device (which is expected to be
commonly-used for 5G) but designing the framework in such a way that we could
extend it to deal with other types of device images (other FPGAs, or NIC
firmware for example) as well.
For the case of the N3000 (and likely other FPGAs) there are a number of
different image types that need to be supported, specifically one to set the
root authentication key, one to update the FPGA core (signed with a signing
key), and one to revoke a signing key. For the case of NICs, you'd typically
have a single image type. In all cases, the image type would only be valid for
a specific PCI vendor/device tuple.
Since updating device firmware will necessarily result in a service outage, we
need the ability to control which subclouds (which typically would correspond
to geographic areas) can be updated in parallel.
Use Cases
---------
As a cloud admin, I want to push out a hardware device image update to
hardware devices on a single host (possibly for test purposes).
As a cloud admin, I want to push out hardware device image updates to hardware
devices on multiple hosts in a cloud.
As a distributed-cloud admin, I want to push out hardware device image updates
to hardware devices on all hosts on a single subcloud (possibly for test
purposes).
As a distributed-cloud admin, I want to push out hardware device image updates
to hardware devices on multiple hosts on many subclouds. While doing this, I
want to control which hosts and which subclouds can be updated in parallel
since I want to avoid causing service outages while doing the update.
As a distributed-cloud admin, I want to be able to display whether each
subcloud is using up-to-date device images.
As a distributed-cloud admin, I want to see the status of in-progress device
image updates.
As a distributed-cloud or cloud admin, I want to be able to abort an
orchestrated device image update such that currently-in-progress device writes
will finish but no additional ones will be scheduled.
Proposed change
===============
The overall architecture of the device image orchestration will be modelled
after the existing software-patch handling. In a single-cloud environment we
will support uploading device images, "applying" them (which just means marking
them as something that should get written to the hardware), and then actually
kicking off the write to the hardware.
In a distributed-cloud environment, when using device image orchestration we
will first do the above in the SystemController region, and then use dcmanager
to handle pushing the images down to the subcloud and kicking off the actual
update in the subcloud. The VIM in each subcloud will decide when to update
the device images on each host, and a sysinv agent on each host will handle
writing the actual device images to the hardware.
In a distributed-cloud environment it will also be possible for the admin user
to explicitly issue commands to the sysinv API endpoint for a single subcloud.
This will essentially bypass the orchestration mechanism and behave the same
as the single-cloud environment.
Hardware Background
-------------------
The initial hardware that we want to support is the Intel N3000 [1]_, an FPGA that
we expect will be used by 5G edge providers. This FPGA is somewhat unusual in
that it takes roughly 40 minutes to write the functional image to the hardware
(service can continue during this time; a hardware reset is then required to
load the new image). Once the new image is loaded, the device will provide
multiple VFs, which in turn will be exported to Kubernetes as resources, where
they will be consumed by applications running in Kubernetes containers. Because
of the long write times, these devices must be pre-programmed rather than
programmed at Kubernetes pod startup.
Hardware Security Model
-----------------------
By default, the N3000 will accept any valid bitstream that is sent to it (signed
or unsigned). The customer/vendor can generate a public/private root key pair,
then create a *root-entry-hash* bitstream from the public key. If a
*root-entry-hash* bitstream is written to the N3000 it will set the root entry
hash on the device. From that point on, only bitstreams signed by a code
signing key (CSK) which is in turn signed by the private root key will be
accepted by the N3000. Once a *root-entry-hash* bitstream has been written to
the hardware, it cannot be erased or changed without sending the hardware back
to the vendor.
The customer/vendor can generate new *FPGA user* bitstreams. These may be
unsigned or signed with a CSK. Typically each such bitstream would be signed
by a different CSK. Writing a new *user* bitstream will cause the new code to be
loaded on the next bootup of the N3000. Only one *user* bitstream can be stored
in the N3000 at a time.
The customer/vendor can create a *CSK-ID-cancellation* bitstream (generated from
the private root key). When written to the N3000 it will revoke a
previously-used CSK and disallow loading any images signed with it. Multiple
*CSK-ID-cancellation* bitstreams can be processed for each N3000. Most
importantly, StarlingX will not deal directly with CSKs, only bitstreams.
Cloud/Subcloud Sysinv FPGA Agent
--------------------------------
The low-level interactions with the physical FPGA devices will be performed by
a new *sysinv FPGA agent* which will reside on each node with a *worker*
subfunction. The agent will communicate bi-directionally with sysinv-conductor
via RPC. The interactions with the N3000 FPGA will be performed using the OPAE
tools [2]_ in the n3000-opae Docker image running in a container. (This will
require the use of privileged containers due to the need to directly access
hardware devices.)
On startup, the existing *sysinv-agent* will try to create a file under
/run/sysinv. If the file does not already exist, it will send an RPC message to
sysinv-conductor indicating that the host just rebooted. Sysinv-conductor will
then clear any "reboot needed" DB entry for that host if it was set. If there
are no more "pending firmware update" entries in the DB for any host, and if no
host has the "reboot needed" DB entry set, then the "*firmware update in
progress*" alarm will be cleared.
On startup, the existing *sysinv-agent* will do an inventory of the PCI
devices on each worker node. The new *sysinv-fpga-agent* will inventory the
FPGA devices as well, including querying additional details from each FPGA
device as per the *host-device-show* command. The FPGA agent will send an RPC
message to *sysinv-conductor* to update the database with up-to-date FPGA device
information.
If there are problems that need to be dealt with immediately (such as the FPGA
booting the factory image when there should be a functional image) then
*sysinv-conductor* will send an RPC message to *sysinv-fpga-agent* to trigger
a *device-image-update* operation to ensure that the FPGA is up-to-date. This
will also cause an alarm to be raised.
If the FPGA has a valid functional image but it's not the currently-active
functional image, then we will alarm it but not trigger a *device-image-update*
operation. In the future we may wish to extend this to check whether the
functional image was signed with a cancelled CSK-ID and if so then trigger a
*device-image-update* operation due to security risks.
On startup, sysinv-conductor will send a request to all *sysinv FPGA agents*
to report their hardware status. This is needed to deal with certain error
scenarios.
In certain error scenarios it's possible that the *sysinv FPGA agent* will be
unable to send a message to sysinv-conductor. It will need to handle this
gracefully.
Subcloud Sysinv Operations
--------------------------
At the single-cloud or subcloud level, the commands follow a fairly typical pattern.
We plan to extend sysinv to introduce create/list/show/delete commands for the
FPGA images, extend the existing *host device* commands to operate on the FPGA
device, add new commands to *apply* or *remove* a device image, and finally add
new commands to initiate or abort the firmware update.
The concept of *apply* is used because there are different types of bitstreams
and it's possible to have more than one bitstream that needs to be downloaded
to a newly-added FPGA. This will be discussed in more detail in the
*system device-image-apply* section below.
system device-image-create
^^^^^^^^^^^^^^^^^^^^^^^^^^
Define a new image, specifying bitstream file, the bitstream type (root-key,
functional, or key-revocation), an optional name/version/description, the
applicable PCI vendor/device the image is for, and various
bitstream-type-specific metadata such as the bitstream ID (for the
FPGA functional image), the key signature (for the root-key image), the key ID
being revoked, etc. To simplify the dcmanager code, this should allow
specifying the UUID for the image. (Ideally we should be able to issue a GET
in RegionOne and pass the results directly to a PUT to the same location in
the subcloud to create a new image in the subcloud. Alternatively, a POST could
be used but we'd have to add the UUID to the request body.) If not specified,
the system will create a UUID for the image, the bitstream file will be stored
in a replicated filesystem on the controller node, and the metadata will be
stored in the sysinv database.
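As an illustrative sketch only (the option names and values shown are not
final), an invocation might look like::

    # Upload a functional bitstream for the N3000 (PCI vendor 8086,
    # device 0b30), supplying the UUID explicitly as dcmanager would.
    system device-image-create functional-image.bin \
        --bitstream-type functional \
        --pci-vendor 8086 --pci-device 0b30 \
        --bitstream-id 0x2383A62A010304 \
        --name n3000-functional --version v1.0 \
        --uuid 7e794693-2060-4e9e-b0bd-b281b059e8e4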
system device-image-list
^^^^^^^^^^^^^^^^^^^^^^^^
Display high-level image data for all known images. This would include image
type (root-key, functional, key-revocation), UUID, version.
system device-image-show
^^^^^^^^^^^^^^^^^^^^^^^^
Display detailed image data for a single image (specified via UUID). This
would include the UUID, image type, name, description, key ID, bitstream ID,
signing key signature, any activations (with device label) for the image, etc.
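For illustration, the output might resemble the following (the field names
shown are not final)::

    $ system device-image-show 7e794693-2060-4e9e-b0bd-b281b059e8e4
    +----------------+--------------------------------------+
    | Property       | Value                                |
    +----------------+--------------------------------------+
    | uuid           | 7e794693-2060-4e9e-b0bd-b281b059e8e4 |
    | bitstream_type | functional                           |
    | bitstream_id   | 0x2383A62A010304                     |
    | name           | n3000-functional                     |
    | applied_labels | region: east                         |
    +----------------+--------------------------------------+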
system device-image-delete
^^^^^^^^^^^^^^^^^^^^^^^^^^
Delete an image (specified by UUID). If an FPGA functional image is deleted
due to a security issue, it would be wise to also upload and activate a
key-revocation bitstream to prevent the image from being uploaded again either
by accident or maliciously.
system device-image-apply
^^^^^^^^^^^^^^^^^^^^^^^^^
Make the specified image *active*, but do not actually initiate writing to the
hardware. This applies to a specific image, and optionally takes a device
label key/value such that only devices with the specified label would be
updated. Initially only *functional*, *root-key*, and *key-revocation*
bitstreams are supported. Only one *root-key* bitstream can ever be written to
an N3000, so having more than one such bitstream be active doesn't make sense.
Applying a *functional* bitstream will *remove* all other functional bitstreams
for that FPGA PCI vendor/device. There can be multiple *key-revocation*
bitstreams active.
Note that it would be possible to make multiple images active, then issue a
*host-device-image-update* command to trigger writing them all to the hardware.
When an image has been applied, a "device firmware update in progress" alarm
will be raised, and will stay raised until all affected devices have had their
firmware updated or until the device image is removed. This implies that a
"pending firmware update" DB entry will be created for each affected device for
each applied image to indicate that the image needs to be written to the
device.
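A hypothetical example, assuming a *key=value* label argument::

    # Mark the image as something that should be written to all devices
    # carrying the label region=east; without the label argument, the
    # image would apply to every matching device in the system.
    system device-image-apply 7e794693-2060-4e9e-b0bd-b281b059e8e4 region=east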
system device-image-remove
^^^^^^^^^^^^^^^^^^^^^^^^^^
Deactivate the specified image, optionally specifying a device label
to deactivate the image only for devices with the specified label. If you try
to deactivate an image which is currently being written to the hardware it will
succeed but will not abort the write.
When an image is deleted, all of its activation records will also be deleted.
(The implementation of this operation could probably be left to the end as it
is not critical.)
Removing an image will remove any "pending firmware update" DB entries for that
image. If there are no remaining pending firmware updates, and no "reboot
needed" DB entries for any host, then the "device firmware update in progress"
alarm can be cleared.
system host-device-image-update
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Tell sysinv to update the specified device on the specified host with any
active images which have not yet been written to the hardware. In this
scenario, sysinv-conductor will tell the FPGA agent to write each
active-but-not-written image to the device in turn until they've all been
written. We would want to write the root-key bitstream first, then any
key-revocation bitstreams, then the functional bitstream. If we have
successfully written the functional bitstream, the admin user (or the VIM
in the orchestrated update case) will need to lock/unlock the node to cause
the new functional image to be loaded.
While writing an image to the FPGA, we would want to block the reboot of the
host in question. We will only allow updating device images on unlocked hosts,
and once the device image update starts, no host-lock commands will be accepted
unless the *force* option is used. While the FPGA agent is writing the image
to the hardware, it will *stop* the watchdog service from running, since we
don't want an unrelated process to trigger a reboot while we're writing to the
hardware. After the image has been written, the FPGA agent will restart the
watchdog service.
After each image is written the FPGA agent would send an RPC message to
sysinv-conductor to remove the "*pending firmware update*" entry from the DB
and to set a "reboot needed" DB entry for that host.
system host-device-image-update-abort
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Abort any pending image updates for this host. Any in-progress device image
updates will continue until completion or failure.
(The implementation of this operation could be left towards the end, as it is
not necessary for the success path.)
system host-device-list
^^^^^^^^^^^^^^^^^^^^^^^
Extend the existing command so that FPGA devices are displayed in the list,
and add a new "needs firmware update" column.
system host-device-show
^^^^^^^^^^^^^^^^^^^^^^^
Extend the existing command to add new optional device-specific fields. For the
N3000 this would include accelerator status, type of booted image
(user/factory), booted image bitstream ID, cancelled CSK IDs, root entry hash,
BMC versions, PCI device ID, onboard NIC devices, etc.
We might want to include device labels (see below) in the output.
system host-device-label-assign
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Assign a *key: value* label to a PCI device. This takes as arguments the PCI
device, the host, the key, and the value.
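For example (the label key and value are arbitrary; identifying the device by
PCI address is one plausible form)::

    # Tag the N3000 on worker-0 so that device images can target it by label
    system host-device-label-assign worker-0 0000:b3:00.0 region=east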
system host-device-label-list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
List all labels for a given PCI device. This takes as arguments the PCI device
and the host, and returns a list of all key/value labels for the device.
(Alternatively could take the PCI device UUID, but the CLI doesn't expose that
currently.)
system host-device-label-remove
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Remove a label from a PCI device. This takes as arguments the PCI device, the
host, and the key.
system device-label-list
^^^^^^^^^^^^^^^^^^^^^^^^
List all devices and their labels from all hosts in the system. Devices
without any labels are not included. This is intended for use by dcmanager to
determine whether an image should be created in a given subcloud.
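Hypothetical output (the layout is not final)::

    $ system device-label-list
    +----------+--------------+-----------+-------------+
    | hostname | PCI device   | label key | label value |
    +----------+--------------+-----------+-------------+
    | worker-0 | 0000:b3:00.0 | region    | east        |
    | worker-1 | 0000:b3:00.0 | region    | east        |
    +----------+--------------+-----------+-------------+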
system host lock/swact
^^^^^^^^^^^^^^^^^^^^^^
The *lock* operation would be blocked by default during device image update to
prevent accidentally rebooting while in the middle of updating the FPGA image
(since that would result in a service outage while the FPGA gets updated
again). Since we will only start a device image update on an unlocked host,
this should be sufficient.
If the *force* option is specified for this command, the action will
proceed. (This may mean that the device ends up in a bad state if the host
reboots while a device image update was in progress.)
The manual swact operation will be blocked during a device image update to
reduce the chances that it will interfere with the image update. The image
update code in the rest of the system will try to deal with temporary outages
caused by a swact, but we may need to handle it as a failure if the outage
lasts long enough.
Subcloud VIM Operations
-----------------------
All of these operations would be analogous to the existing sw-manager
patch-strategy and update-strategy operations. We're using *firmware update*
in the CLI to allow it to be potentially more generic in the future, but
initially these would apply to the FPGA image update only.
The VIM will control the overall firmware update strategy for the subcloud. It
will decide whether a firmware update is currently allowed to be kicked off (if
there are alarms raised it might block the firmware update strategy apply
depending on the strategy), control how many hosts can do a firmware update in
parallel, trigger each host to begin the firmware update, and aggregate the
status of the firmware update on the various hosts.
When the VIM decides to initiate a firmware update on a given host, it will
issue the HTTP equivalent of the *system host-device-image-update* command to
sysinv on that host to tell that host to write all *applied but not yet
written* device images to the hardware.
sw-manager fw-update-strategy create
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Check the system state and build up a sequence of commands needed to bring the
subcloud into alignment with the desired state of the system. This would take
options such as how many hosts can do a firmware update in parallel, whether to
stop on failure, whether outstanding alarms should prevent the update, etc.
This step will loop over all hosts querying sysinv to see whether each host has
any devices that need updating, then generate a series of steps to bring all
relevant hosts in the subcloud up-to-date for their device images.
If there are no firmware updates to be applied, the strategy creation will
fail with the reason "no firmware updates found".
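An illustrative invocation, borrowing option names from the existing
patch-strategy command (the final option set may differ)::

    sw-manager fw-update-strategy create \
        --worker-apply-type parallel \
        --max-parallel-worker-hosts 2 \
        --alarm-restrictions relaxed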
sw-manager fw-update-strategy apply
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Execute the firmware update strategy. We will probably want an option similar
to the *stage-id* option supported when applying a patching strategy, which
allows applying up to a specific stage ID.
Apply the specified firmware update strategy to each host specified in the
strategy (this would typically be all hosts which have devices which need a
firmware update) following the policies in the strategy around serial/parallel
updates. For each affected host, the VIM will use the sysinv REST API to
trigger a *system host-device-image-update* operation and then periodically
check for the status of the update operation.
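For example (assuming a *stage-id* option is added as discussed above)::

    # Apply the whole strategy, or only up to a specific stage
    sw-manager fw-update-strategy apply
    sw-manager fw-update-strategy apply --stage-id 2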
sw-manager fw-update-strategy show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Display update strategy, optionally with more details (like current status of
the overall sequence as stored in the VIM database).
sw-manager fw-update-strategy abort
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Allow existing in-progress FPGA updates to complete, but do not trigger any
additional nodes to begin FPGA updates. Signal sysinv to abort the FPGA
update; this will still allow in-progress FPGA updates to complete, since we do
not want to end up with a half-written image (which would require a new FPGA
update operation to recover).
(The implementation of this may be left till the end as it is not needed for
the success path.)
sw-manager fw-update-strategy delete
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Delete the strategy once no longer needed.
System Controller DC Manager Operations
---------------------------------------
The DC Manager operations in the system controller are strongly related to the
VIM operations in the subcloud, and most of them are equivalent to the
operations for the existing sw-manager patch-strategy and update-strategy
operations.
The DC manager will control when to trigger a firmware update in a given
subcloud, which subclouds can be updated in parallel, and whether to stop on
failure or not.
The DC manager will also handle creating/deleting device images in each
subcloud as needed to keep the subcloud in sync with the SystemController
region by talking to sysinv-api-proxy in the SystemController region and in
each subcloud. The actual device image files will be stored by the
sysinv-api-proxy in a well-known location where DC manager can access them when
creating device images in the subclouds. DC Manager will only create device
images in the subcloud if there is at least one device in the subcloud which
will be updated with the device image in question (based on any labels
specified via the "*system device-image-apply*" command).
As part of the dcmanager, there will be a periodic audit which scans a number
of subclouds in parallel and checks whether the subcloud has all of the
*applied* device images that it should have (based on the labels the images
were applied against and the device labels in the subcloud), and whether all of
the *applied* device images have been written to the devices that they
should be. If either of these is not true, then the subcloud "*firmware image
sync status*" is considered "*out of sync*". This will result in the subcloud
as a whole being considered "*out of sync*".
When a device image is *applied* in sysinv, dcmanager will be notified and will
set the "*firmware image sync status*" to *unknown* for all subclouds
since it does not know at this point which subcloud(s) the image needs to be
created/applied/updated in. On the next audit, this sync status will be
updated to "*in sync*" or "*out of sync*" as applicable.
dcmanager subcloud-group create/add-member/remove-member/delete/list/show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
These introduce the concept of a "*subcloud group*", which is a way of
grouping subclouds together such that all subclouds in the group can potentially
be upgraded in parallel. A given subcloud can only be a member of one subcloud
group.
This is needed because the customer will likely want to ensure (as much as
possible) that we don't simultaneously update the functional image (which
requires a service outage) on all subclouds serving a given geographic area,
since that could cause an outage for end-users in that area.
There will be controls over how many subclouds in a group can be updated at
once. Dcmanager will only apply update strategies in one group at a time,
and will update all subclouds in a group before moving on to the next subcloud
group.
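A sketch of how an admin might use these commands (the argument forms are
illustrative)::

    # Split the subclouds serving one area across two groups so that they
    # are never all updated in parallel.
    dcmanager subcloud-group create --name east-group-1
    dcmanager subcloud-group add-member east-group-1 subcloud1
    dcmanager subcloud-group add-member east-group-1 subcloud2
    dcmanager subcloud-group list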
dcmanager fw-update-strategy create
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Create a new update strategy, with options for the number of subclouds to
update in parallel, whether to stop on failure, etc. (Eventually we may want
to specify a list of which subcloud groups to update, but this will not be
included in the initial version.) This will generate a UUID for the created
strategy, and will generate a step for each subcloud in the specified subcloud
group that dcmanager considers out-of-sync. If any subclouds in the subcloud
group have an "unknown" sync state, we would disallow the creation of a
firmware update strategy for that group.
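An illustrative invocation, modelled on the existing dcmanager
patch-strategy options (the final names may differ)::

    dcmanager fw-update-strategy create \
        --max-parallel-subclouds 5 \
        --stop-on-failure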
dcmanager fw-update-strategy list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
List the firmware update strategies, with the most important bits of
information for each. This should include the overall update strategy status
(i.e. "in progress" if we've asked for it to be applied).
dcmanager fw-update-strategy show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Show specified firmware update strategy. This would include all the metadata
specified as part of the "create", the overall update strategy status, as
well as the status (as reported by the subcloud VIM) of the firmware update
strategy application for all the subclouds in the specified subcloud group.
dcmanager fw-update-strategy apply
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Execute each step (where each step roughly corresponds to a subcloud) of the
specified firmware update strategy. This would look something like the
following (a CLI sketch follows the list):

* Query sysinv in RegionOne for the active FPGA images using the REST API.
* For each strategy step:

  * Query the subcloud for device labels using the sysinv REST API.
  * Query the subcloud for FPGA images using the sysinv REST API.
  * Create/update/delete FPGA images in the subcloud as needed to bring it
    into sync with the FPGA images in the SystemController. We don't do this
    via dcorch because we want to ensure the data is up to date when applying
    the update strategy. (This process could take some time on a slow
    subcloud link.)
  * Apply the device image in the subcloud.
  * Create an FPGA update strategy using the VIM REST API.
  * Apply the FPGA update strategy using the VIM REST API.
  * Monitor progress by querying the FPGA update strategy using the VIM REST
    API.
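From the CLI, the tail end of this flow might look like the following sketch
(using the commands proposed in this spec)::

    # Kick off the strategy, then poll the per-subcloud steps
    dcmanager fw-update-strategy apply
    dcmanager strategy-step list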
dcmanager fw-update-strategy abort
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass the abort down to each subcloud, and do not process any more subclouds.
(Maybe leave this till last as it is not needed for the success path.)
dcmanager fw-update-strategy delete
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Delete firmware update strategy. This would also delete the firmware update
strategy in the subcloud using the VIM REST API. It is not valid to delete an
in-progress update strategy.
dcmanager strategy-step list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Extend if needed to list the strategy-steps and their state for the FPGA update
strategy that is being applied.
dcmanager strategy-step show
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Extend if needed to show step for a specific subcloud.
System Controller Sysinv Operations
-----------------------------------
The sysinv operations in the system controller essentially duplicate the
image-related subset of the sysinv operations in the subcloud. We don't expect
the system controller to have any FPGAs, so the device-image-update,
host-device-list, and host-device-show commands are not relevant. In all cases
the request is intercepted by sysinv-api-proxy in the SystemController region
and forwarded to RegionOne. Unlike normal resources, dcorch will not be used
to synchronize the FPGA image information.
system device-image-create
^^^^^^^^^^^^^^^^^^^^^^^^^^
Define a new image, as described in `Subcloud Sysinv Operations`_. If
successful, the sysinv API proxy will also save the image to
/opt/device-image-vault, which will be a drbd-replicated filesystem analogous
to how /opt/patch-vault is used to store patch files for orchestrated patching.
system device-image-list
^^^^^^^^^^^^^^^^^^^^^^^^
Display high-level image data for all known images as per
`Subcloud Sysinv Operations`_.
system device-image-show
^^^^^^^^^^^^^^^^^^^^^^^^
Display detailed image data for a single image as per
`Subcloud Sysinv Operations`_.
system device-image-delete
^^^^^^^^^^^^^^^^^^^^^^^^^^
Delete an image as per `Subcloud Sysinv Operations`_. Remind the user about
uploading and activating a key-revocation bitstream for security issues
when deleting a functional image. Deleting an active image will not be
allowed.
system device-image-apply
^^^^^^^^^^^^^^^^^^^^^^^^^
Make the specified image *active* as per `Subcloud Sysinv Operations`_.
system device-image-remove
^^^^^^^^^^^^^^^^^^^^^^^^^^
Make the specified image *inactive* as per `Subcloud Sysinv Operations`_.
Fault Handling
--------------
While a device is in the middle of updating its functional image, it's
possible that a fault could occur that would normally result in the host being
rebooted. If we reboot while updating the N3000 functional image it could
result in a 40-minute outage on host startup while we flash the functional
image again.
Given the above, the desired behavior while a device image update is in
progress is to avoid rebooting on faults (critical process alarm, low memory
alarm, etc.) as long as the fault is not something (like high temperature) that
could actually damage the hardware.
This is less of an issue for AIO-SX since we're already suppressing mtce reboot
actions.
The host watchdog will currently reset the host under certain circumstances.
This is undesirable if we're in the middle of updating device images, so the
sysinv FPGA agent will temporarily shut down the "hostwd" daemon during device
image update and start it back up again after. (Later on we may want to
modify it to stay running but emit logs instead of actually resetting the
host.)
CLI Clients
-----------
We will extend the existing *system*, *sw-manager*, and *dcmanager* clients to
add the new commands and extend the existing commands where applicable.
Specifically, for *system host-device-show*, the expectation is that the new
FPGA-specific fields will only be returned by the server for FPGA devices.
The client will need to be able to handle a variable set of fields rather
than assuming a constant set.
Web GUI
-------
If we want to allow this to be handled entirely through the GUI we'd need to
add support for all the system controller operations from sysinv and dcmanager.
This will not be implemented in the initial release.
Alternatives
------------
Given our existing infrastructure, there aren't too many alternatives. We
could extend the existing *sysinv-agent* instead of making a new FPGA-specific
agent, but there's going to be a fair bit of hardware-specific code in the new
agent so that might not make sense.
The VIM and dcmanager changes closely align with how we already support
software patching and software upgrade, so this enables maximum code re-use.
Sysinv already talks to the hardware and deals with PCI devices, as well as
controlling the lock/unlock/reboot operations, so it's the logical place to
handle the interactions between those operations and the device image updates.
Data model impact
-----------------
The dcmanager DB will have a new "subcloud_group" table which maps subclouds
into groups. Subclouds within a group can be updated in parallel, while
subclouds from different groups cannot.
The sysinv DB will have a new *fpga_devices* table which will include new
fields that are specific to the FPGA devices. Each row will be associated
with a row in the *pci_devices* table.
The sysinv DB *pci_devices* table will get a new "needs firmware update"
column.
The sysinv DB will get a new *device_images* table which stores all necessary
information for each device image.
The dcmanager DB will get a number of new tables (analogous to the ones used
for software patching) which will track the strategy data for the device image
update at the distributed-cloud SystemController level.
REST API impact
---------------
TBD
Security impact
---------------
The low-level implementation of the sysinv FPGA agent assumes the use of
privileged containers to handle the actual low-level interaction with the
physical hardware. We currently allow privileged containers, but we may want
to lock things down further in the future. In that case we might need to
install the OPAE tools as part of StarlingX rather than in a container.
This change does not directly deal with sensitive data. It deals with
bitstreams which may represent sensitive data, but the bitstreams have already
been signed before they're provided to StarlingX.
The biggest security impact would be an admin-user impact, since once an N3000
device has had its root key programmed it cannot be changed short of sending
the device back to the factory.
Other end user impact
---------------------
When a device image is being updated, it's very likely that a hardware reset
will be required, either of that specific device or of the whole host. This
will necessarily cause a service outage on the device in question, as well as
for any application containers making use of the device.
Performance Impact
------------------
The new code is not expected to be called frequently. It is expected to be
called more often during the initial phases of a customer network build-out as
FPGA images are reworked to deal with teething issues.
There will be a periodic audit in dcmanager to check whether each subcloud is
up-to-date on its hardware device images. We will generally only trigger
device updates during a maintenance window, so this audit does not need to be
frequent.
The API changes have been designed to minimize the number of calls required
to perform this audit, since they involve communicating between the
SystemController and the subclouds which may be geographically remote.
While a firmware update is in progress, the host in question cannot be
locked unless the admin forces the operation.
Other deployer impact
---------------------
None.
Developer impact
----------------
Nothing different than any other development in these areas of the code.
Upgrade impact
--------------
All changes will be made in such a way as to support upgrades from the previous
version.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
  Chris Friesen

Other contributors:
  Al Bailey
  Eric MacDonald
  Teresa Ho
Repos Impacted
--------------
`<https://opendev.org/starlingx/config>`_
`<https://opendev.org/starlingx/distcloud.git>`_
`<https://opendev.org/starlingx/nfv.git>`_
Work Items
----------
* Sysinv FPGA Agent changes.
* Sysinv-api and Sysinv-conductor changes for triggering device image update.
* Sysinv-api and Sysinv-conductor changes for managing device images.
* VIM work for orchestrating firmware update at the subcloud level.
* dcmanager work for orchestrating firmware update at the SystemController
level.
* sysinv-api-proxy work for proxying the device image management API to
RegionOne and saving the device images in the vault for use by dcmanager.
Dependencies
============
None
Testing
=======
Unit tests will be added for new/modified code and will be executed by tox,
which is already supported for dcmanager, sysinv and VIM.
The expectation is that there will be 3rd party testing by Wind River with
actual hardware to ensure that this works as expected.
Documentation Impact
====================
The Cloud Platform Node Management guide, Cloud Platform Administration
Tutorials, and Cloud Platform User Tutorials will likely need to be updated.
The VIM, dcmanager, and sysinv API documentation will be updated with the new
APIs.
References
==========
.. [1] https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/intel-fpga-pac-n3000/overview.html
.. [2] https://opae.github.io/latest/index.html
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - stx-4.0
     - Introduced