Wireless FEC Operator application for StarlingX

This specification describes the Intel Wireless
FEC Operator application for StarlingX.

Story: 2009749
Task: 44206

Change-Id: Ie84b97f81d5ae21bc2fcf1f57a8298b923a65bf8
Balendu Mouli Burla 2022-04-11 17:04:24 -05:00
parent dfaeb38ab7
commit ac580e3db7
1 changed files with 375 additions and 0 deletions

@ -0,0 +1,375 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License. http://creativecommons.org/licenses/by/3.0/legalcode
=================================
Wireless FEC Operator integration
=================================
Integration of Intel Wireless FEC Operator to StarlingX platform
================================================================
Storyboard:
https://storyboard.openstack.org/#!/story/2009749
In a distributed cloud environment for vRAN workloads, there may be hundreds
of sub-clouds, each with one or more worker nodes managed by a System
Controller. Some of these sub-clouds have worker nodes with Intel accelerator
devices that offload 4G/LTE and 5G FEC (Forward Error Correction) operations.
These FEC devices allow their hardware resources to be configured per vRAN
workload for optimal performance. Individual vRAN workload requirements
typically vary by deployment location.
Managing and configuring these Intel FEC accelerator devices in a
containerized environment requires additional functionality. The current
configuration method in StarlingX cannot set all of the parameters of the
FEC h/w accelerator; it offers only pre-defined/static configuration options
for typical workloads.
Problem description
===================
Today in StarlingX, FEC devices are configured through a user application,
"pf-bb-config", which statically sets the configurable parameters from a
config file. The current version of StarlingX supports configuring only a few
parameters (1 or 2) of FEC devices through "system" commands, which in turn
trigger Puppet to run the "pf-bb-config" application when the system is
unlocked.
The current method uses pre-defined/static config files that configure FEC
devices for the most common vRAN workload requirements. Supporting other
combinations of settings, or different configurations on different nodes in a
cluster, requires adding and maintaining these configuration files in a
somewhat unsupported fashion.
In addition, supporting next generation FEC devices (e.g., ACC101, ACC200)
may require enhancements to the existing configuration method.
The Intel-supported FEC Operator is an SRO (Special Resource Operator)
for K8s that provides:
* Detection and labeling of the nodes that have FEC h/w accelerators installed
* Configuration of FEC devices through standard K8s APIs (in JSON format)
* Validation of FEC device configuration parameters
* Configuration at the cluster level, node level and device level
* Deployment through Kustomize/Helm deployment models
* Seamless support for next generation FEC devices
Use Cases
---------
The FEC Operator is an optional system application for vRAN deployments
that need to fine-tune Intel FEC h/w accelerator resources
(e.g., number of VFs, queues, queue groups) based on deployment workloads.
The parameters that can be configured through the FEC Operator are:
* Number of VF interfaces (VF bundles)
* PF/VF mode
* Enabling 4G only, 5G only, or both 4G and 5G
* For each direction (uplink/downlink), configuration of:
  * number of queue groups, aqsPerGroup and aqDepth
Users can apply this configuration per device and per node in a cluster
using the native kubectl API interface.
Proposed change
===============
The current method of configuring FEC devices will remain unchanged and will
stay the default for existing vRAN deployments.
The FEC Operator will be added as an optional system application
(sriov-fec-operator), which will be disabled by default (i.e., not uploaded
or applied). The FEC Operator is deployed through Helm charts packaged in the
new system application manifest. Users can enable, deploy and configure the
FEC Operator on demand by updating and applying Helm overrides for the new
system application.
FEC Operator functionality is distributed across several PODs:
* sriov-fec-controller-manager
  * Runs on all master nodes in the cluster and provides K8s Custom Resource
    API services for FEC device configuration
  * Communicates with the FEC Operator service running on each node
    to configure the FEC devices and reconcile their state
* sriov-fec-daemonset
  * Runs on each node in the cluster and receives configuration from the
    controller-manager
  * Detects the FEC devices on the platform/node
  * Based on the data configured in the SriovFecClusterConfig CRD:
    * Binds the PF (Physical Function) interface to the required driver,
      i.e., igb_uio or pci-pf-stub
    * Creates the required number of VF interfaces
    * Binds the VF interfaces to the driver (igb_uio or vfio-pci)
    * Configures the FEC device using the pf-bb-config tool
* sriov-device-plugin
  * Runs on each node to manage the FEC device SR-IOV VF (Virtual Function)
    resources allocated to user application PODs
* accelerator-discovery
  * Runs on each node to detect the FEC devices on that node
  * Labels the nodes that have an FEC device
This results in two methods of FEC device configuration:
method-1: Default, existing method
method-2: Using the FEC Operator
Method-1 (the existing method) is the default and is applied on node startup.
If a SriovFecClusterConfig CRD is applied, the sriov-fec-daemonset on the
node overwrites the existing configuration for that particular device
on the node.
To switch back to the default static method, the admin deletes the
SriovFecClusterConfig CRD and reconfigures the device through method-1.
NOTE:
Reconfiguring a device and/or switching between configuration methods
impacts the use of the FEC device by vRAN application PODs. The following
steps are recommended during reconfiguration and/or when switching
configuration methods:
- Stop the vRAN application PODs from using the FEC devices and terminate them.
- Reconfigure the device, or switch the method and then reconfigure.
- Redeploy the vRAN application PODs to use the FEC device.
The FEC devices supported through the FEC Operator in STX 7 are:
ACC100 (Mt. Bryce) and N3000 FPGA
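The ordering of these steps can be sketched in Python as a list of kubectl invocations. This is only an illustration under stated assumptions: the manifest file names are hypothetical, the config name and namespace are borrowed from the sample configuration in this spec, and the method-1 reconfiguration is left as a manual step because it goes through the existing "system" commands rather than kubectl.

```python
def reconfiguration_steps(app_manifest, fec_manifest=None,
                          config_name="config",
                          namespace="sriov-fec-system"):
    """Return (description, argv) pairs in the recommended order.

    argv is None for a step that has no kubectl equivalent in this sketch.
    File names passed in are hypothetical placeholders.
    """
    # Step 1: vRAN PODs must release the FEC device and terminate first.
    steps = [("stop vRAN PODs", ["kubectl", "delete", "-f", app_manifest])]
    if fec_manifest is not None:
        # Step 2a: (re)configure the device through the FEC Operator.
        steps.append(("apply FEC Operator config",
                      ["kubectl", "apply", "-f", fec_manifest]))
    else:
        # Step 2b: switch back to method-1 by deleting the operator config,
        # then reconfigure through the existing static method.
        steps.append(("delete FEC Operator config",
                      ["kubectl", "delete", "sriovfecclusterconfig",
                       config_name, "-n", namespace]))
        steps.append(("reconfigure device through method-1", None))
    # Step 3: redeploy the vRAN PODs against the reconfigured device.
    steps.append(("redeploy vRAN PODs",
                  ["kubectl", "apply", "-f", app_manifest]))
    return steps


for description, argv in reconfiguration_steps("vran-app.yaml",
                                               "fec-config.yaml"):
    print(description, "->", " ".join(argv) if argv else "(manual step)")
```

Calling ``reconfiguration_steps("vran-app.yaml")`` without an operator manifest yields the switch-back sequence instead, with the method-1 step marked as manual.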
Alternatives
------------
The current method of configuring FEC devices is the default method and is
enabled by default.
Configuration through the FEC Operator is an optional alternative method.
Data model impact
-----------------
The sriov-fec-operator application introduces the new
SriovFecClusterConfig CRD to the cluster.
Sample Cluster configuration:
-----------------------------
.. code-block:: none

    apiVersion: sriovfec.intel.com/v2
    kind: SriovFecClusterConfig
    metadata:
      name: config
      namespace: sriov-fec-system
    spec:
      priority: 1
      nodeSelector:
        kubernetes.io/hostname: <node-label>
      acceleratorSelector:
        pciAddress: 0000:17:00.0
      physicalFunction:
        pfDriver: "pci-pf-stub"
        vfDriver: "vfio-pci"
        vfAmount: 16
        bbDevConfig:
          acc100:
            # Programming mode: 0 = VF Programming, 1 = PF Programming
            pfMode: false
            numVfBundles: 16
            maxQueueSize: 1024
            uplink4G:
              numQueueGroups: 0
              numAqsPerGroups: 16
              aqDepthLog2: 4
            downlink4G:
              numQueueGroups: 0
              numAqsPerGroups: 16
              aqDepthLog2: 4
            uplink5G:
              numQueueGroups: 4
              numAqsPerGroups: 16
              aqDepthLog2: 4
            downlink5G:
              numQueueGroups: 4
              numAqsPerGroups: 16
              aqDepthLog2: 4
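Because the operator is configured through standard K8s APIs in JSON format, the same sample can also be generated programmatically. The following is a minimal sketch, not part of the operator: the helper name ``build_cluster_config``, the node name ``worker-0`` and the PCI address passed in are illustrative assumptions, and the nesting of the fields is an assumption about the intended structure of the sample.

```python
import json


def build_cluster_config(node_name, pci_address, vf_amount=16,
                         ul4g_groups=0, dl4g_groups=0,
                         ul5g_groups=4, dl5g_groups=4):
    """Return a SriovFecClusterConfig manifest as a plain dict.

    Hypothetical helper; field names and values follow the sample config.
    """

    def queues(num_groups):
        # Per-direction queue settings, using the sample's fixed values.
        return {"numQueueGroups": num_groups,
                "numAqsPerGroups": 16,
                "aqDepthLog2": 4}

    return {
        "apiVersion": "sriovfec.intel.com/v2",
        "kind": "SriovFecClusterConfig",
        "metadata": {"name": "config", "namespace": "sriov-fec-system"},
        "spec": {
            "priority": 1,
            "nodeSelector": {"kubernetes.io/hostname": node_name},
            "acceleratorSelector": {"pciAddress": pci_address},
            "physicalFunction": {
                "pfDriver": "pci-pf-stub",
                "vfDriver": "vfio-pci",
                "vfAmount": vf_amount,
                "bbDevConfig": {
                    "acc100": {
                        "pfMode": False,
                        # Kept equal to vfAmount, as the sample does.
                        "numVfBundles": vf_amount,
                        "maxQueueSize": 1024,
                        "uplink4G": queues(ul4g_groups),
                        "downlink4G": queues(dl4g_groups),
                        "uplink5G": queues(ul5g_groups),
                        "downlink5G": queues(dl5g_groups),
                    }
                },
            },
        },
    }


manifest = build_cluster_config("worker-0", "0000:17:00.0")
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON can be saved to a file and submitted with ``kubectl apply -f``, which accepts JSON manifests as well as YAML.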
sriov_fec_cluster_config parameters description:
------------------------------------------------
* ``name``: Name of the specific config.
* ``cluster_config_name``: Name of the cluster config.
* ``priority``: Priority of deployment (a lower number means higher priority).
* ``drainskip``: Allows skipping the draining of the node after the
  config is applied.
* ``selected_node``: (Optional) Field that can be used to target only a
  specific node.
* ``pf_driver``: The PF driver to be used: igb_uio or pci-pf-stub.
* ``vf_driver``: The VF driver to be used: vfio-pci or igb_uio.
* ``vf_amount``: The number of VFs to be created for the device.
* ``bbdevconfig``:
  * ``pf_mode``: The mode in which the accelerator will be programmed.
    VFs are expected to be used, so this is set to false.
  * ``num_vf_bundles``: Number of VF bundles. This should correspond
    to the vf_amount field.
  * ``max_queue_size``: Max queue size. This field is not expected to
    change in most deployments.
  * ``ul4g_num_queue_groups``: Number of 4G Uplink queue groups.
    There are 8 queue groups in total that can be distributed between
    4G/5G Uplink/Downlink.
  * ``ul4g_num_aqs_per_groups``: Number of aqs per group. Not expected
    to change for most deployments.
  * ``ul4g_aq_depth_log2``: Base-2 logarithm of the aq depth.
  * ``dl4g_num_queue_groups``: Number of 4G Downlink queue groups.
    There are 8 queue groups in total that can be distributed between
    4G/5G Uplink/Downlink.
  * ``dl4g_num_aqs_per_groups``: Number of aqs per group. Not expected
    to change for most deployments.
  * ``dl4g_aq_depth_log2``: Base-2 logarithm of the aq depth.
  * ``ul5g_num_queue_groups``: Number of 5G Uplink queue groups.
    There are 8 queue groups in total that can be distributed between
    4G/5G Uplink/Downlink. In the sample configuration, 4 queue groups
    are used for 5G Uplink.
  * ``ul5g_num_aqs_per_groups``: Number of aqs per group. Not expected
    to change for most deployments.
  * ``ul5g_aq_depth_log2``: Base-2 logarithm of the aq depth.
  * ``dl5g_num_queue_groups``: Number of 5G Downlink queue groups.
    There are 8 queue groups in total that can be distributed between
    4G/5G Uplink/Downlink. In the sample configuration, 4 queue groups
    are used for 5G Downlink.
  * ``dl5g_num_aqs_per_groups``: Number of aqs per group. Not expected
    to change for most deployments.
  * ``dl5g_aq_depth_log2``: Base-2 logarithm of the aq depth.
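The constraints stated in this list (allowed driver names, VF bundles matching the VF amount, and 8 queue groups shared across all four directions) can be checked before a configuration is submitted. The sketch below is illustrative only and is not the operator's own validation; the function names are hypothetical, and reading ``*_aq_depth_log2`` as a base-2 exponent is an assumption drawn from the parameter name.

```python
MAX_QUEUE_GROUPS = 8  # total shared by 4G/5G uplink/downlink, per the list
PF_DRIVERS = {"igb_uio", "pci-pf-stub"}
VF_DRIVERS = {"vfio-pci", "igb_uio"}
DIRECTIONS = ("ul4g", "dl4g", "ul5g", "dl5g")


def validate_fec_config(cfg):
    """Return a list of problems found in a flat parameter dict."""
    problems = []
    if cfg["pf_driver"] not in PF_DRIVERS:
        problems.append(f"unsupported pf_driver: {cfg['pf_driver']}")
    if cfg["vf_driver"] not in VF_DRIVERS:
        problems.append(f"unsupported vf_driver: {cfg['vf_driver']}")
    if cfg["num_vf_bundles"] != cfg["vf_amount"]:
        problems.append("num_vf_bundles should correspond to vf_amount")
    total = sum(cfg[f"{d}_num_queue_groups"] for d in DIRECTIONS)
    if total > MAX_QUEUE_GROUPS:
        problems.append(
            f"{total} queue groups requested, only {MAX_QUEUE_GROUPS} exist")
    return problems


def aq_depth(cfg, direction):
    """Queue depth implied by <direction>_aq_depth_log2 (assumed exponent)."""
    return 2 ** cfg[f"{direction}_aq_depth_log2"]


sample = {
    "pf_driver": "pci-pf-stub", "vf_driver": "vfio-pci",
    "vf_amount": 16, "num_vf_bundles": 16,
    "ul4g_num_queue_groups": 0, "dl4g_num_queue_groups": 0,
    "ul5g_num_queue_groups": 4, "dl5g_num_queue_groups": 4,
    "ul5g_aq_depth_log2": 4,
}
print(validate_fec_config(sample))
print(aq_depth(sample, "ul5g"))
```

With the sample values the queue-group budget is used exactly (4 + 4 = 8), so adding any 4G queue group would require taking one away from 5G.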
REST API impact
---------------
The SriovFecClusterConfig CRD introduces a standard extension of the
K8s APIs.
Security impact
---------------
The existing K8s authentication and authorization apply to the standard
K8s API extension introduced by the SriovFecClusterConfig CRD.
Other end user impact
---------------------
End users will be able to configure FEC devices in more detail.
Performance Impact
------------------
* With the existing method (method-1), resources (CPU and memory) are
  consumed only during configuration.
* With the FEC Operator method, service PODs run on master and worker nodes
  all the time and consume some CPU and memory from cluster housekeeping
  resources; this is expected to be negligible.
* Periodic reconciliation between the controller-manager and the fec-daemon
  may also consume network resources; this is also assumed to be negligible.
Other deployer impact
---------------------
None.
Upgrade impact
--------------
None. The sriov-fec-operator application is optional.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
* Balendu Mouli Burla (balendu)
Other contributors:
* Nidhi Shivashankara Belur (nshivash)
Repos Impacted
--------------
A new system-application repo will be created for the definition and building
of the new sriov-fec-operator application.
Work Items
----------
* Create the sriov-fec-operator application package.
* Integrate the sriov-fec-operator application with FluxCD. Add application
  upload/apply/remove/delete commands.
* Update docs.starlingx.io with a how-to for configuring FEC devices using
  the FEC Operator application.
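As a rough illustration of the upload/apply/remove/delete lifecycle named in the work items, the sketch below lists the corresponding StarlingX ``system application-*`` commands for the new application. The tarball file name is a hypothetical placeholder; the actual package name is determined by the application build.

```python
APP_NAME = "sriov-fec-operator"


def lifecycle_commands(tarball):
    """Return the `system` CLI invocations for each lifecycle stage."""
    return {
        "upload": ["system", "application-upload", tarball],
        "apply": ["system", "application-apply", APP_NAME],
        "remove": ["system", "application-remove", APP_NAME],
        "delete": ["system", "application-delete", APP_NAME],
    }


# Hypothetical tarball name, used only for illustration.
commands = lifecycle_commands("sriov-fec-operator.tgz")
for stage in ("upload", "apply", "remove", "delete"):
    print(stage, ":", " ".join(commands[stage]))
```

Upload and apply enable the operator; remove and delete return the system to the default state in which only the static configuration method is active.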
Dependencies
============
None
Testing
=======
* Testing will be performed on both Simplex and Duplex deployment
  configurations.
* The following functional validations will be performed:
  * Check that the FEC Operator is disabled by default when a node starts up
    for the first time.
  * Check the static configuration of FEC devices and make sure the existing
    functionality still works.
  * Check the enable/disable functionality of the FEC Operator in the cluster.
  * Configure the FEC device with the FEC Operator, make sure it overrides the
    default configuration, and verify the FEC functionality.
  * Delete the CRD configuration, reconfigure the device through static
    configuration, and verify the FEC functionality.
  * Configure the device through the FEC Operator and reboot the node; check
    that the node comes up with the new configuration applied through the
    FEC Operator.
Documentation Impact
====================
docs.starlingx.io will be updated for:
* How to upload and apply sriov-fec-operator application
* How to perform enhanced configuration of FEC devices with
SriovFecClusterConfig CRD.
References
==========
Intel FEC Operator:
https://github.com/smart-edge-open/openshift-operator/blob/main/spec/openshift-sriov-fec-operator.md
Acronyms
--------
- FEC : Forward Error Correction
- LTE : Long Term Evolution
- vRAN : Virtual Radio Access Network
- SR-IOV : Single Root Input/Output Virtualization
- PF : Physical Function
- VF : Virtual Function
- CRD : Custom Resource Definition
History
=======
Initial Version.