Wireless FEC Operator application for StarlingX
This specification describes the Intel Wireless FEC Operator application for StarlingX. Story: 2009749 Task: 44206 Change-Id: Ie84b97f81d5ae21bc2fcf1f57a8298b923a65bf8
parent
dfaeb38ab7
commit
ac580e3db7
|
@ -0,0 +1,375 @@
|
|||
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
=================================
|
||||
Wireless FEC Operator integration
|
||||
=================================
|
||||
|
||||
Integration of Intel Wireless FEC Operator to StarlingX platform
|
||||
================================================================
|
||||
|
||||
Storyboard:
|
||||
https://storyboard.openstack.org/#!/story/2009749
|
||||
|
||||
In a distributed cloud environment for vRAN workloads, there may be hundreds
of sub-clouds, with each sub-cloud having one or more worker nodes managed
by a System Controller. Some of these sub-clouds have worker nodes with
Intel accelerator devices to offload 4G/LTE and 5G FEC (Forward Error
Correction) operations.
|
||||
|
||||
These FEC devices allow their hardware resources to be configured on a
per-vRAN-workload basis to achieve optimal performance. In a typical
scenario, individual vRAN workload requirements vary with the deployment
location.
|
||||
|
||||
For an admin to manage and/or configure these Intel FEC accelerator devices
in a containerized environment, additional functionality is required. The
current configuration method in StarlingX cannot configure all the
parameters of the FEC h/w accelerator and offers only pre-defined, static
configuration options for typical workloads.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Today in StarlingX, configuration of FEC devices is performed through a user
application "pf-bb-config", which in turn statically sets configurable
parameters through a config file. The current version of StarlingX supports
configuring only a few parameters (1 or 2) of FEC devices through
"system" commands, which in turn trigger puppet to run the "pf-bb-config"
application when the system is unlocked.
|
||||
|
||||
The current configuration option uses pre-defined, static config files to
configure FEC devices for the most common vRAN workload requirements.
Supporting other combinations of settings, or changing the configuration
on different nodes in a cluster, requires adding and maintaining this
configuration file in a somewhat unsupported fashion.
|
||||
|
||||
In addition, support for next-generation FEC devices (e.g., ACC101, ACC200, ...)
may need enhancements to the existing configuration method.
|
||||
|
||||
The Intel-supported FEC Operator is an SRO (Special Resource Operator)
for K8s which provides:

* Detection and labeling of the nodes which have FEC h/w accelerators installed
* Configuration of FEC devices through standard K8s APIs (in JSON format)
* Validation of FEC device configuration parameters
* Configuration at cluster level, node level, or device level
* Deployment through Kustomize/Helm deployment models
* Seamless support for next-generation FEC devices
|
||||
|
||||
|
||||
Use Cases
|
||||
---------
|
||||
|
||||
FEC Operator is an optional system application for vRAN deployments
where there is a need to fine-tune Intel FEC h/w accelerator resources
(e.g., number of VFs, queues, queue groups, etc.) based on deployment workloads.
|
||||
|
||||
The parameters that can be configured through the FEC Operator are:

* Number of VF interfaces (VF bundles)
* PF/VF mode
* Enabling 4G only, 5G only, or both 4G and 5G
* For each direction (uplink/downlink), configuration of:

  * number of queue groups, aqsPerGroup and aqDepth
|
||||
|
||||
Users can apply these configurations per device and per node
in a cluster using the native kubectl interface.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
The current method of configuring FEC devices remains the default
for existing vRAN deployments and will not be changed.
|
||||
|
||||
FEC Operator will be added as an optional system application
(sriov-fec-operator), which will be disabled by default (i.e., not applied or
uploaded). The FEC operator is deployed through helm charts packaged in the
new system application manifest. Users can, on demand, enable, deploy and
configure the FEC operator by updating and applying helm overrides for the
new system application.
|
||||
|
||||
FEC Operator functionality is distributed across a few PODs:

* sriov-fec-controller-manager

  * Runs on all master nodes in the cluster and provides K8s Custom Resource
    API services for FEC device configuration.
  * Communicates with the FEC operator service running on each node
    to configure and reconcile the FEC devices.

* sriov-fec-daemonset

  * Runs on each node in the cluster and receives configuration from
    controller-manager.
  * Detects the FEC devices on the platform/node.
  * Based on data configured in the SriovFecClusterConfig CRD:

    * Binds the PF (Physical Function) interface with the required driver,
      i.e., igb_uio or pci-pf-stub.
    * Creates the required number of VF interfaces.
    * Binds the VF interfaces with the driver (igb_uio, vfio-pci).
    * Configures the FEC device using the pf-bb-config tool.

* sriov-device-plugin

  * Runs on each node to manage the FEC device SR-IOV VF (Virtual Function)
    resources configured to user application PODs.

* accelerator-discovery

  * Runs on each node to detect the FEC devices on that node.
  * Labels the nodes which have FEC devices.
|
||||
|
||||
There are two different methods of FEC device configuration:

* method-1: the default, existing method
* method-2: using the FEC Operator
|
||||
|
||||
Method-1 (the existing method) is the default method applied on node startup.
If a SriovFecClusterConfig CRD configuration is applied, the
sriov-fec-daemonset on the node will overwrite the existing configuration
for that particular device on the node.
|
||||
|
||||
To switch back to the default static method, the admin performs the
SriovFecClusterConfig CRD delete operation and reconfigures the device
through method-1.
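
The precedence between the two methods can be sketched as follows. This is
only an illustration of the behaviour described above, not the operator's
code: ``effective_method``, the ``(node, PCI address)`` key, and the node name
are hypothetical, and the sketch assumes at most one operator configuration
targets a given device.

```python
from typing import Dict, Tuple

# Hypothetical key: (node hostname, PCI address of the FEC device).
DeviceKey = Tuple[str, str]


def effective_method(device: DeviceKey,
                     operator_configs: Dict[DeviceKey, dict]) -> str:
    """Return which configuration method is in effect for a device.

    method-1 (static default) applies unless an operator configuration
    (method-2) has been applied for that particular device.
    """
    return "method-2" if device in operator_configs else "method-1"


# Illustrative state only; the node name and address are made up.
configs: Dict[DeviceKey, dict] = {}
dev = ("worker-0", "0000:17:00.0")

before = effective_method(dev, configs)   # default after node startup
configs[dev] = {"vfAmount": 16}           # operator configuration applied
during = effective_method(dev, configs)   # operator overrides the default
del configs[dev]                          # configuration deleted by the admin
after = effective_method(dev, configs)    # static method is in effect again
```

Note that after the delete the device still has to be reconfigured through
method-1; the sketch only tracks which method owns the device.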
|
||||
|
||||
NOTE:

Reconfiguration and/or switching between configuration methods will
impact FEC device usage by the vRAN application PODs. The following
steps are recommended when reconfiguring and/or switching
configuration methods:

- Stop the vRAN application PODs from using the FEC devices and terminate
  them.
- Reconfigure the device, or switch the method and reconfigure.
- Redeploy the vRAN application PODs to use the FEC device.
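
The ordering of the recommended steps can be expressed as a simple guard. This
is a hedged sketch rather than part of the proposed implementation:
``reconfigure_fec_device`` and its arguments are hypothetical, and the caller
is assumed to know which PODs currently hold VFs of the device.

```python
from typing import Callable, List


def reconfigure_fec_device(pods_using_device: List[str],
                           apply_config: Callable[[], None]) -> None:
    """Apply a new FEC configuration only when no POD uses the device."""
    if pods_using_device:
        # Step 1 is not complete: the workloads must be terminated first.
        raise RuntimeError(
            "FEC device still in use by: " + ", ".join(pods_using_device))
    # Step 2: reconfigure the device (or switch method and reconfigure).
    apply_config()
    # Step 3 (redeploying the vRAN PODs) is left to the caller.


applied: List[str] = []
reconfigure_fec_device([], lambda: applied.append("new-config"))
```

Calling the function while a POD is still listed raises ``RuntimeError``
instead of touching the device.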
|
||||
|
||||
FEC devices supported through the FEC Operator in STX 7 are:
ACC100 (Mt. Bryce), N3000 FPGA
|
||||
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
The current method of configuring FEC devices is the default method and
is enabled by default.
|
||||
|
||||
Configuration through FEC Operator is an optional alternative method.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
The sriov-fec-operator application introduces the new
SriovFecClusterConfig CRD to the cluster.
|
||||
|
||||
|
||||
Sample Cluster configuration:
|
||||
-----------------------------
|
||||
|
||||
.. code-block:: none

    apiVersion: sriovfec.intel.com/v2
    kind: SriovFecClusterConfig
    metadata:
      name: config
      namespace: sriov-fec-system
    spec:
      priority: 1
      nodeSelector:
        kubernetes.io/hostname: <node-label>
      acceleratorSelector:
        pciAddress: 0000:17:00.0
      physicalFunction:
        pfDriver: "pci-pf-stub"
        vfDriver: "vfio-pci"
        vfAmount: 16
        bbDevConfig:
          acc100:
            # Programming mode: 0 = VF Programming, 1 = PF Programming
            pfMode: false
            numVfBundles: 16
            maxQueueSize: 1024
            uplink4G:
              numQueueGroups: 0
              numAqsPerGroups: 16
              aqDepthLog2: 4
            downlink4G:
              numQueueGroups: 0
              numAqsPerGroups: 16
              aqDepthLog2: 4
            uplink5G:
              numQueueGroups: 4
              numAqsPerGroups: 16
              aqDepthLog2: 4
            downlink5G:
              numQueueGroups: 4
              numAqsPerGroups: 16
              aqDepthLog2: 4
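
Because the Operator is configured through standard K8s APIs, the same
resource can also be generated programmatically and serialized as JSON. The
sketch below mirrors the field names and values of the sample; the helper
names ``queue_cfg`` and ``build_cluster_config``, the node name, and the
nesting of ``bbDevConfig`` under ``physicalFunction`` are assumptions made for
illustration.

```python
import json


def queue_cfg(num_queue_groups: int) -> dict:
    """Per-direction queue settings, using the values from the sample."""
    return {"numQueueGroups": num_queue_groups,
            "numAqsPerGroups": 16,
            "aqDepthLog2": 4}


def build_cluster_config(node: str, pci_address: str, vf_amount: int) -> dict:
    """Build a SriovFecClusterConfig body mirroring the sample."""
    return {
        "apiVersion": "sriovfec.intel.com/v2",
        "kind": "SriovFecClusterConfig",
        "metadata": {"name": "config", "namespace": "sriov-fec-system"},
        "spec": {
            "priority": 1,
            "nodeSelector": {"kubernetes.io/hostname": node},
            "acceleratorSelector": {"pciAddress": pci_address},
            "physicalFunction": {
                "pfDriver": "pci-pf-stub",
                "vfDriver": "vfio-pci",
                "vfAmount": vf_amount,
                "bbDevConfig": {"acc100": {
                    "pfMode": False,
                    # One VF bundle per VF, as in the sample.
                    "numVfBundles": vf_amount,
                    "maxQueueSize": 1024,
                    "uplink4G": queue_cfg(0),
                    "downlink4G": queue_cfg(0),
                    "uplink5G": queue_cfg(4),
                    "downlink5G": queue_cfg(4),
                }},
            },
        },
    }


# "worker-0" is a made-up node name used only for this example.
manifest = build_cluster_config("worker-0", "0000:17:00.0", 16)
manifest_json = json.dumps(manifest, indent=2)
```

The resulting ``manifest_json`` string could be written to a file and applied
with kubectl, which accepts JSON manifests as well as YAML.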
|
||||
|
||||
sriov_fec_cluster_config parameters description:
|
||||
------------------------------------------------
|
||||
|
||||
* ``name``: Name of the specific config.
* ``cluster_config_name``: Name of the cluster config.
* ``priority``: Priority of deployment (a lower number means higher priority).
* ``drainskip``: Allows skipping the draining of the node after the
  config is applied.
* ``selected_node``: (Optional) Field that can be used to target only a
  specific node.
* ``pf_driver``: The PF driver to be used: igb_uio or pci-pf-stub.
* ``vf_driver``: The VF driver to be used: vfio-pci or igb_uio.
* ``vf_amount``: The number of VFs to be created for the device.
* ``bbdevconfig``:

  * ``pf_mode``: The mode in which the accelerator will be programmed.
    VFs are expected to be used, so this is set to false.
  * ``num_vf_bundles``: Number of VF bundles; this should correspond
    to the vf_amount field.
  * ``max_queue_size``: Maximum queue size; this field is not expected to
    change in most deployments.
  * ``ul4g_num_queue_groups``: Number of 4G Uplink queue groups.
    There are 8 queue groups in total that can be distributed between
    4G/5G Uplink/Downlink.
  * ``ul4g_num_aqs_per_groups``: Number of aqs per group; not expected
    to change for most deployments.
  * ``ul4g_aq_depth_log2``: Base-2 logarithm of the queue depth.
  * ``dl4g_num_queue_groups``: Number of 4G Downlink queue groups.
    There are 8 queue groups in total that can be distributed between
    4G/5G Uplink/Downlink.
  * ``dl4g_num_aqs_per_groups``: Number of aqs per group;
    not expected to change for most deployments.
  * ``dl4g_aq_depth_log2``: Base-2 logarithm of the queue depth.
  * ``ul5g_num_queue_groups``: Number of 5G Uplink queue groups.
    There are 8 queue groups in total that can be distributed between 4G/5G
    Uplink/Downlink; here 4 queue groups are used for 5G Uplink.
  * ``ul5g_num_aqs_per_groups``: Number of aqs per group;
    not expected to change for most deployments.
  * ``ul5g_aq_depth_log2``: Base-2 logarithm of the queue depth.
  * ``dl5g_num_queue_groups``: Number of 5G Downlink queue groups.
    There are 8 queue groups in total that can be distributed between
    4G/5G Uplink/Downlink; here 4 queue groups are used for 5G Downlink.
  * ``dl5g_num_aqs_per_groups``: Number of aqs per group;
    not expected to change for most deployments.
  * ``dl5g_aq_depth_log2``: Base-2 logarithm of the queue depth.
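
Two constraints stated in the descriptions above (a total of 8 queue groups
shared between 4G/5G Uplink/Downlink, and ``num_vf_bundles`` corresponding to
``vf_amount``) lend themselves to a quick pre-check before a configuration is
applied. The sketch below is illustrative only: ``check_fec_params`` and
``aq_depth`` are hypothetical helpers, not the Operator's own validation, and
``aq_depth`` assumes the depth is 2 raised to the configured ``*_aq_depth_log2``
value.

```python
from typing import Dict, List

TOTAL_QUEUE_GROUPS = 8  # total stated in the parameter descriptions


def check_fec_params(params: Dict[str, int]) -> List[str]:
    """Return a list of problems found in a flat parameter dictionary."""
    problems = []
    used = sum(params[f"{d}_num_queue_groups"]
               for d in ("ul4g", "dl4g", "ul5g", "dl5g"))
    if used > TOTAL_QUEUE_GROUPS:
        problems.append(f"{used} queue groups requested, only "
                        f"{TOTAL_QUEUE_GROUPS} available")
    if params["num_vf_bundles"] != params["vf_amount"]:
        problems.append("num_vf_bundles does not match vf_amount")
    return problems


def aq_depth(aq_depth_log2: int) -> int:
    """Queue depth implied by a log2 value (assumes depth = 2**log2)."""
    return 2 ** aq_depth_log2


# Values taken from the sample cluster configuration.
sample = {"vf_amount": 16, "num_vf_bundles": 16,
          "ul4g_num_queue_groups": 0, "dl4g_num_queue_groups": 0,
          "ul5g_num_queue_groups": 4, "dl5g_num_queue_groups": 4}
sample_problems = check_fec_params(sample)
sample_depth = aq_depth(4)
```

With the sample values, the 5G Uplink and Downlink directions together use
all 8 queue groups, so enabling 4G as well would require reducing one of them.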
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
Standard extension of K8s APIs based on introduction of
|
||||
SriovFecClusterConfig CRD.
|
||||
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
The existing K8s authentication and authorization apply to the standard
extension of K8s APIs introduced by the SriovFecClusterConfig CRD.
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
End users will be able to configure FEC devices in more detail.
|
||||
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
* With the existing method (method-1), resources (CPU and memory)
  are consumed only during configuration.

* With the FEC Operator method, service PODs run on master and worker
  nodes all the time and consume some CPU and memory from the cluster
  housekeeping resources; this is believed to be negligible.

* Periodic reconciliation between controller-manager and fec-daemon
  may also consume network resources; this is assumed to be negligible.
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
None.
|
||||
|
||||
Upgrade impact
|
||||
--------------
|
||||
None. The sriov-fec-operator application is optional.
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
|
||||
* Balendu Mouli Burla (balendu)
|
||||
|
||||
Other contributors:
|
||||
|
||||
* Nidhi Shivashankara Belur (nshivash)
|
||||
|
||||
Repos Impacted
|
||||
--------------
|
||||
|
||||
A new system-application repo will be created for the definition and building
|
||||
of the new sriov-fec-operator application.
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Create the sriov-fec-operator application package.
* Integrate the sriov-fec-operator application with FluxCD. Add application
  upload/apply/remove/delete commands.
* Update docs.starlingx.io with a how-to for configuring FEC devices using
  the FEC operator application.
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
* Testing will be performed on both Simplex and Duplex deployment
  configurations.
* The following functional validations will be performed:

  * Check that the FEC operator is disabled by default when the node starts
    up for the first time.
  * Check the static configuration of FEC devices and make sure existing
    functionality still works.
  * Check the enable/disable functionality of the FEC operator in the cluster.
  * Configure the FEC device with the FEC Operator, make sure it overrides
    the default configuration, and verify the FEC functionality.
  * Delete the CRD configuration, re-configure the device through static
    configuration, and verify the FEC functionality.
  * Configure the device through the FEC operator and reboot the node; check
    that the node comes up with the new configuration applied through
    fec-operator.
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
docs.starlingx.io will be updated for:

* How to upload and apply the sriov-fec-operator application
* How to perform enhanced configuration of FEC devices with the
  SriovFecClusterConfig CRD
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
Intel FEC Operator:
|
||||
https://github.com/smart-edge-open/openshift-operator/blob/main/spec/openshift-sriov-fec-operator.md
|
||||
|
||||
Acronyms
|
||||
--------
|
||||
|
||||
- FEC : Forward Error Correction
|
||||
- LTE : Long Term Evolution
|
||||
- vRAN : Virtual Radio Access Network
|
||||
- SR-IOV : Single Root Input/Output Virtualization
|
||||
- PF : Physical Function
|
||||
- VF : Virtual Function
|
||||
- CRD : Custom Resource Definition
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
Initial Version.