Merge "Adding spec: OVS-DPDk containerization"

commit 05932b89d9

OVS-DPDK containerization
==========================================

Storyboard:
https://storyboard.openstack.org/#!/story/2005496

As StarlingX moves to containerization, most OpenStack components have been
containerized. That includes OVS, but OVS-DPDK still runs on the host. This
story implements OVS-DPDK containerization.

Problem description
===================

Currently, StarlingX supports OVS and OVS-DPDK. OVS is managed by
openstack-helm and runs in a container, but OVS-DPDK is managed by puppet
and runs directly on the host. Given the benefits of containerization,
we would like to containerize OVS-DPDK. In addition, maintaining two
implementations and keeping them consistent costs more resources than
maintaining just one.

Use Cases
---------

Without OVS-DPDK containerization:

* If we want to change OVS (upgrade the OVS version, enable some features),
  we need to make the change in two places.
* If we want to support another host OS distribution (e.g. Ubuntu), we need to
  build the OVS/DPDK package for Ubuntu, as we run OVS-DPDK on the host.

Proposed change
===============

This story includes changes to StarlingX and to upstream openstack-helm.
The upstream openstack-helm patches are already in review.

'ovs-dpdk' and 'none' are the vswitch types we support for now.
'ovs-dpdk' means running OVS-DPDK on the host; 'none' means running
OVS (without DPDK) in a container. For containerized OVS-DPDK we do not create
a new vswitch type; we enhance the 'none' type to support DPDK. This means the
'none' type will support both OVS and containerized OVS-DPDK. A new kubernetes
node label (openvswitch-dpdk=enabled) will be used to enable DPDK.
Once this story is completed, we will no longer maintain the 'ovs-dpdk' type.

Hugepages need to be reserved for DPDK. Currently, the reservation is done by
sysinv/puppet. In this story, the hugepages reservation will still be covered
by sysinv/puppet; openstack-helm just uses the hugepages. StarlingX reserves
hugepages for DPDK and nova-compute. We can run 'system host-memory-show
controller-0' to show the hugepages info. StarlingX has a default policy for
hugepages allocation; users can override the default with
'system host-memory-modify'. As k8s doesn't support multiple hugepage sizes,
we can only reserve hugepages of a single size.

::

    [wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-show controller-0 0
    +-------------------------------------+--------------------------------------+
    | Property                            | Value                                |
    +-------------------------------------+--------------------------------------+
    | Memory: Usable Total (MiB)          | 9181                                 |
    | Platform (MiB)                      | 7600                                 |
    | Available (MiB)                     | 9181                                 |
    | Huge Pages Configured               | True                                 |
    | vSwitch Huge Pages: Size (MiB)      | 2                                    |
    | Total                               | 512                                  |
    | Available                           | 0                                    |
    | Required                            | None                                 |
    | Application Pages (4K): Total       | 1826048                              |
    | Application Huge Pages (2M): Total  | 1024                                 |
    | Available                           | 1024                                 |
    | Application Huge Pages (1G): Total  | 0                                    |
    | Available                           | None                                 |
    | uuid                                | 56be1dc6-dc10-4318-88e3-953f75eb6684 |
    | ihost_uuid                          | 3fc748fa-a831-42f0-8c67-d15786806d6b |
    | inode_uuid                          | c4ee7258-fd13-4520-80f5-62c93e2e2b20 |
    | created_at                          | 2019-04-28T06:08:42.884178+00:00     |
    | updated_at                          | 2019-05-05T06:21:04.987518+00:00     |
    +-------------------------------------+--------------------------------------+

From the output above, we can see that 512 hugepages of 2M each are reserved
for OVS-DPDK. In this story, the `openvswitch helm plugin`_ will be updated to
generate the memory configuration (dpdk-socket-mem) for the openvswitch chart
according to the reserved hugepages info. If multiple NUMA nodes exist on the
compute node, we should allocate hugepages on every NUMA node.

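As a minimal sketch of how a dpdk-socket-mem value could be derived from the
reserved hugepages, the Python snippet below sums the vSwitch hugepages per
NUMA node and joins the totals (in MiB) in socket order. The function name
``build_dpdk_socket_mem`` and the input tuple layout are assumptions made for
illustration; they are not the actual sysinv plugin API.

```python
def build_dpdk_socket_mem(vswitch_hugepages):
    """Return a dpdk-socket-mem string such as "1024,1024".

    vswitch_hugepages is a hypothetical list of
    (numa_node, page_size_mib, page_count) tuples describing the
    hugepages reserved for the vSwitch on each NUMA node.
    """
    per_node = {}
    for node, size_mib, count in vswitch_hugepages:
        # Total memory reserved on this NUMA node, in MiB.
        per_node[node] = per_node.get(node, 0) + size_mib * count

    if not per_node:
        return ""

    # One value per socket, ordered by socket id. Sockets without a
    # reservation are reported as 0.
    last_node = max(per_node)
    return ",".join(str(per_node.get(n, 0)) for n in range(last_node + 1))


if __name__ == "__main__":
    # 512 pages of 2 MiB on NUMA node 0, as in the output above.
    print(build_dpdk_socket_mem([(0, 2, 512)]))
```

With the reservation shown above (512 pages of 2 MiB on NUMA node 0), the
sketch yields ``1024``; a second node with the same reservation would give
``1024,1024``.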

To run OVS-DPDK in a container, we need to enable the kubernetes hugepages
feature. Currently kubernetes doesn't support multiple hugepage sizes on a
single node. I have opened `the multiple size issue`_ to track it.

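For illustration only, the Python sketch below builds the ``resources``
section a container would need in order to consume hugepages through
kubernetes. It relies on the kubernetes convention that hugepages are
requested under the resource name ``hugepages-<size>`` and that hugepage
requests must equal limits. The helper name ``hugepage_resources`` and the
default ``memory`` value are assumptions, not values taken from the
openstack-helm charts.

```python
def hugepage_resources(page_size_mib, page_count, memory="1Gi"):
    """Build a container 'resources' dict that requests hugepages.

    Only the 2 MiB and 1 GiB page sizes are handled in this sketch.
    The 'memory' default is a placeholder, not a recommended value.
    """
    names = {2: "hugepages-2Mi", 1024: "hugepages-1Gi"}
    if page_size_mib not in names:
        raise ValueError("unsupported hugepage size: %s MiB" % page_size_mib)

    amounts = {
        names[page_size_mib]: "{}Mi".format(page_size_mib * page_count),
        "memory": memory,
    }
    # kubernetes requires hugepage requests to be equal to the limits.
    return {"requests": dict(amounts), "limits": dict(amounts)}


if __name__ == "__main__":
    print(hugepage_resources(2, 512))
```

Because a single page size is used per node, only one ``hugepages-<size>``
key appears in the generated dict.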

An OVS-DPDK process contains two types of threads: control path threads and
data path threads. The control path threads run on Platform cores just like
all other pods, but the data path threads, known as PMD threads, need to run
on one or more dedicated cores.
StarlingX needs to reserve CPU cores for the OVS-DPDK data path threads.
Currently StarlingX reserves CPU cores for non-containerized OVS-DPDK through
sysinv, which generates the kernel parameter 'isolcpus'. For containerized
OVS-DPDK, CPU cores will be reserved in the same way. We can run
'system host-cpu-list controller-0' to show the CPU info. StarlingX has a
default policy for CPU allocation; users can override the default with
'system host-cpu-modify'.

::

    [wrsroot@controller-0 ~(keystone_admin)]$ system host-cpu-list controller-0
    +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
    | uuid                                 | log_c | processor | phy_c | thread | processor_model                           | assigned_function |
    |                                      | ore   |           | ore   |        |                                           |                   |
    +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
    | a6189494-a2da-4f26-8a18-658d3fa5ad4f | 0     | 0         | 0     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
    | c7d0de01-7c95-4b90-a423-d19d777e5b86 | 1     | 0         | 1     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
    | 0e644162-ee11-486d-8249-94099d34a160 | 2     | 0         | 2     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | vSwitch           |
    | 3b13943e-5d8e-49ab-b63e-17311e314f32 | 3     | 0         | 3     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
    | a36e8842-2f55-4697-bd89-f074b2e0c567 | 4     | 0         | 4     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
    | a74c066b-5a9a-48bd-aeec-9e803e395f7f | 5     | 0         | 5     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
    +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+

From the output above, we can see that core 2 is allocated for OVS-DPDK PMD
threads. In this story, the `openvswitch helm plugin`_ will be updated to
generate the CPU configuration (dpdk-lcore-mask, pmd-cpu-mask). 'pmd-cpu-mask'
is the OVS parameter that specifies which CPU cores the PMD threads run on.
The technology behind 'pmd-cpu-mask' is the cpuset cgroup. By default, all
pods can only see the platform cores, so we need to change the cgroup of OVS
at launch time. StarlingX also reserves CPU cores for nova-compute
(assigned_function of Applications), which are finally rendered as
'vcpu_pin_set' in nova.conf.

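The masks mentioned above are hexadecimal bitmasks in which bit N stands for
logical core N. As a rough sketch (the helper name ``cpu_mask`` is an
assumption, not part of sysinv), such a mask could be computed from a list of
core ids like this:

```python
def cpu_mask(cores):
    """Return a hex CPU bitmask string for the given logical core ids.

    Bit N of the mask is set when logical core N is in 'cores'.
    """
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core id must be non-negative: %d" % core)
        mask |= 1 << core
    return hex(mask)


if __name__ == "__main__":
    # Core 2 is the vSwitch core in the output above.
    print(cpu_mask([2]))
```

For the host shown above, where only core 2 has the vSwitch function, the
resulting mask is ``0x4``.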

When a compute node is unlocked, vswitch.pp does some OVS-related work:

1) Bind the data network NICs to a Linux kernel module (vfio-pci by default
   in StarlingX).
2) Create the OVS bridges.
3) Add the NICs to the bridges.

In this story, the first item can be covered by puppet or openstack-helm, or
by using NetworkDeviceAttachment, which leverages the existing SRIOV CNI. The
second and third items will be covered by openstack-helm. To create OVS
bridges and add NICs to bridges, openstack-helm needs to know the bridge names
and the NIC pci_id. These parameters will be generated by the
`neutron helm plugin`_ according to the info in sysinv.

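To make the second and third items concrete, the Python sketch below turns a
bridge-to-NIC mapping into the ``ovs-vsctl`` invocations documented in the OVS
DPDK guide (a ``netdev`` bridge plus a ``dpdk`` port identified by its PCI
address). The function name, the mapping layout, and the sample bridge, port
and PCI values are all hypothetical; the real charts may structure this
differently.

```python
def ovs_dpdk_commands(bridges):
    """Build ovs-vsctl argument lists for DPDK bridges and ports.

    'bridges' is a hypothetical mapping of bridge name to a list of
    (port_name, pci_id) tuples. Nothing is executed here; the caller
    decides how to run the returned commands.
    """
    commands = []
    for bridge, nics in sorted(bridges.items()):
        # DPDK ports require a userspace (netdev) datapath bridge.
        commands.append([
            "ovs-vsctl", "--may-exist", "add-br", bridge,
            "--", "set", "bridge", bridge, "datapath_type=netdev",
        ])
        for port, pci_id in nics:
            # The NIC is selected by its PCI address via dpdk-devargs.
            commands.append([
                "ovs-vsctl", "--may-exist", "add-port", bridge, port,
                "--", "set", "Interface", port, "type=dpdk",
                "options:dpdk-devargs={}".format(pci_id),
            ])
    return commands


if __name__ == "__main__":
    # Sample values are placeholders for illustration.
    for cmd in ovs_dpdk_commands({"br-phy0": [("dpdk0", "0000:02:00.0")]}):
        print(" ".join(cmd))
```

Returning argument lists instead of running the commands keeps the sketch
free of side effects and easy to check.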

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Other end user impact
---------------------

As the k8s hugepage feature doesn't support multiple hugepage sizes for now,
we can allocate hugepages of only a single size. That means we can only create
VMs with a single hugepage size. The limitation is described in the
`hugepage spec commit`_.

Performance Impact
------------------

No impact is expected.

For networking, the OVS-DPDK container uses the host native network.

For CPU/memory, although container resources are limited, the resources used
by OVS are configured by OVS parameters instead of container limits.

Other deployer impact
---------------------

The 'openvswitch-dpdk=enabled' label is required on compute nodes to enable
OVS-DPDK.

Developer impact
----------------

Once this feature is implemented, we will no longer run OVS-DPDK on the host.
The vswitch.pp file will therefore be removed, and openstack-helm will take
over its job of OVS-DPDK configuration.

Upgrade impact
--------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  chengli3 <cheng1.li@intel.com>

Other contributors:
  <launchpad-id or None>

Repos Impacted
--------------

starlingx/config, starlingx/integ

Work Items
----------

* Improve the OVS docker image to support DPDK (starlingx/integ).
  To support DPDK, DPDK should be installed in the OVS image and OVS should be
  built/installed with the `dpdk install option`_ (--with-dpdk). The community
  OVS image already supports DPDK through the `image patch`_. To build our own
  OVS image, we can author our OVS docker file in the starlingx/integ project.
  The OVS/DPDK version will be the same as on the host. The docker image OS
  may need to be CentOS as well, as the OVS container mounts the host
  /lib/modules.
* Make the OVS chart support DPDK (openstack-helm-infra).
  To support DPDK, OVS needs to be set up with the `dpdk setup options`_.
  The `ovs patch`_ is in review.
* Make the neutron chart support DPDK (openstack-helm).

  * `Extra neutron configurations`_ are needed for DPDK support.
  * In openstack-helm, the neutron chart is responsible for adding NICs to
    the OVS bridge, so the neutron chart handles
    `dpdk interface initialization`_ as well. The `neutron patch`_ is already
    in review.

* Reserve huge pages for OVS-DPDK and enable the k8s hugepage feature
  (starlingx/config).
  `huge pages`_ should be reserved for containerized OVS-DPDK, in the same way
  as we reserve huge pages for vswitch_type 'ovs-dpdk'.
* Generate DPDK-related configurations for the openstack deployment
  (starlingx/config).
  The `openvswitch helm plugin`_ needs to be updated to add DPDK
  configurations. The `neutron helm plugin`_ should be updated as well.
* Docs update (starlingx/docs).
  Update the installation guide.

.. _dpdk install option: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#install-ovs
.. _image patch: https://review.opendev.org/#/c/665310/
.. _dpdk setup options: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-ovs
.. _ovs patch: https://review.openstack.org/#/c/626894/
.. _Extra neutron configurations: https://docs.openstack.org/neutron/pike/contributor/internals/ovs_vhostuser.html
.. _dpdk interface initialization: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-dpdk-devices-using-vfio
.. _neutron patch: https://review.openstack.org/#/c/643284/
.. _huge pages: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-hugepages
.. _openvswitch helm plugin: https://github.com/openstack/stx-config/tree/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/openvswitch.py
.. _neutron helm plugin: https://github.com/openstack/stx-config/blob/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/neutron.py
.. _ovs.py: https://opendev.org/starlingx/config/src/commit/e0d453a98b72606ec9a0b90a3acb5bbda546d2ff/sysinv/sysinv/sysinv/sysinv/puppet/ovs.py#L318-L365
.. _the multiple size issue: https://github.com/kubernetes/kubernetes/issues/77251
.. _hugepage spec commit: https://github.com/kubernetes/community/pull/837/files#r133337110

Dependencies
============

* Needs OVS version >=2.6 to support vhost-user reconnect.

Testing
=======

The host NICs that are planned for data networks must support DPDK.
Multiple hosts are needed to test connectivity across hosts.

The following cases are needed:

* Create VMs and test the network connection between VMs and the external
  connection.
* Check whether a host reboot causes any issue.

Documentation Impact
====================

The installation guides on the wiki need to be updated. There will be a small
difference for deployers in the vswitch type setting.

References
==========

* http://docs.openvswitch.org/en/latest/intro/install/dpdk/

* https://opendev.org/openstack/openstack-helm-infra/src/branch/master/openvswitch

* https://opendev.org/openstack/openstack-helm/src/branch/master/neutron

History
=======

Optional section intended to be used each time the spec is updated to describe
a new design, API or any database schema update. Useful to let readers
understand what has happened over time.

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Stein
     - Introduced