Merge "Adding spec: OVS-DPDk containerization"

Zuul 2019-08-29 13:32:19 +00:00 committed by Gerrit Code Review
commit 05932b89d9
1 changed files with 301 additions and 0 deletions


@ -0,0 +1,301 @@
OVS-DPDK containerization
==========================================
Storyboard:
https://storyboard.openstack.org/#!/story/2005496
As StarlingX moves to containerization, most OpenStack components have been
containerized. That includes OVS, but OVS-DPDK still runs on the host. This
story implements OVS-DPDK containerization.
Problem description
===================
Currently, StarlingX supports OVS and OVS-DPDK. OVS is managed by
openstack-helm and runs in a container, but OVS-DPDK is managed by puppet
and runs directly on the host. Given the benefits of containerization,
we would like to containerize OVS-DPDK. In addition, maintaining two
implementations and keeping them consistent costs more resources than
maintaining just one.
Use Cases
---------
Without OVS-DPDK containerization:

* If we want to change OVS (upgrade the OVS version, enable some features),
  we need to make the change in two places.
* If we want to support another host OS distribution (e.g. Ubuntu), we need to
  build the OVS/DPDK package for that distribution, because OVS-DPDK runs on
  the host.
Proposed change
===============
This story includes changes to StarlingX and to upstream openstack-helm.
The upstream openstack-helm patches are already in review.

'ovs-dpdk' and 'none' are the vswitch types we support for now.
'ovs-dpdk' means running OVS-DPDK on the host; 'none' means running
OVS (without DPDK) in a container. For containerized OVS-DPDK we do not create
a new vswitch type. Instead, we enhance the 'none' type to support DPDK, so
that 'none' supports both OVS and containerized OVS-DPDK. A new Kubernetes
node label (openvswitch-dpdk=enabled) will control whether DPDK is enabled.
Once this story is completed, we will no longer maintain the 'ovs-dpdk' type.
Hugepages need to be reserved for DPDK. Currently, the reservation is done by
sysinv/puppet. In this story, the hugepage reservation will still be handled
by sysinv/puppet; openstack-helm just uses the hugepages. StarlingX reserves
hugepages for DPDK and nova-compute; we can run 'system host-memory-show
controller-0' to show the hugepage info. StarlingX has a default policy for
hugepage allocation, which users can override with
'system host-memory-modify'. As k8s doesn't support multiple hugepage sizes,
we can only reserve hugepages of a single size.
::

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-show controller-0 0
  +-------------------------------------+--------------------------------------+
  | Property                            | Value                                |
  +-------------------------------------+--------------------------------------+
  | Memory: Usable Total (MiB)          | 9181                                 |
  | Platform (MiB)                      | 7600                                 |
  | Available (MiB)                     | 9181                                 |
  | Huge Pages Configured               | True                                 |
  | vSwitch Huge Pages: Size (MiB)      | 2                                    |
  | Total                               | 512                                  |
  | Available                           | 0                                    |
  | Required                            | None                                 |
  | Application Pages (4K): Total       | 1826048                              |
  | Application Huge Pages (2M): Total  | 1024                                 |
  | Available                           | 1024                                 |
  | Application Huge Pages (1G): Total  | 0                                    |
  | Available                           | None                                 |
  | uuid                                | 56be1dc6-dc10-4318-88e3-953f75eb6684 |
  | ihost_uuid                          | 3fc748fa-a831-42f0-8c67-d15786806d6b |
  | inode_uuid                          | c4ee7258-fd13-4520-80f5-62c93e2e2b20 |
  | created_at                          | 2019-04-28T06:08:42.884178+00:00     |
  | updated_at                          | 2019-05-05T06:21:04.987518+00:00     |
  +-------------------------------------+--------------------------------------+

From the above output, we can see that 512 hugepages of 2M each are reserved
for OVS-DPDK. In this story, the `openvswitch helm plugin`_ will be updated to
generate the memory configuration (dpdk-socket-mem) for the openvswitch chart
according to the reserved hugepage info. If multiple NUMA nodes exist on the
compute node, we should allocate hugepages on every NUMA node.
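
As a rough illustration of that mapping, the sketch below turns per-NUMA-node
vSwitch hugepage reservations into a 'dpdk-socket-mem' value (a comma-separated
list of MiB per NUMA node). The function name and the input shape are
assumptions made for this example; they are not taken from the actual helm
plugin.

```python
def dpdk_socket_mem(vswitch_hugepages):
    """Build a 'dpdk-socket-mem' string from per-NUMA-node reservations.

    vswitch_hugepages is a hypothetical input: a list indexed by NUMA node,
    where each entry is a (page_size_mib, page_count) tuple.
    """
    return ",".join(str(size_mib * count) for size_mib, count in vswitch_hugepages)


# One NUMA node with 512 pages of 2 MiB, as in the output above.
print(dpdk_socket_mem([(2, 512)]))            # 1024
# Two NUMA nodes with the same reservation on each.
print(dpdk_socket_mem([(2, 512), (2, 512)]))  # 1024,1024
```

Keeping one entry per NUMA node, even when the values are equal, matches the
requirement that hugepages be allocated on every NUMA node.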
To run OVS-DPDK in a container, we need to enable the Kubernetes hugepages
feature. Currently Kubernetes doesn't support multiple hugepage sizes on a
single node. I have opened `the multiple size issue`_ to track it.
The OVS-DPDK process contains two types of threads: control path threads and
data path threads. The control path threads run on Platform cores just like
all other pods, but the data path threads, known as PMD threads, need to run
on one or more dedicated cores.
StarlingX needs to reserve CPU cores for the OVS-DPDK data path threads.
Currently StarlingX reserves CPU cores for non-containerized OVS-DPDK through
sysinv, which generates the kernel parameter 'isolcpus'. For containerized
OVS-DPDK, CPU cores will be reserved in the same way. We can run
'system host-cpu-list controller-0' to show the CPU info. StarlingX has a
default policy for CPU allocation, which users can override with
'system host-cpu-modify'.
::

  [wrsroot@controller-0 ~(keystone_admin)]$ system host-cpu-list controller-0
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
  | uuid                                 | log_c | processor | phy_c | thread | processor_model                           | assigned_function |
  |                                      | ore   |           | ore   |        |                                           |                   |
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
  | a6189494-a2da-4f26-8a18-658d3fa5ad4f | 0     | 0         | 0     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
  | c7d0de01-7c95-4b90-a423-d19d777e5b86 | 1     | 0         | 1     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
  | 0e644162-ee11-486d-8249-94099d34a160 | 2     | 0         | 2     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | vSwitch           |
  | 3b13943e-5d8e-49ab-b63e-17311e314f32 | 3     | 0         | 3     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  | a36e8842-2f55-4697-bd89-f074b2e0c567 | 4     | 0         | 4     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  | a74c066b-5a9a-48bd-aeec-9e803e395f7f | 5     | 0         | 5     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
  +--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+

From the above output, we can see that core 2 is allocated for OVS-DPDK PMD
threads. In this story, the `openvswitch helm plugin`_ will be updated to
generate the CPU configuration (dpdk-lcore-mask, pmd-cpu-mask). 'pmd-cpu-mask'
is the OVS parameter that specifies which CPU cores the PMD threads run on.
The mechanism underlying 'pmd-cpu-mask' is the cpuset cgroup. By default, all
pods can only see the platform cores, so we need to change the cgroup of OVS
at launch time. StarlingX also reserves CPU cores for nova-compute
(assigned_function of Applications), which are finally rendered as
'vcpu_pin_set' in nova.conf.
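
The mask generation can be sketched as follows. Both OVS parameters are hex
bitmasks with one bit per logical core. The helper names are hypothetical, and
deriving 'dpdk-lcore-mask' from the Platform cores is an assumption based on
the statement that control path threads run on Platform cores; the real plugin
may choose differently.

```python
def cpu_mask(cores):
    """Return a hex bitmask string with one bit set per logical core ID."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return hex(mask)


def cores_with_function(cpu_functions, function):
    """Select the logical core IDs whose assigned_function matches."""
    return sorted(core for core, assigned in cpu_functions.items()
                  if assigned == function)


# Mirrors the 'system host-cpu-list' output above: log_core -> assigned_function.
cpu_functions = {0: "Platform", 1: "Platform", 2: "vSwitch",
                 3: "Applications", 4: "Applications", 5: "Applications"}

# PMD threads are pinned to the cores reserved for the vSwitch function.
pmd_cpu_mask = cpu_mask(cores_with_function(cpu_functions, "vSwitch"))
# Assumption: non-datapath DPDK threads stay on the Platform cores.
dpdk_lcore_mask = cpu_mask(cores_with_function(cpu_functions, "Platform"))

print(pmd_cpu_mask)     # 0x4
print(dpdk_lcore_mask)  # 0x3
```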
When a compute node is unlocked, vswitch.pp does some OVS-related work:

1) Bind the data network NICs to a Linux kernel module (vfio-pci by default
   in StarlingX).
2) Create the OVS bridges.
3) Add the NICs to the bridges.

In this story, the first item can be covered by puppet, by openstack-helm, or
by using NetworkDeviceAttachment, which leverages the existing SRIOV CNI. The
second and third items will be covered by openstack-helm. To create OVS
bridges and add NICs to them, openstack-helm needs to know the bridge names
and the NIC pci_id values. These parameters will be generated by the
`neutron helm plugin`_ according to the info in sysinv.
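
A minimal sketch of how such parameters might be assembled into chart overrides
is shown below. The override keys ('conf', 'ovs_dpdk', 'bridges', 'nics'), the
'dpdkN' port naming, the input shape, and the PCI addresses are all
illustrative assumptions; the actual layout is defined by the neutron chart
patch under review, not by this example.

```python
def ovs_dpdk_overrides(data_interfaces):
    """Build hypothetical neutron chart overrides for DPDK bridges and NICs.

    data_interfaces is an assumed input: a list of dicts with 'pci_id' and
    'bridge' keys, as could be derived from sysinv interface data.
    """
    bridges = []
    nics = []
    for index, iface in enumerate(data_interfaces):
        # Each bridge is listed once, in first-seen order.
        if iface["bridge"] not in bridges:
            bridges.append(iface["bridge"])
        nics.append({
            "name": "dpdk%d" % index,   # hypothetical port naming scheme
            "pci_id": iface["pci_id"],
            "bridge": iface["bridge"],
        })
    return {
        "conf": {
            "ovs_dpdk": {
                "enabled": True,
                "bridges": [{"name": name} for name in bridges],
                "nics": nics,
            }
        }
    }


# Made-up PCI addresses and bridge name, for illustration only.
overrides = ovs_dpdk_overrides([
    {"pci_id": "0000:05:00.0", "bridge": "br-phy0"},
    {"pci_id": "0000:05:00.1", "bridge": "br-phy0"},
])
print(overrides["conf"]["ovs_dpdk"]["bridges"])
```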
Alternatives
------------
None
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Other end user impact
---------------------
As the k8s hugepage feature doesn't support multiple hugepage sizes for now,
we can allocate hugepages of only a single size. That means we can only create
VMs backed by a single hugepage size. The limitation is described in the
`hugepage spec commit`_.
Performance Impact
------------------
No impact is expected.

For networking, the OVS-DPDK container uses the host's native network.

For CPU/memory, although container resources are limited, the resources used
by OVS are configured by OVS parameters rather than by container limits.
Other deployer impact
---------------------
The 'openvswitch-dpdk=enabled' label is required on compute nodes to enable
OVS-DPDK.
Developer impact
----------------
Once this feature is implemented, we no longer run OVS-DPDK on the host. The
vswitch.pp file will therefore be removed, and openstack-helm takes over its
job of configuring OVS-DPDK.
Upgrade impact
--------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
chengli3 <cheng1.li@intel.com>
Other contributors:
<launchpad-id or None>
Repos Impacted
--------------
starlingx/config, starlingx/integ
Work Items
----------
* Improve the OVS docker image to support DPDK (starlingx/integ).

  To support DPDK, DPDK should be installed in the OVS image and OVS should be
  built/installed with the `dpdk install option`_ (--with-dpdk). The community
  OVS image already supports DPDK through the `image patch`_. To build our own
  OVS image, we can author our OVS docker file in the starlingx/integ project.
  The OVS/DPDK version will be the same as on the host. The docker image OS
  may need to be CentOS as well, as the OVS container mounts the host
  /lib/modules.

* Make the OVS chart support DPDK (openstack-helm-infra).

  To support DPDK, OVS needs to be set up with the `dpdk setup options`_.
  The `ovs patch`_ is in review.

* Make the neutron chart support DPDK (openstack-helm).

  * `Extra neutron configurations`_ are needed for DPDK support.
  * In openstack-helm, the neutron chart is responsible for adding NICs to
    the OVS bridge, so the neutron chart handles
    `dpdk interface initialization`_ as well. The `neutron patch`_ is
    already in review.

* Reserve huge pages for OVS-DPDK and enable the k8s hugepage feature
  (starlingx/config).

  `huge pages`_ should be reserved for containerized OVS-DPDK, in the same
  way as we reserve huge pages for vswitch_type 'ovs-dpdk'.

* Generate DPDK-related configurations for the openstack deployment
  (starlingx/config).

  The `openvswitch helm plugin`_ needs to be updated to add DPDK
  configurations. The `neutron helm plugin`_ should be updated as well.

* Docs update (starlingx/docs).

  Update the installation guide.

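
For the OVS chart work item above, the DPDK setup options ultimately become
'other_config' keys on the Open_vSwitch table. The sketch below only builds
and prints such an 'ovs-vsctl' command line; it does not run it. The function
name and the sample values are assumptions for illustration, and the chart may
apply the options differently.

```python
import shlex


def ovs_dpdk_setup_argv(socket_mem, lcore_mask, pmd_mask):
    """Build an ovs-vsctl argument list that sets the DPDK-related options."""
    return [
        "ovs-vsctl", "--no-wait", "set", "Open_vSwitch", ".",
        "other_config:dpdk-init=true",
        "other_config:dpdk-socket-mem=" + socket_mem,
        "other_config:dpdk-lcore-mask=" + lcore_mask,
        "other_config:pmd-cpu-mask=" + pmd_mask,
    ]


# Sample values: 1024 MiB on one NUMA node, Platform cores 0-1, PMD core 2.
argv = ovs_dpdk_setup_argv("1024", "0x3", "0x4")
print(" ".join(shlex.quote(arg) for arg in argv))
```
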
.. _dpdk install option: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#install-ovs
.. _image patch: https://review.opendev.org/#/c/665310/
.. _dpdk setup options: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-ovs
.. _ovs patch: https://review.openstack.org/#/c/626894/
.. _Extra neutron configurations: https://docs.openstack.org/neutron/pike/contributor/internals/ovs_vhostuser.html
.. _dpdk interface initialization: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-dpdk-devices-using-vfio
.. _neutron patch: https://review.openstack.org/#/c/643284/
.. _huge pages: https://docs.openvswitch.org/en/latest/intro/install/dpdk/#setup-hugepages
.. _openvswitch helm plugin: https://github.com/openstack/stx-config/tree/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/openvswitch.py
.. _neutron helm plugin: https://github.com/openstack/stx-config/blob/a5def9a1447a004348b0adfa8fb774c32add34fe/sysinv/sysinv/sysinv/sysinv/helm/neutron.py
.. _ovs.py: https://opendev.org/starlingx/config/src/commit/e0d453a98b72606ec9a0b90a3acb5bbda546d2ff/sysinv/sysinv/sysinv/sysinv/puppet/ovs.py#L318-L365
.. _the multiple size issue: https://github.com/kubernetes/kubernetes/issues/77251
.. _hugepage spec commit: https://github.com/kubernetes/community/pull/837/files#r133337110
Dependencies
============
* Needs OVS version >=2.6 to support vhost-user reconnect.
Testing
=======
The host NICs that are planned for data networks must support DPDK.
Multiple hosts are needed to test connections across hosts.
The following cases are needed:

* Create VMs and test the network connectivity between VMs as well as the
  external connectivity.
* Check for any issues with host reboot.
Documentation Impact
====================
The installation guides on the wiki need to be updated. There will be a small
difference for deployers in the vswitch type setting.
References
==========
* http://docs.openvswitch.org/en/latest/intro/install/dpdk/
* https://opendev.org/openstack/openstack-helm-infra/src/branch/master/openvswitch
* https://opendev.org/openstack/openstack-helm/src/branch/master/neutron
History
=======
Optional section intended to be used each time the spec is updated to describe
new design, API or any database schema updated. Useful to let reader understand
what's happened along the time.
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Stein
     - Introduced