Spec for retrieving NUMA node information
Proposing a spec for retrieving NUMA node information during introspection.

Partial-Bug: #1635253
Co-Authored-By: Saravanan KR <skramaja@redhat.com>
Change-Id: I99aa9f0462f45a1cccec72801fbd14d1395b6386
Signed-off-by: karthik s <ksundara@redhat.com>

parent bf09d74b47
commit da809c2100

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===============================
Retrieve NUMA node information
===============================

https://bugs.launchpad.net/ironic-python-agent/+bug/1635253

Today, the introspection data collected from the nodes does not expose the
NUMA topology to the deployer. The deployer needs to know which CPU cores and
NICs are associated with each NUMA node. These details are required for
configuring nodes with DPDK-aware NICs.

Problem description
===================

Configuring the nodes for better performance requires selecting CPUs based on
the NUMA topology. For nodes with DPDK-aware NICs, the CPUs for the poll mode
driver (PMD) need to be selected from the NUMA node associated with the DPDK
NICs. If hyperthreading is enabled, selecting the logical cores also requires
knowledge of their thread siblings. The deployer shall read this information
from the introspection data stored in Swift and use it to manually configure
the deployment parameters for better performance.

For example, to list available NUMA-aware NICs::

    $ lspci -nn | grep Eth
    82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
    82:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
    85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
    85:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]

To obtain the NUMA node ID from the PCI device through the sysfs::

    $ cat /sys/bus/pci/devices/0000\:85\:00.1/numa_node
    1

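The same sysfs lookup can be sketched in Python. This is only an illustration
and not part of the proposed collector: the function name ``pci_numa_node``
and the ``sysfs_root`` parameter are assumptions introduced here so that the
lookup can be exercised against a fake directory tree.

```python
from pathlib import Path


def pci_numa_node(pci_address, sysfs_root="/sys"):
    """Return the NUMA node ID of a PCI device, or None if it is unknown.

    ``pci_address`` is the full address, for example "0000:85:00.1".
    ``sysfs_root`` is a hypothetical knob that lets tests point at a fake tree.
    """
    path = (Path(sysfs_root) / "bus" / "pci" / "devices"
            / pci_address / "numa_node")
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        # Device missing, file unreadable, or unexpected content.
        return None
    # The kernel reports -1 when no NUMA affinity is known for the device.
    return node if node >= 0 else None
```

On the host from the example above, ``pci_numa_node("0000:85:00.1")`` would be
expected to return ``1``.
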
To get the best performance, the CPU cores and the NIC must be in the same
NUMA node. In the example above, the NIC with PCI address ``85:00.1`` is in
NUMA node ``1``. The NIC should therefore preferably be used by DPDK Poll Mode
Drivers (PMD) running on CPU cores in NUMA node ``1``. Otherwise, the best
performance is not guaranteed, because the association between the NIC and
the CPU cores would be arbitrary.

This spec shall ensure that the NUMA parameters are available to the
deployer, so that the PMDs use the right logical CPUs for better performance.


Proposed change
===============

.. _Numa topology collector:

The collected data will be stored as a blob in Swift. Future work may
introduce an Inspector plug-in to further process the NUMA architecture data.
A new optional `Numa topology collector`_ shall be used to fetch the following
information related to NUMA nodes:

* List of NUMA nodes - Shall be fetched from
  ``/sys/devices/system/node/node<numa_node_id>``

* List of CPU cores associated with each NUMA node - Shall be fetched from
  ``/sys/devices/system/node/node<numa_node_id>/cpu<thread_id>/topology/core_id``

* List of thread_siblings for each core - Shall be fetched from
  ``/sys/devices/system/node/node<numa_node_id>/cpu<thread_id>``

* NUMA node ID for network interfaces - Extract the NUMA node for the NIC from
  ``/sys/class/net/<interface name>/device/numa_node``

* RAM available for each NUMA node - Shall be fetched from
  ``/sys/devices/system/node/node<numa_node_id>/meminfo``

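A minimal sketch of how these sysfs locations could be combined is shown
below. It is not the IPA implementation. The function name
``collect_numa_topology``, the ``sysfs_root`` parameter, the choice of the
``MemTotal`` line of ``meminfo`` as the RAM figure, and the grouping of
threads by ``core_id`` to derive thread siblings are all assumptions made for
illustration.

```python
import os
import re


def _read(path):
    with open(path) as f:
        return f.read().strip()


def collect_numa_topology(sysfs_root="/sys"):
    """Build a ``numa_topology``-shaped dict from sysfs (illustrative only)."""
    node_dir = os.path.join(sysfs_root, "devices", "system", "node")
    matches = (re.fullmatch(r"node(\d+)", n) for n in os.listdir(node_dir))
    node_ids = sorted(int(m.group(1)) for m in matches if m)

    ram, cpus = [], []
    for node_id in node_ids:
        base = os.path.join(node_dir, "node%d" % node_id)
        # meminfo lines look like "Node 0 MemTotal:  2097152 kB".
        for line in _read(os.path.join(base, "meminfo")).splitlines():
            fields = line.split()
            if len(fields) >= 4 and fields[2] == "MemTotal:":
                ram.append({"numa_node": node_id, "size_kb": int(fields[3])})
                break
        # Group the node's logical CPUs (threads) by their physical core_id.
        cores = {}
        for name in os.listdir(base):
            m = re.fullmatch(r"cpu(\d+)", name)
            if not m:
                continue  # skips entries such as "cpulist" and "meminfo"
            core_path = os.path.join(base, name, "topology", "core_id")
            cores.setdefault(int(_read(core_path)), []).append(int(m.group(1)))
        for core_id in sorted(cores):
            cpus.append({"cpu": core_id, "numa_node": node_id,
                         "thread_siblings": sorted(cores[core_id])})

    nics = []
    net_dir = os.path.join(sysfs_root, "class", "net")
    for name in sorted(os.listdir(net_dir)):
        path = os.path.join(net_dir, name, "device", "numa_node")
        if not os.path.isfile(path):
            continue  # interfaces without a backing device, such as "lo"
        nics.append({"name": name, "numa_node": int(_read(path))})

    return {"numa_topology": {"ram": ram, "cpus": cpus, "nics": nics}}
```

Passing the sysfs root as a parameter is only a convenience for unit tests,
which can then build a small fake tree instead of depending on the host
hardware.
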
Alternatives
------------

Another option would be to deploy with the default parameters, identify the
actual values from the compute nodes, and then reconfigure the correct
parameters and redeploy. The proposed change provides the deployer with the
NUMA topology details during the introspection stage and thereby avoids the
need for redeployment.

Data model impact
-----------------

The data structure for storing the information on NUMA nodes, CPUs, thread
siblings, RAM and NICs shall be::

    {
      "numa_topology": {
        "ram": [{"numa_node": <numa_node_id>, "size_kb": <memory_in_kb>}, ...],
        "cpus": [
          {"cpu": <cpu_id>, "numa_node": <numa_node_id>, "thread_siblings": [<list of sibling threads>]},
          ...
        ],
        "nics": [
          {"name": "<network interface name>", "numa_node": <numa_node_id>},
          ...
        ]
      }
    }

Where:

* ``ram`` is a list that maps each NUMA node to the memory available on it,
  in kB
* ``cpus`` is a list that maps each physical CPU ID to its NUMA node and its
  list of sibling threads
* ``nics`` is a list that maps each NIC name to its NUMA node

Example::

    {
      "numa_topology": {
        "ram": [
          {"numa_node": 0, "size_kb": 2097152},
          {"numa_node": 1, "size_kb": 1048576}
        ],
        "cpus": [
          {"cpu": 0, "numa_node": 0, "thread_siblings": [0, 1]},
          {"cpu": 1, "numa_node": 0, "thread_siblings": [2, 3]},
          ...
          {"cpu": 0, "numa_node": 1, "thread_siblings": [16, 17]},
          {"cpu": 1, "numa_node": 1, "thread_siblings": [18, 19]},
          ...
        ],
        "nics": [
          {"name": "ixgbe0", "numa_node": 0},
          {"name": "ixgbe1", "numa_node": 1}
        ]
      }
    }

.. note::
    In ``cpus``, ``cpu`` and ``numa_node`` together form a unique key, because
    the ``cpu`` ID is only unique within a NUMA node. The thread IDs listed in
    ``thread_siblings`` are unique across NUMA nodes.

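The uniqueness rules stated in the note could be checked with a small helper
such as the one below. The helper name ``check_numa_topology`` and the wording
of its messages are hypothetical. It takes the inner ``numa_topology`` dict in
the shape described above.

```python
def check_numa_topology(topology):
    """Return a list of problems found in a ``numa_topology`` dict.

    An empty list means both uniqueness rules hold.
    """
    problems = []
    seen_keys = set()     # (numa_node, cpu) pairs seen so far
    seen_threads = set()  # thread IDs seen so far, across all NUMA nodes
    for entry in topology["cpus"]:
        key = (entry["numa_node"], entry["cpu"])
        if key in seen_keys:
            problems.append(
                "duplicate cpu %d in NUMA node %d" % (key[1], key[0]))
        seen_keys.add(key)
        for thread in entry["thread_siblings"]:
            if thread in seen_threads:
                problems.append("thread %d listed more than once" % thread)
            seen_threads.add(thread)
    return problems
```

Such a check could live in a unit test for the collector, since the same
``cpu`` value legitimately appears once per NUMA node while a thread ID should
never repeat.
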
HTTP API impact
---------------

None

Client (CLI) impact
-------------------

None

Ironic python agent impact
--------------------------

The changes proposed above will be implemented in IPA.

Performance and scalability impact
----------------------------------

None

Security impact
---------------

None

Deployer impact
---------------

The deployer shall enable the optional `Numa topology collector`_ via the
``ipa-inspection-collectors`` kernel argument. The deployer will then be able
to retrieve the memory per NUMA node, CPUs, thread siblings and NICs, which
can be used to configure the system for better performance.

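As an illustration of how a deployer might consume this data, the sketch below
lists the logical CPUs that share a NUMA node with a given NIC, which are the
natural candidates for PMD threads. The function name ``pmd_cpu_candidates``
is hypothetical, and the sample data is the example from the data model
section with the elided entries left out.

```python
def pmd_cpu_candidates(introspection_data, nic_name):
    """List the logical CPUs on the same NUMA node as ``nic_name``."""
    topology = introspection_data["numa_topology"]
    nodes = [n["numa_node"] for n in topology["nics"] if n["name"] == nic_name]
    if not nodes:
        raise KeyError("NIC %r not found in numa_topology" % nic_name)
    node = nodes[0]
    threads = []
    for entry in topology["cpus"]:
        if entry["numa_node"] == node:
            threads.extend(entry["thread_siblings"])
    return sorted(threads)


data = {
    "numa_topology": {
        "ram": [
            {"numa_node": 0, "size_kb": 2097152},
            {"numa_node": 1, "size_kb": 1048576},
        ],
        "cpus": [
            {"cpu": 0, "numa_node": 0, "thread_siblings": [0, 1]},
            {"cpu": 1, "numa_node": 0, "thread_siblings": [2, 3]},
            {"cpu": 0, "numa_node": 1, "thread_siblings": [16, 17]},
            {"cpu": 1, "numa_node": 1, "thread_siblings": [18, 19]},
        ],
        "nics": [
            {"name": "ixgbe0", "numa_node": 0},
            {"name": "ixgbe1", "numa_node": 1},
        ],
    }
}

print(pmd_cpu_candidates(data, "ixgbe1"))  # [16, 17, 18, 19]
```

Because sibling threads are grouped per core, a deployer can also pick whole
cores (both siblings) for the PMD rather than splitting a core between the PMD
and other workloads.
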
Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

* karthiks

Work Items
----------

* Implement the collector to fetch the NUMA topology information in IPA


Dependencies
============

None

Testing
=======

Unit test cases will be added.

References
==========

.. [#] http://dpdk.org/doc/guides-16.04/linux_gsg/nic_perf_intel_platform.html
.. [#] http://docs.openstack.org/admin-guide/compute-cpu-topologies.html
.. [#] https://en.wikipedia.org/wiki/Non-uniform_memory_access
.. [#] http://www.linuxsecrets.com/blog/6managing-linux-systems/2015/10/01/1658-how-to-identify-a-pci-slot-to-physical-socket-in-a-multi-processor-system-with-linux
.. [#] https://patchwork.kernel.org/patch/5142561/