From da809c2100c50ecc918297a182f7d6d185bf1b8c Mon Sep 17 00:00:00 2001
From: karthik s
Date: Thu, 10 Nov 2016 17:26:55 +0530
Subject: [PATCH] Spec for retrieving NUMA node information

Proposing a spec for retrieving NUMA node information during
introspection.

Partial-Bug: #1635253
Co-Authored-By: Saravanan KR
Change-Id: I99aa9f0462f45a1cccec72801fbd14d1395b6386
Signed-off-by: karthik s
---
 specs/NUMA_node_info.rst | 218 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 218 insertions(+)
 create mode 100644 specs/NUMA_node_info.rst

diff --git a/specs/NUMA_node_info.rst b/specs/NUMA_node_info.rst
new file mode 100644
index 0000000..ea7dc5d
--- /dev/null
+++ b/specs/NUMA_node_info.rst
@@ -0,0 +1,218 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===============================
 Retrieve NUMA node information
===============================

https://bugs.launchpad.net/ironic-python-agent/+bug/1635253

Today, the introspected data from the nodes does not provide information
about the NUMA topology to the deployer. The deployer needs information on
the association of NUMA nodes with their CPU cores and NICs. These details
are required for configuring nodes that use DPDK NICs.

Problem description
===================

In order to configure the nodes for better performance, the CPUs have to be
selected based on the NUMA topology. On nodes with DPDK NICs, the CPUs for
the poll mode driver (PMD) need to be selected from the NUMA node associated
with the DPDK NICs. If hyper-threading is enabled, selecting the logical
cores also requires knowledge of the thread siblings. The deployer shall read
this information from the data stored in Swift and use it to manually
configure the deployment parameters for better performance.

For example, to list available NUMA-aware NICs::

    $ lspci -nn | grep Eth
    82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
    82:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
    85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
    85:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]

To obtain the NUMA node ID of a PCI device through sysfs::

    $ cat /sys/bus/pci/devices/0000\:85\:00.1/numa_node
    1

To get the best performance we need to ensure that the CPU cores and the NIC
are in the same NUMA node. In the example above, the NIC with PCI address
``85:00.1`` is in NUMA node ``1``, so it should preferably be used by the
DPDK Poll Mode Drivers (PMD) running on CPU cores in NUMA node ``1``.
Otherwise the best performance is not guaranteed, as the association between
the NIC and the CPU cores would effectively be random.

This spec shall ensure that the NUMA parameters are available to the
deployer, so that the PMDs can be placed on the right logical CPUs for better
performance.


Proposed change
===============

.. _Numa topology collector:

The collected data will be stored as a blob in Swift. Future work may
introduce an Inspector plug-in to further enhance the processing of the NUMA
architecture data. A new optional `Numa topology collector`_ shall be used to
fetch the following information about NUMA nodes (an illustrative sketch
follows the list).

* List of NUMA nodes - shall be fetched from
  ``/sys/devices/system/node/node<X>``

* List of CPU cores associated with each NUMA node - shall be fetched from
  ``/sys/devices/system/node/node<X>/cpu<Y>/topology/core_id``

* List of thread siblings for each core - shall be fetched from
  ``/sys/devices/system/node/node<X>/cpu<Y>``

* NUMA node ID for network interfaces - shall be extracted for each NIC from
  ``/sys/class/net/<nic_name>/device/numa_node``

* RAM available in each NUMA node - shall be fetched from
  ``/sys/devices/system/node/node<X>/meminfo``
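
For illustration only, the following is a minimal sketch, in Python, of how
such a collector might read the sysfs entries listed above. The function name
``collect_numa_topology`` and the exact output layout are assumptions made
for this spec, not the final IPA collector interface::

    import glob
    import os


    def collect_numa_topology():
        """Sketch: gather NUMA topology from sysfs (assumed layout)."""
        topology = {'ram': [], 'cpus': [], 'nics': []}

        for node_dir in sorted(glob.glob('/sys/devices/system/node/node*')):
            numa_node = int(os.path.basename(node_dir)[len('node'):])

            # RAM of this NUMA node: the "MemTotal" line of meminfo, in kB.
            with open(os.path.join(node_dir, 'meminfo')) as f:
                for line in f:
                    if 'MemTotal' in line:
                        topology['ram'].append(
                            {'numa_node': numa_node,
                             'size_kb': int(line.split()[-2])})
                        break

            # CPUs: group the logical CPUs of this node by physical core_id,
            # so that each entry carries its list of thread siblings.
            cores = {}
            for cpu_dir in glob.glob(os.path.join(node_dir, 'cpu[0-9]*')):
                thread_id = int(os.path.basename(cpu_dir)[len('cpu'):])
                with open(os.path.join(cpu_dir, 'topology', 'core_id')) as f:
                    core_id = int(f.read())
                cores.setdefault(core_id, []).append(thread_id)
            for core_id, siblings in cores.items():
                topology['cpus'].append({'cpu': core_id,
                                         'numa_node': numa_node,
                                         'thread_siblings': sorted(siblings)})

        # NUMA node of each NIC; -1 means no NUMA affinity is reported.
        for device_dir in glob.glob('/sys/class/net/*/device'):
            nic_name = device_dir.split('/')[-2]
            try:
                with open(os.path.join(device_dir, 'numa_node')) as f:
                    numa_node = int(f.read())
            except IOError:
                continue
            if numa_node >= 0:
                topology['nics'].append({'name': nic_name,
                                         'numa_node': numa_node})

        return {'numa_topology': topology}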

Alternatives
------------

Another option would be to deploy with the default parameters, identify the
actual values from the compute nodes, and then re-configure the correct
parameters and re-deploy. The proposed change provides the deployer with the
NUMA topology details during the introspection stage, thereby avoiding the
need for redeployment.

Data model impact
-----------------

The data structure for storing the information on NUMA nodes, CPUs, thread
siblings, RAM and NICs shall be::

    {
      "numa_topology": {
        "ram": [{"numa_node": <node_id>, "size_kb": <size>}, ...],
        "cpus": [
          {"cpu": <cpu_id>, "numa_node": <node_id>,
           "thread_siblings": [<list of thread siblings>]},
          ...
        ],
        "nics": [
          {"name": "<nic_name>", "numa_node": <node_id>},
          ...
        ]
      }
    }

Where:

* ``ram`` is a mapping from each NUMA node to the memory available on it
* ``cpus`` is a mapping from each physical CPU ID to its NUMA node and its
  list of sibling threads
* ``nics`` is a mapping from each NIC name to its NUMA node


Example::

    {
      "numa_topology": {
        "ram": [
          {"numa_node": 0, "size_kb": 2097152},
          {"numa_node": 1, "size_kb": 1048576}
        ],
        "cpus": [
          {"cpu": 0, "numa_node": 0, "thread_siblings": [0, 1]},
          {"cpu": 1, "numa_node": 0, "thread_siblings": [2, 3]},
          ...,
          {"cpu": 0, "numa_node": 1, "thread_siblings": [16, 17]},
          {"cpu": 1, "numa_node": 1, "thread_siblings": [18, 19]},
          ...
        ],
        "nics": [
          {"name": "ixgbe0", "numa_node": 0},
          {"name": "ixgbe1", "numa_node": 1}
        ]
      }
    }

.. note::
   In ``cpus``, ``cpu`` and ``numa_node`` together form a unique key, as the
   ``cpu`` ID is specific to a NUMA node. The thread IDs listed in
   ``thread_siblings`` are unique across NUMA nodes.
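
For illustration, the following is a minimal sketch of how a deployer might
consume the stored ``numa_topology`` blob to pick candidate PMD CPUs on the
same NUMA node as a DPDK NIC. The file name ``introspection_data.json`` and
the NIC name ``ixgbe1`` are assumptions taken from the example above, not
part of any defined interface::

    import json

    # Hypothetical input: the introspection data saved from Swift to a local
    # file, and the name of the DPDK NIC chosen by the deployer.
    with open('introspection_data.json') as f:
        topology = json.load(f)['numa_topology']
    dpdk_nic = 'ixgbe1'

    # NUMA node that the DPDK NIC is attached to.
    nic_node = next(nic['numa_node'] for nic in topology['nics']
                    if nic['name'] == dpdk_nic)

    # Logical CPUs (thread siblings) local to that NUMA node; these are the
    # candidates for the PMD core mask.
    pmd_candidates = sorted(
        thread
        for cpu in topology['cpus'] if cpu['numa_node'] == nic_node
        for thread in cpu['thread_siblings'])

    print('NIC %s is on NUMA node %d; candidate PMD CPUs: %s'
          % (dpdk_nic, nic_node, pmd_candidates))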

HTTP API impact
---------------

None

Client (CLI) impact
-------------------

None

Ironic python agent impact
--------------------------

The changes proposed above will be implemented in IPA.

Performance and scalability impact
----------------------------------

None

Security impact
---------------

None

Deployer impact
---------------

The deployer shall enable the optional `Numa topology collector`_ via the
``ipa-inspection-collectors`` kernel argument. The deployer will then be able
to get the information about memory per NUMA node, CPUs, thread siblings and
NICs, which is useful for configuring the system for better performance.

Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

* karthiks

Work Items
----------

* Implement the collector to fetch the NUMA topology information in IPA


Dependencies
============

None

Testing
=======

Unit test cases will be added.

References
==========

.. [#] http://dpdk.org/doc/guides-16.04/linux_gsg/nic_perf_intel_platform.html
.. [#] http://docs.openstack.org/admin-guide/compute-cpu-topologies.html
.. [#] https://en.wikipedia.org/wiki/Non-uniform_memory_access
.. [#] http://www.linuxsecrets.com/blog/6managing-linux-systems/2015/10/01/1658-how-to-identify-a-pci-slot-to-physical-socket-in-a-multi-processor-system-with-linux
.. [#] https://patchwork.kernel.org/patch/5142561/