
Merge "Spec for retrieving NUMA node information"

Jenkins, 2 years ago (parent commit e77b6dfc97)

1 changed file with 218 additions and 0 deletions

specs/NUMA_node_info.rst (+218, -0)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===============================
 Retrieve NUMA node information
===============================

https://bugs.launchpad.net/ironic-python-agent/+bug/1635253

Today, the introspected data from the nodes does not provide any information
about the NUMA topology to the deployer. The deployer needs the association
of NUMA nodes with their CPU cores and NICs. These details are required for
configuring nodes with DPDK-aware NICs.

Problem description
===================

In order to configure the nodes for better performance, the CPUs have to be
selected based on the NUMA topology. For nodes with DPDK-aware NICs, the CPUs
for the poll mode driver (PMD) need to be selected from the NUMA node
associated with the DPDK NICs. If hyperthreading is enabled, selecting the
logical cores also requires knowledge of the thread siblings. The deployer
reads this information from the data stored in Swift and uses it to manually
configure the deployment parameters for better performance.

For example, to list available NUMA-aware NICs::

  $ lspci -nn | grep Eth
  82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
  82:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
  85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
  85:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]

To obtain the NUMA node ID of a PCI device through sysfs::

  $ cat /sys/bus/pci/devices/0000\:85\:00.1/numa_node
  1

To get the best performance we need to ensure that the CPU core and the NIC
are in the same NUMA node. In the example above, the NIC with PCI address
``85:00.1`` is in NUMA node ``1``. To achieve the best performance, that NIC
should preferably be used by DPDK Poll Mode Drivers (PMD) running on CPU
cores in NUMA node ``1``. Otherwise the best performance is not guaranteed,
as the association between cores and NICs would be effectively random.

This spec shall make the NUMA parameters available to the deployer, so that
the PMDs use the right logical CPUs for better performance.


Proposed change
===============

  .. _Numa topology collector:

The collected data will be stored as a blob in Swift. Future work may
introduce an Inspector plug-in to further enhance the processing of the NUMA
architecture data. A new optional `Numa topology collector`_ shall be used to
fetch the following information related to NUMA nodes (a sketch of such a
collector is given after this list):

* List of NUMA nodes - Shall be fetched from
  ``/sys/devices/system/node/node<numa_node_id>``

* List of CPU cores associated with each NUMA node - Shall be fetched from
  ``/sys/devices/system/node/node<numa_node_id>/cpu<thread_id>/topology/core_id``

* List of thread_siblings for each core - Shall be fetched from
  ``/sys/devices/system/node/node<numa_node_id>/cpu<thread_id>``

* NUMA node ID for network interfaces - Shall be extracted from
  ``/sys/class/net/<interface name>/device/numa_node``

* RAM available for each NUMA node - Shall be fetched from
  ``/sys/devices/system/node/node<numa_node_id>/meminfo``

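A minimal sketch of how such a collector could assemble this data from the
sysfs paths listed above is shown below. This is an illustration only, not
the final IPA implementation: the function names and parsing details are
assumptions, and the real collector would be registered through IPA's
inspection collector mechanism and would need proper error handling::

  import glob
  import os
  import re


  def _read(path):
      """Return the stripped contents of a sysfs file."""
      with open(path) as f:
          return f.read().strip()


  def collect_numa_topology():
      """Build the numa_topology structure described in Data model impact."""
      topology = {"ram": [], "cpus": [], "nics": []}

      for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
          numa_node = int(re.search(r"node(\d+)$", node_dir).group(1))

          # RAM per NUMA node, taken from the MemTotal line of meminfo.
          for line in _read(os.path.join(node_dir, "meminfo")).splitlines():
              if "MemTotal" in line:
                  topology["ram"].append({"numa_node": numa_node,
                                          "size_kb": int(line.split()[-2])})

          # Group the logical CPUs (threads) of this node by physical core.
          cores = {}
          for cpu_dir in glob.glob(os.path.join(node_dir, "cpu[0-9]*")):
              thread_id = int(re.search(r"cpu(\d+)$", cpu_dir).group(1))
              core_path = os.path.join(cpu_dir, "topology", "core_id")
              if not os.path.isfile(core_path):
                  continue
              cores.setdefault(int(_read(core_path)), []).append(thread_id)
          for core_id, threads in sorted(cores.items()):
              topology["cpus"].append({"cpu": core_id,
                                       "numa_node": numa_node,
                                       "thread_siblings": sorted(threads)})

      # NUMA node of every network interface that reports one.
      for nic_path in glob.glob("/sys/class/net/*/device/numa_node"):
          topology["nics"].append({"name": nic_path.split("/")[4],
                                   "numa_node": int(_read(nic_path))})

      return {"numa_topology": topology}
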
Alternatives
------------

Another option would be to deploy with the default parameters, identify the
actual values on the compute nodes, and then re-configure the correct
parameters and re-deploy. The proposed change provides the deployer with the
NUMA topology details during the introspection stage and thereby avoids the
need for redeployment.

Data model impact
-----------------

The data structure for storing the information on NUMA nodes, CPUs, thread
siblings, RAM and NICs shall be::

  {
    "numa_topology": {
        "ram": [{"numa_node": <numa_node_id>, "size_kb": <memory_in_kb>}, ...],
        "cpus": [
          {"cpu": <cpu_id>, "numa_node": <numa_node_id>, "thread_siblings": [<list of sibling threads>]},
          ...
        ],
        "nics": [
          {"name": "<network interface name>", "numa_node": <numa_node_id>},
          ...
        ]
    }
  }

Where:

* ``ram`` maps each NUMA node to the memory available on it
* ``cpus`` maps each physical CPU ID to its NUMA node and its list of
  sibling threads
* ``nics`` maps each NIC name to its NUMA node

Example::

  {
    "numa_topology": {
        "ram": [
          {"numa_node": 0, "size_kb": 2097152},
          {"numa_node": 1, "size_kb": 1048576}
        ],
        "cpus": [
          {"cpu": 0, "numa_node": 0, "thread_siblings": [0, 1]},
          {"cpu": 1, "numa_node": 0, "thread_siblings": [2, 3]},
          ...,
          {"cpu": 0, "numa_node": 1, "thread_siblings": [16, 17]},
          {"cpu": 1, "numa_node": 1, "thread_siblings": [18, 19]},
          ...
        ],
        "nics": [
          {"name": "ixgbe0", "numa_node": 0},
          {"name": "ixgbe1", "numa_node": 1}
        ]
    }
  }

.. note::
    In ``cpus``, ``cpu`` and ``numa_node`` together form a unique key, as
    ``cpu_id`` is specific to a NUMA node, while the thread IDs listed in
    ``thread_siblings`` are unique across NUMA nodes.

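To show how this structure could be consumed (this is only an illustrative
sketch, not part of the proposed change; the function name and NIC name are
made up), a deployer could pick the logical CPUs co-located with a given DPDK
NIC like this::

  def pmd_candidate_threads(introspection_data, nic_name):
      """Return thread IDs on the same NUMA node as the given NIC."""
      topology = introspection_data["numa_topology"]
      nic_node = next(nic["numa_node"] for nic in topology["nics"]
                      if nic["name"] == nic_name)
      threads = []
      for cpu in topology["cpus"]:
          if cpu["numa_node"] == nic_node:
              threads.extend(cpu["thread_siblings"])
      return sorted(threads)

  # With the example data above, pmd_candidate_threads(data, "ixgbe1")
  # returns the threads of NUMA node 1, i.e. [16, 17, 18, 19, ...].
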
HTTP API impact
---------------

None

Client (CLI) impact
-------------------

None

Ironic python agent impact
--------------------------

The changes proposed above will be implemented in IPA.

Performance and scalability impact
----------------------------------

None.

Security impact
---------------

None

Deployer impact
---------------

The deployer shall enable the optional `Numa topology collector`_ via the
``ipa-inspection-collectors`` kernel argument. The deployer will then be able
to get the information about memory per NUMA node, CPUs, thread siblings and
NICs, which can be useful when configuring the system for better performance.

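For example, assuming the new collector ends up being registered under the
name ``numa-topology`` (the final name is left to the implementation), the
deployer would boot the ramdisk with something like
``ipa-inspection-collectors=default,numa-topology`` on the kernel command
line.
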
Developer impact
----------------

None


Implementation
==============

Assignee(s)
-----------

* karthiks

Work Items
----------

* Implement the collector to fetch the NUMA topology information in IPA


Dependencies
============

None

Testing
=======

Unit test cases will be added.
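
As an illustration only (the module name ``numa_collector`` and the function
``collect_numa_topology`` refer to the hypothetical sketch in the Proposed
change section, not to final IPA code), such a test could mock the sysfs
access::

  import unittest
  from unittest import mock

  import numa_collector


  class TestNumaTopologyCollector(unittest.TestCase):

      @mock.patch.object(numa_collector, "_read")
      @mock.patch("glob.glob")
      def test_nic_numa_node(self, mock_glob, mock_read):
          # Pretend the node has a single NIC and no visible NUMA node dirs.
          mock_glob.side_effect = lambda pattern: (
              ["/sys/class/net/eth0/device/numa_node"]
              if pattern.startswith("/sys/class/net") else [])
          mock_read.return_value = "1"

          data = numa_collector.collect_numa_topology()

          self.assertEqual([{"name": "eth0", "numa_node": 1}],
                           data["numa_topology"]["nics"])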

References
==========

.. [#] http://dpdk.org/doc/guides-16.04/linux_gsg/nic_perf_intel_platform.html
.. [#] http://docs.openstack.org/admin-guide/compute-cpu-topologies.html
.. [#] https://en.wikipedia.org/wiki/Non-uniform_memory_access
.. [#] http://www.linuxsecrets.com/blog/6managing-linux-systems/2015/10/01/1658-how-to-identify-a-pci-slot-to-physical-socket-in-a-multi-processor-system-with-linux
.. [#] https://patchwork.kernel.org/patch/5142561/
