Container pinning on worker nodes and All-in-one servers

This change pins the infrastructure and OpenStack pods to the
platform cores on worker nodes and All-in-one servers.

This sets the systemd system.conf parameter
CPUAffinity=<platform_cpus> by generating
/etc/systemd/system.conf.d/platform-cpuaffinity.conf.
All services then launch their tasks with that CPU affinity.
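
As a rough illustration of the generated drop-in, the sketch below
renders such a file in Python. The real content comes from the
template platform/systemd-system-cpuaffinity.conf.erb, which is not
shown in this commit view, so the function name and the exact output
layout here are assumptions. The [Manager] section and the
space-separated CPU list follow the systemd system.conf format.

```python
def render_cpuaffinity_conf(platform_cpus):
    """Return the text of a systemd drop-in that sets CPUAffinity.

    Hypothetical helper for illustration; not the ERB template itself.
    """
    cpus = sorted(set(platform_cpus))
    if not cpus:
        raise ValueError("at least one platform CPU is required")
    # systemd accepts a whitespace-separated list of CPU indices.
    cpu_list = " ".join(str(c) for c in cpus)
    return "[Manager]\nCPUAffinity={}\n".format(cpu_list)


# Example: platform cores 0-1 plus their sibling threads 20-21.
print(render_cpuaffinity_conf([0, 1, 20, 21]), end="")
```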

This creates the cgroup called 'k8s-infra' for the following subset
of controllers ('cpuacct', 'cpuset', 'cpu', 'memory', 'systemd').
It sets custom cpuset.cpus (i.e., cpuset) and cpuset.mems
(i.e., nodeset) values based on the sysinv platform configurable
cores. The values are generated by puppet from sysinv host CPU
information and stored in the hieradata variables:
- platform::kubernetes::params::k8s_cpuset
- platform::kubernetes::params::k8s_nodeset

This creates the cgroup called 'machine.slice' for the controller
'cpuset' and sets cpuset.cpus and cpuset.mems to the parent values.
Without this, machine.slice would be created when the first VM is
launched and would inherit the cpuset of the kubernetes libvirt pod.
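
The machine.slice setup can be sketched in Python as follows. This is
only an illustration of the idea (create the child directory, then
copy cpuset.mems and cpuset.cpus from the parent); the change itself
does this with puppet file and exec resources. The function name is
hypothetical, and on a real cgroup v1 mount it would need to run as
root against /sys/fs/cgroup/cpuset.

```python
import os


def init_child_cpuset(parent_dir, child_name):
    """Create a child cpuset and copy the parent's mems and cpus into it.

    Hypothetical helper; mirrors the order used in the puppet class
    (cpuset.mems first, then cpuset.cpus).
    """
    child_dir = os.path.join(parent_dir, child_name)
    os.makedirs(child_dir, mode=0o700, exist_ok=True)
    for knob in ("cpuset.mems", "cpuset.cpus"):
        with open(os.path.join(parent_dir, knob)) as src:
            value = src.read().strip()
        with open(os.path.join(child_dir, knob), "w") as dst:
            dst.write(value + "\n")
    return child_dir


# Intended use on a host (as root):
#   init_child_cpuset("/sys/fs/cgroup/cpuset", "machine.slice")
```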

Note: systemd automatically mounts cgroups and all available
resource controllers, so the new puppet code does not need to do
that.

Kubelet is now launched with --cgroup-root /k8s-infra by configuring
kubeadm.yaml with the option: cgroupRoot: "/k8s-infra".

For openstack based worker nodes including AIO
(i.e., host-label openstack-compute-node=enabled):
- the k8s cpuset and nodeset include the assigned platform cores

For non-openstack based worker nodes including AIO:
- the k8s cpuset and nodeset include all cpus except the assigned
  platform cores. This will be refined in a later update, since
  we need to isolate the cpusets of k8s infrastructure from other
  pods.
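
The selection rule for the two cases above can be sketched as below.
This is not the puppet/sysinv code from this change; the function
names and the cpu_to_node mapping are hypothetical, and the nodeset is
assumed here to be the set of NUMA nodes that own the selected cpus.

```python
def k8s_cpuset(all_cpus, platform_cpus, openstack_compute):
    """Pick the cpus for the k8s-infra cpuset.

    openstack_compute corresponds to host-label
    openstack-compute-node=enabled.
    """
    all_set = set(all_cpus)
    platform = set(platform_cpus) & all_set
    if openstack_compute:
        chosen = platform            # only the platform cores
    else:
        chosen = all_set - platform  # everything except platform cores
    return sorted(chosen)


def k8s_nodeset(cpus, cpu_to_node):
    """Return the sorted NUMA nodes that contain the given cpus."""
    return sorted({cpu_to_node[c] for c in cpus})


# Example: 8 cpus on 2 NUMA nodes, platform cores 0 and 1.
topology = {c: (0 if c < 4 else 1) for c in range(8)}
print(k8s_cpuset(range(8), [0, 1], True))    # [0, 1]
print(k8s_cpuset(range(8), [0, 1], False))   # [2, 3, 4, 5, 6, 7]
print(k8s_nodeset([0, 1], topology))         # [0]
```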

The cpuset topology can be viewed with the following:
 sudo systemd-cgls cpuset

The task cpu affinity can be verified with the following:
 ps-sched.sh
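
Where ps-sched.sh is not at hand, a similar check for a single task
can be sketched in Python. This is an alternative illustration, not
part of this change: the helper names are hypothetical, and
os.sched_getaffinity is only available on Linux.

```python
import os


def parse_cpulist(text):
    """Expand a kernel-style cpu list such as '0-3,8' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def task_affinity(pid=0):
    """Return the cpus a task may run on; pid 0 is the calling process."""
    return sorted(os.sched_getaffinity(pid))


def is_pinned_to(pid, cpulist):
    """True if every cpu in the task's affinity is inside cpulist."""
    return set(task_affinity(pid)) <= set(parse_cpulist(cpulist))


print(parse_cpulist("0-3,8"))   # [0, 1, 2, 3, 8]
```

For example, is_pinned_to(1, "0-1,20-21") would report whether systemd
itself is confined to those platform cores.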

The dynamic affining of platform tasks during start-up is disabled.
That code requires cleanup and is likely no longer required, since
we are now using systemd CPUAffinity and cgroups.

This includes a few small fixes to enable testing of this feature:
- facter platform_res_mem was updated to not require 'memtop', since
  that depends on the existence of NUMA nodes. This was failing in
  QEMU environments where the host does not have NUMA nodes, which
  occurs when there is no CPU topology specified.
- cpumap_functions.sh updated parameter defaults so that calling
  bash scripts may enable 'set -u' undefined variable checking.
- the generation of platform_cpu_list did not have all threads.
- the cpulist-to-ranges inline code was incorrect; in certain
  scenarios the rstrip(',') would take out the wrong commas.
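
For reference, one way to do the cpulist-to-ranges conversion without
any comma stripping is sketched below. This is not the inline code
fixed by this change, which is not shown here; the function name is
hypothetical. Each run of consecutive cpus is built as a whole token
and the tokens are joined once at the end, so no trailing comma ever
needs to be removed.

```python
from itertools import groupby


def cpulist_to_ranges(cpus):
    """Collapse cpu ids into a range string, e.g. [0,1,2,5] -> '0-2,5'."""
    cpus = sorted(set(cpus))
    ranges = []
    # Consecutive values share the same (value - index) key.
    for _, group in groupby(enumerate(cpus),
                            key=lambda pair: pair[1] - pair[0]):
        run = [cpu for _, cpu in group]
        if len(run) == 1:
            ranges.append(str(run[0]))
        else:
            ranges.append("{}-{}".format(run[0], run[-1]))
    return ",".join(ranges)


print(cpulist_to_ranges([0, 1, 2, 3, 8, 10, 11]))   # 0-3,8,10-11
```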

Story: 2004762
Task: 28879

Change-Id: I6fd21bac59fc2d408132905b88710da48aa8d928
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
Jim Gauld
2019-03-28 14:26:24 -04:00
parent fc87a35164
commit a8a07c5690
5 changed files with 143 additions and 1 deletions

@@ -15,6 +15,16 @@ class platform::compute::config
    replace => true,
    content => template('platform/worker_reserved.conf.erb')
  }
  file { '/etc/systemd/system.conf.d/platform-cpuaffinity.conf':
    ensure => 'present',
    replace => true,
    content => template('platform/systemd-system-cpuaffinity.conf.erb')
  }
}
class platform::compute::config::runtime {
  include ::platform::compute::config
}
class platform::compute::grub::params (
@@ -307,6 +317,37 @@ class platform::compute::pmqos (
  }
}
# Set systemd machine.slice cgroup cpuset to be used with VMs,
# and configure this cpuset to span all logical cpus and numa nodes.
# NOTES:
# - The parent directory cpuset spans all online cpus and numa nodes.
# - Setting the machine.slice cpuset prevents this from inheriting
#   kubernetes libvirt pod's cpuset, since machine.slice cgroup will be
#   created when a VM is launched if it does not already exist.
# - systemd automatically mounts cgroups and controllers, so don't need
#   to do that here.
class platform::compute::machine {
  $parent_dir = '/sys/fs/cgroup/cpuset'
  $parent_mems = "${parent_dir}/cpuset.mems"
  $parent_cpus = "${parent_dir}/cpuset.cpus"
  $machine_dir = "${parent_dir}/machine.slice"
  $machine_mems = "${machine_dir}/cpuset.mems"
  $machine_cpus = "${machine_dir}/cpuset.cpus"
  notice("Create ${machine_dir}")
  file { $machine_dir :
    ensure => directory,
    owner => 'root',
    group => 'root',
    mode => '0700',
  }
  -> exec { "Create ${machine_mems}" :
    command => "/bin/cat ${parent_mems} > ${machine_mems}",
  }
  -> exec { "Create ${machine_cpus}" :
    command => "/bin/cat ${parent_cpus} > ${machine_cpus}",
  }
}
class platform::compute {
  Class[$name] -> Class['::platform::vswitch']
@@ -316,5 +357,6 @@ class platform::compute {
  require ::platform::compute::allocate
  require ::platform::compute::pmqos
  require ::platform::compute::resctrl
  require ::platform::compute::machine
  require ::platform::compute::config
}