kernel-std: add initial version for debian packaging

Add kernel 5.10.74 debian packaging.

The kernel we are building starts as source code from the Yocto Project
kernel found at
(https://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto/about/?h=v5.10/standard/base).
To facilitate the creation of a Debian package of this kernel we start
by making a copy of the 5.10 Debian Bullseye 'debian' folder taken from
(http://snapshot.debian.org/package/linux/5.10.28-1/) and apply
customization via the meta-data patches in debian/deb_patches dir. In
this way we can review and incorporate changes the Debian community
makes to their kernel's 'debian' folder over time.

Since there are StarlingX specific patches to the kernel not suitable to
send for merging in linux-yocto we apply these here as defined in scope
and order in the contained debian/patches/series file.

Verification:
As we are only getting the Debian work bootstrapped there is quite a few
restrictions as far as what can be tested.

- I have compared it to the kernel 5.10.74 being used with stx centos:
  - the linux-yocto source code is same;
  - all the StarlingX specific patches are same;
  - the .config of Starlingx centos kernel 5.10.74 is taken to Starlingx
    debian, coexists and overrides the default debian kenrel configs,
    and only below changes are done for it:
    - remove some CONFIGs not set by Starlingx centos kernel code
      intentionally, such as CONFIG_CC_CAN_LINK;
    - remove some CONFIGs special for Starlingx centos kernel code such
      as: CONFIG_CC_VERSION_TEXT;
    - keep the CONFIGs related with signature aligned with debian
      release, because the security feature is still in development.
- 28 debs are built successfully. Build kernel image into rootfs and
  initramfs. Build the LAT ustart image from them.
- Use qemu to boot the ustart image, and the installer installs the
  rootfs successfully. The final debian system with this new kernel
  boot up successfully and run some simple commands successfully.

Story: 2009221
Task: 43290
Signed-off-by: Li Zhou <li.zhou@windriver.com>
Change-Id: I2f98fcc3f929e3e006d30210d559913a10a77ac2
This commit is contained in:
Li Zhou
2021-09-29 05:47:00 -04:00
parent 62e6815691
commit 92efe6f666
19 changed files with 22533 additions and 0 deletions

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,31 @@
From 9576d9ffaaa5af945ae375ebbd8b855a1165ff3b Mon Sep 17 00:00:00 2001
From: Li Zhou <li.zhou@windriver.com>
Date: Tue, 23 Nov 2021 10:53:44 +0800
Subject: [PATCH 2/4] kernel-std: Add a new changelog file for linux-yocto
source code
Don't list the commits logs and only give out the link for checking
about the git history of this source code.
Signed-off-by: Li Zhou <li.zhou@windriver.com>
---
debian/changelog | 7 +++++++
1 file changed, 7 insertions(+)
create mode 100644 debian/changelog
diff --git a/debian/changelog b/debian/changelog
new file mode 100644
index 0000000..c64e51a
--- /dev/null
+++ b/debian/changelog
@@ -0,0 +1,7 @@
+linux (5.10.74-1) stable; urgency=medium
+
+ * New upstream stable update:
+ https://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto/log/?h=v5.10%2Fstandard%2Fbase&qt=range&q=9e84a42af61ff9c6feb89ab8d61ee5f25fb35c72
+
+ -- Li Zhou <li.zhou@windriver.com> Tue Nov 9 11:23:35 CST 2021
+
--
2.17.1

View File

@@ -0,0 +1,275 @@
From 73e38923ca6bad9469141c5fd0279dc4f5a7ef13 Mon Sep 17 00:00:00 2001
From: Li Zhou <li.zhou@windriver.com>
Date: Thu, 11 Nov 2021 17:24:01 +0800
Subject: [PATCH 4/4] kernel-std: Adapt the debian folder for building
linux-yocto source
Below are the changes on DEBIAN's kernel release's "debian" folder
for building linux-yocto kernel source 5.10.74, besides kernel configs
and the debian/changelog file:
-Remove the checking about CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT which
is only defined in DEBIAN patches;
-Update debian/config/amd64/none/defines to disable debian cloud image
building, which isn't in use here;
-Update debian/config/defines to disable rt/docs/installer packages'
building to make kernel-rt independent and avoid some building errors
caused by docs and installer, which aren't in use here;
-Update debian/lib/python/debian_linux/debian.py to remove an
unimportant format check for changelog because it conflicts with the
new building system;
-Update debian/patches/series to only keep the patches to support
package building system, which are from DEBIAN release.
-Update debian/rules to solve the issue that building paused after
gencontrol.
Signed-off-by: Li Zhou <li.zhou@windriver.com>
---
debian/bin/gencontrol_signed.py | 1 -
debian/config/amd64/none/defines | 7 --
debian/config/defines | 7 +-
debian/lib/python/debian_linux/debian.py | 23 +++--
debian/patches/series | 116 -----------------------
debian/rules | 9 +-
6 files changed, 18 insertions(+), 145 deletions(-)
diff --git a/debian/bin/gencontrol_signed.py b/debian/bin/gencontrol_signed.py
index 75d9112..b984cf3 100755
--- a/debian/bin/gencontrol_signed.py
+++ b/debian/bin/gencontrol_signed.py
@@ -188,7 +188,6 @@ class Gencontrol(Base):
(image_package_name, image_suffix)) as f:
kconfig = f.readlines()
assert 'CONFIG_EFI_STUB=y\n' in kconfig
- assert 'CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT=y\n' in kconfig
cert_re = re.compile(r'CONFIG_SYSTEM_TRUSTED_KEYS="(.*)"$')
cert_file_name = None
for line in kconfig:
diff --git a/debian/config/amd64/none/defines b/debian/config/amd64/none/defines
index ada2355..090dc41 100644
--- a/debian/config/amd64/none/defines
+++ b/debian/config/amd64/none/defines
@@ -1,10 +1,3 @@
[base]
flavours:
amd64
- cloud-amd64
-default-flavour: amd64
-
-[cloud-amd64_image]
-configs:
- config.cloud
- amd64/config.cloud-amd64
diff --git a/debian/config/defines b/debian/config/defines
index 7133cd7..018a1b2 100644
--- a/debian/config/defines
+++ b/debian/config/defines
@@ -141,7 +141,6 @@ arches:
compiler: gcc-10
featuresets:
none
- rt
[build]
debug-info: true
@@ -149,7 +148,7 @@ debug-info: true
signed-code: false
[featureset-rt_base]
-enabled: true
+enabled: false
[description]
part-long-up: This kernel is not suitable for SMP (multi-processor,
@@ -167,3 +166,7 @@ gcc-10: gcc-10 <!stage1 !cross !pkg.linux.nokernel>, gcc-10-@gnu-type-package@ <
# initramfs-generators
initramfs-fallback: linux-initramfs-tool
initramfs-tools: initramfs-tools (>= 0.120+deb8u2)
+
+[packages]
+docs: false
+installer: false
diff --git a/debian/lib/python/debian_linux/debian.py b/debian/lib/python/debian_linux/debian.py
index 6fb2618..2e0fed9 100644
--- a/debian/lib/python/debian_linux/debian.py
+++ b/debian/lib/python/debian_linux/debian.py
@@ -85,18 +85,17 @@ class Changelog(list):
v = Version(top_match.group('version'))
else:
bottom_match = self._bottom_re.match(line)
- if not bottom_match:
- raise Exception('invalid bottom line %d in changelog' %
- line_no)
-
- self.append(self.Entry(
- distribution=top_match.group('distribution'),
- source=top_match.group('source'),
- version=v,
- urgency=top_match.group('urgency'),
- maintainer=bottom_match.group('maintainer'),
- date=bottom_match.group('date')))
- top_match = bottom_match = None
+ #Don't raise exception any more if this bottom format
+ #checking fail because we have adpated the changelog format.
+ if bottom_match:
+ self.append(self.Entry(
+ distribution=top_match.group('distribution'),
+ source=top_match.group('source'),
+ version=v,
+ urgency=top_match.group('urgency'),
+ maintainer=bottom_match.group('maintainer'),
+ date=bottom_match.group('date')))
+ top_match = bottom_match = None
class Version(object):
diff --git a/debian/patches/series b/debian/patches/series
index 56d1700..bdb60da 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,13 +1,3 @@
-debian/gitignore.patch
-
-# Disable features broken by exclusion of upstream files
-debian/dfsg/arch-powerpc-platforms-8xx-ucode-disable.patch
-debian/dfsg/drivers-media-dvb-dvb-usb-af9005-disable.patch
-debian/dfsg/vs6624-disable.patch
-debian/dfsg/drivers-net-appletalk-cops.patch
-debian/dfsg/video-remove-nvidiafb-and-rivafb.patch
-debian/dfsg/documentation-fix-broken-link-to-cipso-draft.patch
-
# Changes to support package build system
debian/version.patch
debian/uname-version-timestamp.patch
@@ -25,109 +15,3 @@ debian/android-enable-building-ashmem-and-binder-as-modules.patch
debian/documentation-drop-sphinx-version-check.patch
debian/perf-traceevent-support-asciidoctor-for-documentatio.patch
debian/kbuild-look-for-module.lds-under-arch-directory-too.patch
-
-# Fixes/improvements to firmware loading
-features/all/drivers-media-dvb-usb-af9005-request_firmware.patch
-debian/iwlwifi-do-not-request-unreleased-firmware.patch
-bugfix/all/firmware_class-log-every-success-and-failure.patch
-bugfix/all/firmware-remove-redundant-log-messages-from-drivers.patch
-bugfix/all/radeon-amdgpu-firmware-is-required-for-drm-and-kms-on-r600-onward.patch
-debian/firmware_class-refer-to-debian-wiki-firmware-page.patch
-
-# Patches from aufs5 repository, imported with debian/bin/genpatch-aufs.
-# These are only the changes needed to allow aufs to be built out-of-tree.
-#features/all/aufs5/aufs5-base.patch
-#features/all/aufs5/aufs5-mmap.patch
-#features/all/aufs5/aufs5-standalone.patch
-
-# Change some defaults for security reasons
-debian/af_802154-Disable-auto-loading-as-mitigation-against.patch
-debian/rds-Disable-auto-loading-as-mitigation-against-local.patch
-debian/decnet-Disable-auto-loading-as-mitigation-against-lo.patch
-debian/dccp-disable-auto-loading-as-mitigation-against-local-exploits.patch
-debian/hamradio-disable-auto-loading-as-mitigation-against-local-exploits.patch
-debian/fs-enable-link-security-restrictions-by-default.patch
-
-# Set various features runtime-disabled by default
-debian/sched-autogroup-disabled.patch
-debian/yama-disable-by-default.patch
-debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
-features/all/security-perf-allow-further-restriction-of-perf_event_open.patch
-features/x86/intel-iommu-add-option-to-exclude-integrated-gpu-only.patch
-features/x86/intel-iommu-add-kconfig-option-to-exclude-igpu-by-default.patch
-
-# Disable autoloading/probing of various drivers by default
-debian/cdc_ncm-cdc_mbim-use-ncm-by-default.patch
-debian/snd-pcsp-disable-autoload.patch
-bugfix/x86/viafb-autoload-on-olpc-xo1.5-only.patch
-debian/fjes-disable-autoload.patch
-
-# Taint if dangerous features are used
-debian/fanotify-taint-on-use-of-fanotify_access_permissions.patch
-debian/btrfs-warn-about-raid5-6-being-experimental-at-mount.patch
-
-# Arch bug fixes
-bugfix/arm/arm-dts-kirkwood-fix-sata-pinmux-ing-for-ts419.patch
-bugfix/arm64/dts-rockchip-correct-voltage-selector-firefly-RK3399.patch
-bugfix/x86/perf-tools-fix-unwind-build-on-i386.patch
-bugfix/sh/sh-boot-do-not-use-hyphen-in-exported-variable-name.patch
-bugfix/arm/arm-mm-export-__sync_icache_dcache-for-xen-privcmd.patch
-bugfix/powerpc/powerpc-boot-fix-missing-crc32poly.h-when-building-with-kernel_xz.patch
-bugfix/arm64/arm64-acpi-Add-fixup-for-HPE-m400-quirks.patch
-bugfix/x86/x86-32-disable-3dnow-in-generic-config.patch
-
-# Arch features
-features/arm64/arm64-dts-rockchip-Add-basic-support-for-Kobol-s-Hel.patch
-features/x86/x86-memtest-WARN-if-bad-RAM-found.patch
-features/x86/x86-make-x32-syscall-support-conditional.patch
-
-# Miscellaneous bug fixes
-bugfix/all/disable-some-marvell-phys.patch
-bugfix/all/fs-add-module_softdep-declarations-for-hard-coded-cr.patch
-bugfix/all/partially-revert-usb-kconfig-using-select-for-usb_co.patch
-debian/makefile-do-not-check-for-libelf-when-building-oot-module.patch
-bugfix/all/partially-revert-net-socket-implement-64-bit-timestamps.patch
-
-# Miscellaneous features
-
-# Lockdown missing pieces
-features/all/lockdown/efi-add-an-efi_secure_boot-flag-to-indicate-secure-b.patch
-features/all/lockdown/efi-lock-down-the-kernel-if-booted-in-secure-boot-mo.patch
-features/all/lockdown/mtd-disable-slram-and-phram-when-locked-down.patch
-features/all/lockdown/arm64-add-kernel-config-option-to-lock-down-when.patch
-
-# Improve integrity platform keyring for kernel modules verification
-features/all/db-mok-keyring/0001-MODSIGN-do-not-load-mok-when-secure-boot-disabled.patch
-features/all/db-mok-keyring/0002-MODSIGN-load-blacklist-from-MOKx.patch
-features/all/db-mok-keyring/0003-MODSIGN-checking-the-blacklisted-hash-before-loading-a-kernel-module.patch
-features/all/db-mok-keyring/0004-MODSIGN-check-the-attributes-of-db-and-mok.patch
-features/all/db-mok-keyring/modsign-make-shash-allocation-failure-fatal.patch
-features/all/db-mok-keyring/KEYS-Make-use-of-platform-keyring-for-module-signature.patch
-
-# Security fixes
-debian/i386-686-pae-pci-set-pci-nobios-by-default.patch
-debian/ntfs-mark-it-as-broken.patch
-bugfix/x86/0001-bpf-x86-Validate-computation-of-branch-displacements.patch
-bugfix/x86/0002-bpf-x86-Validate-computation-of-branch-displacements.patch
-
-# Fix exported symbol versions
-bugfix/all/module-disable-matching-missing-version-crc.patch
-
-# Tools bug fixes
-bugfix/all/usbip-document-tcp-wrappers.patch
-bugfix/all/kbuild-fix-recordmcount-dependency.patch
-bugfix/all/tools-perf-man-date.patch
-bugfix/all/tools-perf-remove-shebangs.patch
-bugfix/x86/revert-perf-build-fix-libunwind-feature-detection-on.patch
-bugfix/all/tools-build-remove-bpf-run-time-check-at-build-time.patch
-bugfix/all/cpupower-bump-soname-version.patch
-bugfix/all/libcpupower-hide-private-function.patch
-bugfix/all/cpupower-fix-checks-for-cpu-existence.patch
-bugfix/all/tools-perf-pmu-events-fix-reproducibility.patch
-bugfix/all/bpftool-fix-version-string-in-recursive-builds.patch
-bugfix/all/tools-include-uapi-fix-errno.h.patch
-
-# overlay: allow mounting in user namespaces
-debian/overlayfs-permit-mounts-in-userns.patch
-
-# ABI maintenance
diff --git a/debian/rules b/debian/rules
index 3659e5b..db4fc10 100755
--- a/debian/rules
+++ b/debian/rules
@@ -113,12 +113,7 @@ debian/control-real: debian/bin/gencontrol.py $(CONTROL_FILES)
# Hash randomisation makes the pickled config unreproducible
PYTHONHASHSEED=0 $<
md5sum $^ > debian/control.md5sum
- @echo
- @echo This target is made to fail intentionally, to make sure
- @echo that it is NEVER run during the automated build. Please
- @echo ignore the following error, the debian/control file has
- @echo been generated SUCCESSFULLY.
- @echo
- exit 1
+ @echo The debian/control file has been generated SUCCESSFULLY.
+ @echo Go on with the building!
.PHONY: binary binary-% build build-% clean debian/control-real orig setup source
--
2.17.1

View File

@@ -0,0 +1,4 @@
0001-kernel-std-Remove-the-old-changelog-file.patch
0002-kernel-std-Add-a-new-changelog-file-for-linux-yocto-.patch
0003-kernel-std-Add-a-kernel-config-file-for-stx-debian.patch
0004-kernel-std-Adapt-the-debian-folder-for-building-linu.patch

59
kernel-std/debian/dl_hook Executable file
View File

@@ -0,0 +1,59 @@
#!/bin/bash
# The only parameter is the name of the folder where the source code
# is extracted to. Pay attention to that the extracted package should
# be put at the same path where this script is located.
# Tools needed: wget/tar/md5sum
KERNEL_HEAD_COMMIT=9e84a42af61ff9c6feb89ab8d61ee5f25fb35c72
KERNEL_MD5SUM=58ed543af9eb3767a7c576a603c5eb94
DEBIAN_DL_PATH=http://snapshot.debian.org/archive/debian/20210410T143728Z/pool/main/l/linux/
DEBIAN_FILE=linux_5.10.28-1.debian.tar.xz
DEBIAN_MD5SUM=de03a25a5c7e12ec1e559b6b0ce8889f
if [ ! -f "./linux-yocto-${KERNEL_HEAD_COMMIT}.tar.gz" ]
then
wget https://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto/snapshot/linux-yocto-${KERNEL_HEAD_COMMIT}.tar.gz
if [ $? -ne 0 ]
then
echo "wget failed: linux-yocto source!"
exit 1
fi
fi
md5sum ./linux-yocto-${KERNEL_HEAD_COMMIT}.tar.gz | grep ${KERNEL_MD5SUM}
if [ $? -ne 0 ]
then
echo "Wrong md5sum: linux-yocto source!"
exit 1
fi
tar xvf linux-yocto-${KERNEL_HEAD_COMMIT}.tar.gz
if [ $? -ne 0 ]
then
echo "tar failed: linux-yocto source!"
exit 1
fi
mv linux-yocto-${KERNEL_HEAD_COMMIT} $1
cd $1
wget ${DEBIAN_DL_PATH}${DEBIAN_FILE}
if [ $? -ne 0 ]
then
echo "wget failed: debian folder for kernel!"
exit 1
fi
md5sum ./${DEBIAN_FILE} | grep ${DEBIAN_MD5SUM}
if [ $? -ne 0 ]
then
echo "Wrong md5sum: debian folder for kernel!"
exit 1
fi
tar xvf ${DEBIAN_FILE}
if [ $? -ne 0 ]
then
echo "tar failed: debian folder for kernel!"
exit 1
fi
rm ${DEBIAN_FILE}

View File

@@ -0,0 +1,7 @@
---
debver: 5.10.74
debname: linux
dl_hook: dl_hook
revision:
dist: $STX_DIST
PKG_GITREVCOUNT: true

View File

@@ -0,0 +1,545 @@
From a6d19757680f81e529c789e77bcac01f9b9e2fbd Mon Sep 17 00:00:00 2001
From: Chris Friesen <chris.friesen@windriver.com>
Date: Thu, 17 Jun 2021 07:44:04 +0000
Subject: [PATCH] Notification of death of arbitrary processes
Note: this commit was copied from Titanium Cloud Rel2
This exposes a new feature which may be called to request
notification when an arbitrary process changes state. The
caller specifies a pid, signal number, and event mask, and
when that pid dies, or is stopped, or anything else that
would normally cause a SIGCHLD, the kernel will send the
specified signal to the caller if the event is in the event
mask originally passed down. The siginfo_t struct will
contain the same information as would be included with SIGCHLD.
This is exposed to userspace via the prctl() call with the
PR_DO_NOTIFY_TASK_STATE option.
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Zhang Zhiguo <zhangzhg@neusoft.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
[jm: Adapted the patch for context changes.]
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
include/linux/init_task.h | 9 ++
include/linux/sched.h | 6 +
include/uapi/linux/prctl.h | 16 +++
init/Kconfig | 15 +++
init/init_task.c | 1 +
kernel/Makefile | 1 +
kernel/death_notify.c | 228 +++++++++++++++++++++++++++++++++++++
kernel/death_notify.h | 46 ++++++++
kernel/exit.c | 6 +
kernel/fork.c | 4 +
kernel/signal.c | 11 ++
kernel/sys.c | 8 ++
12 files changed, 351 insertions(+)
create mode 100644 kernel/death_notify.c
create mode 100644 kernel/death_notify.h
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index b2412b4d4c20..7a0828daf59c 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -25,6 +25,15 @@
extern struct files_struct init_files;
extern struct fs_struct init_fs;
extern struct nsproxy init_nsproxy;
+
+#ifdef CONFIG_SIGEXIT
+#define INIT_SIGEXIT(tsk) \
+ .notify = LIST_HEAD_INIT(tsk.notify), \
+ .monitor = LIST_HEAD_INIT(tsk.monitor),
+#else
+#define INIT_SIGEXIT(tsk)
+#endif
+
extern struct group_info init_groups;
extern struct cred init_cred;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 76cd21fa5501..614cd20935a7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1119,6 +1119,12 @@ struct task_struct {
short il_prev;
short pref_node_fork;
#endif
+#ifdef CONFIG_SIGEXIT
+ /* list of processes to notify on death */
+ struct list_head notify;
+ /* list of outstanding monitor requests */
+ struct list_head monitor;
+#endif
#ifdef CONFIG_NUMA_BALANCING
int numa_scan_seq;
unsigned int numa_scan_period;
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 7f0827705c9a..dbd5a8b6e002 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -63,6 +63,22 @@
# define PR_ENDIAN_LITTLE 1 /* True little endian mode */
# define PR_ENDIAN_PPC_LITTLE 2 /* "PowerPC" pseudo little endian */
+#define PR_DO_NOTIFY_TASK_STATE 17 /* Set/get notification for task
+ state changes */
+
+/* This is the data structure for requestion process death
+ * (and other state change) information. Sig of -1 means
+ * query, sig of 0 means deregistration, positive sig means
+ * that you want to set it. sig and events are value-result
+ * and will be updated with the previous values on every
+ * successful call.
+ */
+struct task_state_notify_info {
+ int pid;
+ int sig;
+ unsigned int events;
+};
+
/* Get/set process seccomp mode */
#define PR_GET_SECCOMP 21
#define PR_SET_SECCOMP 22
diff --git a/init/Kconfig b/init/Kconfig
index fc4c9f416fad..df9d5284f8d6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1851,6 +1851,21 @@ config VM_EVENT_COUNTERS
on EXPERT systems. /proc/vmstat will only show page counts
if VM event counters are disabled.
+config SIGEXIT
+ bool "Notification of death of arbitrary processes"
+ default n
+ help
+ When enabled this exposes a new feature which may be called to request
+ notification when an arbitrary process changes state. The caller specifies
+ a pid, signal number, and event mask, and when that pid dies, or is
+ stopped, or anything else that would normally cause a SIGCHLD, the
+ kernel will send the specified signal to the caller if the event is in
+ the event mask originally passed down. The siginfo_t struct will
+ contain the same information as would be included with SIGCHLD.
+
+ This is exposed to userspace via the prctl()
+ call with the PR_DO_NOTIFY_TASK_STATE option
+
config SLUB_DEBUG
default y
bool "Enable SLUB debugging support" if EXPERT
diff --git a/init/init_task.c b/init/init_task.c
index 16d14c2ebb55..eaee56e60985 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -128,6 +128,7 @@ struct task_struct init_task
.alloc_lock = __SPIN_LOCK_UNLOCKED(init_task.alloc_lock),
.journal_info = NULL,
INIT_CPU_TIMERS(init_task)
+ INIT_SIGEXIT(init_task)
.pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
.timer_slack_ns = 50000, /* 50 usec default slack */
.thread_pid = &init_struct_pid,
diff --git a/kernel/Makefile b/kernel/Makefile
index 88b60a6e5dd0..36377f54555a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_SIGEXIT) += death_notify.o
obj-$(CONFIG_PERF_EVENTS) += events/
diff --git a/kernel/death_notify.c b/kernel/death_notify.c
new file mode 100644
index 000000000000..5819d35a2564
--- /dev/null
+++ b/kernel/death_notify.c
@@ -0,0 +1,228 @@
+/*
+ * kernel/death_notify.c, Process death notification support
+ *
+ * Copyright (c) 2006-2014 Wind River Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/signal.h>
+#include <linux/sched.h>
+#include <linux/sched/task.h>
+#include <linux/slab.h>
+#include <linux/prctl.h>
+#include <linux/uaccess.h>
+#include "death_notify.h"
+
+static void unlink_status_notifier(struct signotifier *n)
+{
+ list_del(&n->monitor_list);
+ list_del(&n->notify_list);
+ kfree(n);
+}
+
+static void handle_already_monitoring(struct signotifier *node,
+ struct task_state_notify_info *args,
+ struct task_state_notify_info *oldargs)
+{
+ /* Store the old values */
+ oldargs->sig = node->sig;
+ oldargs->events = node->events;
+
+ /* We know that args->sig is 0 or a valid signal. */
+ if (args->sig > 0) {
+ /* Update the new values */
+ node->sig = args->sig;
+ node->events = args->events;
+ } else if (!args->sig) {
+ /* args->sig of 0 means to deregister */
+ unlink_status_notifier(node);
+ }
+}
+
+static void setup_new_node(struct task_struct *p,
+ struct signotifier *node,
+ struct task_state_notify_info *args)
+{
+ node->notify_tsk = current;
+ node->sig = args->sig;
+ node->events = args->events;
+
+ /* Add this node to the list of notification requests
+ * for the specified process.
+ */
+ list_add_tail(&node->notify_list, &p->notify);
+
+ /* Also add this node to the list of monitor requests
+ * for the current process.
+ */
+ list_add_tail(&node->monitor_list, &current->monitor);
+}
+
+/* Returns 0 if arguments are valid, 1 if they are not. */
+static int invalid_args(struct task_state_notify_info *args)
+{
+ int ret = 1;
+
+ if (args->pid <= 0)
+ goto out;
+
+ /* Sig of -1 implies query, sig of 0 implies deregistration.
+ * Otherwise sig must be positive and within range.
+ */
+ if ((args->sig < -1) || (args->sig > _NSIG))
+ goto out;
+
+ /* If positive sig, must have valid events. */
+ if (args->sig > 0) {
+ if (!args->events || (args->events >= (1 << (NSIGCHLD+1))))
+ goto out;
+ }
+
+ ret = 0;
+out:
+ return ret;
+}
+
+/* Notify those registered for process state updates via do_notify_task_state().
+ * If "del" is nonzero, the process is dying and we want to free
+ * the nodes in the list as we go.
+ *
+ * Note: we only notify processes for events in which they have registered
+ * interest.
+ *
+ * Must be called holding a lock on tasklist_lock.
+ */
+void do_notify_others(struct task_struct *tsk, struct kernel_siginfo *info)
+{
+ struct signotifier *node;
+ unsigned int events;
+
+ /* This method of generating the event bit must be
+ * matched in the userspace library.
+ */
+ events = 1 << (info->si_code & 0xFF);
+
+ list_for_each_entry(node, &tsk->notify, notify_list) {
+ if (events & node->events) {
+ info->si_signo = node->sig;
+ group_send_sig_info(node->sig, info, node->notify_tsk, PIDTYPE_TGID);
+ }
+ }
+}
+
+void release_notify_others(struct task_struct *p)
+{
+ struct signotifier *n, *t;
+
+ /* Need to clean up any outstanding requests where we
+ * wanted to be notified when others died.
+ */
+ list_for_each_entry_safe(n, t, &p->monitor, monitor_list) {
+ unlink_status_notifier(n);
+ }
+
+ /* Also need to clean up any outstanding requests where others
+ * wanted to be notified when we died.
+ */
+ list_for_each_entry_safe(n, t, &p->notify, notify_list) {
+ unlink_status_notifier(n);
+ }
+}
+
+/* If the config is defined, then processes can call this routine
+ * to request notification when the specified task's state changes.
+ * On the death (or other state change) of the specified process,
+ * we will send them the specified signal if the event is listed
+ * in their event bitfield.
+ *
+ * A sig of 0 means that we want to deregister.
+ *
+ * The sig/events fields are value/result. On success we update them
+ * to reflect what they were before the call.
+ *
+ * Returns error code on error, on success we return 0.
+ */
+int do_notify_task_state(unsigned long arg)
+{
+ int err;
+ struct task_struct *p;
+ struct signotifier *node, *tmp;
+ struct task_state_notify_info args, oldargs;
+
+ if (copy_from_user(&args, (struct task_state_notify_info __user *)arg,
+ sizeof(args)))
+ return -EFAULT;
+ oldargs.pid = args.pid;
+
+ /* Validate the arguments passed in. */
+ err = -EINVAL;
+ if (invalid_args(&args))
+ goto out;
+
+ /* We must hold a write lock on tasklist_lock to add the notification
+ * later on, and we need some lock on tasklist_lock for
+ * find_task_by_pid(), so may as well take the write lock now.
+ * Must use write_lock_irq().
+ */
+ write_lock_irq(&tasklist_lock);
+
+ err = -ESRCH;
+ p = find_task_by_vpid(args.pid);
+ if (!p)
+ goto unlock_out;
+
+ /* Now we know pid exists, unlikely to fail. */
+ err = 0;
+
+ /* Check if we're already monitoring the specified pid. If so, update
+ * the monitoring parameters and return the old ones.
+ */
+ list_for_each_entry(tmp, &p->notify, notify_list) {
+ if (tmp->notify_tsk == current) {
+ handle_already_monitoring(tmp, &args, &oldargs);
+ goto unlock_out;
+ }
+ }
+
+ /* If we get here, we're not currently monitoring the process. */
+ oldargs.sig = 0;
+ oldargs.events = 0;
+
+ /* If we wanted to set up a new monitor, do it now. If we didn't
+ * manage to allocate memory for the new node, then we return
+ * an appropriate error.
+ */
+ if (args.sig > 0) {
+ node = kmalloc(sizeof(*node), GFP_ATOMIC);
+ if (node)
+ setup_new_node(p, node, &args);
+ else
+ err = -ENOMEM;
+ }
+
+unlock_out:
+ write_unlock_irq(&tasklist_lock);
+
+ /* Copy the old values back to caller. */
+ if (copy_to_user((struct task_state_notify_info __user *)arg,
+ &oldargs, sizeof(oldargs)))
+ err = -EFAULT;
+
+out:
+ return err;
+}
+
diff --git a/kernel/death_notify.h b/kernel/death_notify.h
new file mode 100644
index 000000000000..14a0995b79af
--- /dev/null
+++ b/kernel/death_notify.h
@@ -0,0 +1,46 @@
+/*
+ * kernel/death_notify.h, Process death notification support
+ *
+ * Copyright (c) 2006-2014 Wind River Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+#ifndef _KERNEL_DEATH_NOTIFY_H
+#define _KERNEL_DEATH_NOTIFY_H
+
+#ifdef CONFIG_SIGEXIT
+
+struct signotifier {
+ struct task_struct *notify_tsk;
+ struct list_head notify_list;
+ struct list_head monitor_list;
+ int sig;
+ unsigned int events;
+};
+
+extern int do_notify_task_state(unsigned long arg);
+extern void do_notify_others(struct task_struct *tsk,
+ struct kernel_siginfo *info);
+extern void release_notify_others(struct task_struct *p);
+
+#else /* !CONFIG_SIGEXIT */
+
+static inline void do_notify_others(struct task_struct *tsk,
+ struct kernel_siginfo *info) {}
+static inline void release_notify_others(struct task_struct *p) {}
+
+#endif /* CONFIG_SIGEXIT */
+#endif
+
diff --git a/kernel/exit.c b/kernel/exit.c
index d13d67fc5f4e..0e7adf824a52 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -68,6 +68,9 @@
#include <linux/uaccess.h>
#include <asm/unistd.h>
#include <asm/mmu_context.h>
+#ifdef CONFIG_SIGEXIT
+#include "death_notify.h"
+#endif
static void __unhash_process(struct task_struct *p, bool group_dead)
{
@@ -194,6 +197,9 @@ void release_task(struct task_struct *p)
cgroup_release(p);
write_lock_irq(&tasklist_lock);
+#ifdef CONFIG_SIGEXIT
+ release_notify_others(p);
+#endif
ptrace_release_task(p);
thread_pid = get_pid(p->thread_pid);
__exit_signal(p);
diff --git a/kernel/fork.c b/kernel/fork.c
index 7c044d377926..5333090a7cc5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2069,6 +2069,10 @@ static __latent_entropy struct task_struct *copy_process(
p->sequential_io = 0;
p->sequential_io_avg = 0;
#endif
+#ifdef CONFIG_SIGEXIT
+ INIT_LIST_HEAD(&p->notify);
+ INIT_LIST_HEAD(&p->monitor);
+#endif
/* Perform scheduler related setup. Assign this task to a CPU. */
retval = sched_fork(clone_flags, p);
diff --git a/kernel/signal.c b/kernel/signal.c
index ef8f2a28d37c..5c41ee2ccf7e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -55,6 +55,9 @@
#include <asm/unistd.h>
#include <asm/siginfo.h>
#include <asm/cacheflush.h>
+#ifdef CONFIG_SIGEXIT
+#include "death_notify.h"
+#endif
/*
* SLAB caches for signal bits.
@@ -1999,6 +2002,10 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);
+#ifdef CONFIG_SIGEXIT
+ do_notify_others(tsk, &info);
+#endif
+
return autoreap;
}
@@ -2071,6 +2078,10 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
*/
__wake_up_parent(tsk, parent);
spin_unlock_irqrestore(&sighand->siglock, flags);
+
+#ifdef CONFIG_SIGEXIT
+ do_notify_others(tsk, &info);
+#endif
}
static inline bool may_ptrace_stop(void)
diff --git a/kernel/sys.c b/kernel/sys.c
index a730c03ee607..0f8decf763f1 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -73,6 +73,9 @@
#include <asm/unistd.h>
#include "uid16.h"
+#ifdef CONFIG_SIGEXIT
+#include "death_notify.h"
+#endif
#ifndef SET_UNALIGN_CTL
# define SET_UNALIGN_CTL(a, b) (-EINVAL)
@@ -2423,6 +2426,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
else
error = PR_MCE_KILL_DEFAULT;
break;
+#ifdef CONFIG_SIGEXIT
+ case PR_DO_NOTIFY_TASK_STATE:
+ error = do_notify_task_state(arg2);
+ break;
+#endif
case PR_SET_MM:
error = prctl_set_mm(arg2, arg3, arg4, arg5);
break;
--
2.29.2

View File

@@ -0,0 +1,33 @@
From 8d508571597d4e2d8c64621015c375ce0a346748 Mon Sep 17 00:00:00 2001
From: Dahir Osman <dahir.osman@windriver.com>
Date: Wed, 13 Jan 2016 10:01:11 -0500
Subject: [PATCH 02/10] PCI: Add ACS quirk for Intel Fortville NICs
Use quirks to determine isolation for now until a later kernel can
properly read the Fortville ACS capabilities.
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Zhang Zhiguo <zhangzhg@neusoft.com>
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
drivers/pci/quirks.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b570f297e3ec..910026923549 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4740,6 +4740,10 @@ static const struct pci_dev_acs_enabled {
{ PCI_VENDOR_ID_INTEL, 0x15b7, pci_quirk_mf_endpoint_acs },
{ PCI_VENDOR_ID_INTEL, 0x15b8, pci_quirk_mf_endpoint_acs },
{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_rciep_acs },
+ /* I40 */
+ { PCI_VENDOR_ID_INTEL, 0x1572, pci_quirk_mf_endpoint_acs },
+ { PCI_VENDOR_ID_INTEL, 0x1586, pci_quirk_mf_endpoint_acs },
+ { PCI_VENDOR_ID_INTEL, 0x1583, pci_quirk_mf_endpoint_acs },
/* QCOM QDF2xxx root ports */
{ PCI_VENDOR_ID_QCOM, 0x0400, pci_quirk_qcom_rp_acs },
{ PCI_VENDOR_ID_QCOM, 0x0401, pci_quirk_qcom_rp_acs },
--
2.29.2

View File

@@ -0,0 +1,170 @@
From a04995a064c17b5e5df6d6f184c1cf01bb8bc6a0 Mon Sep 17 00:00:00 2001
From: Chris Friesen <chris.friesen@windriver.com>
Date: Tue, 24 Nov 2015 16:27:28 -0500
Subject: [PATCH] affine compute kernel threads
This is a kernel enhancement to configure the cpu affinity of kernel
threads via kernel boot option kthread_cpus=<cpulist>. The compute
kickstart file and compute-huge.sh scripts will update grub with the
new option.
With kthread_cpus specified, the cpumask is immediately applied upon
thread launch. This does not affect kernel threads that specify cpu
and node.
Note: this is based off of Christoph Lameter's patch at
https://lwn.net/Articles/565932/ with the only difference being
the kernel parameter changed from kthread to kthread_cpus.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
[VT: The existing "isolcpus"
kernel bootarg, cgroup/cpuset, and taskset might provide the some
way to have cpu isolation. However none of them satisfies the requirements.
Replacing spaces with tabs. Combine two calls of set_cpus_allowed_ptr()
in kernel_init_freeable() in init/main.c into one. Performed tests]
Signed-off-by: Vu Tran <vu.tran@windriver.com>
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Zhang Zhiguo <zhangzhg@neusoft.com>
Signed-off-by: Vefa Bicakci <vefa.bicakci@windriver.com>
[jm: Adapted the patch for context changes.]
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
.../admin-guide/kernel-parameters.txt | 10 ++++++++++
include/linux/cpumask.h | 3 +++
init/main.c | 2 ++
kernel/cpu.c | 19 +++++++++++++++++++
kernel/kthread.c | 5 ++---
kernel/umh.c | 3 +++
6 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 26bfe7ae711b..91bb37814527 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2217,6 +2217,16 @@
See also Documentation/trace/kprobetrace.rst "Kernel
Boot Parameter" section.
+ kthread_cpus= [KNL, SMP] Only run kernel threads on the specified
+ list of processors. The kernel will start threads
+ on the indicated processors only (unless there
+ are specific reasons to run a thread with
+ different affinities). This can be used to make
+ init start on certain processors and also to
+ control where kmod and other user space threads
+ are being spawned. Allows to keep kernel threads
+ away from certain cores unless absoluteluy necessary.
+
kpti= [ARM64] Control page table isolation of user
and kernel address spaces.
Default: enabled on cores which need mitigation.
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f0d895d6ac39..45f338cfdd6f 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -55,6 +55,7 @@ extern unsigned int nr_cpu_ids;
* cpu_present_mask - has bit 'cpu' set iff cpu is populated
* cpu_online_mask - has bit 'cpu' set iff cpu available to scheduler
* cpu_active_mask - has bit 'cpu' set iff cpu available to migration
+ * cpu_kthread_mask - has bit 'cpu' set iff general kernel threads allowed
*
* If !CONFIG_HOTPLUG_CPU, present == possible, and active == online.
*
@@ -91,10 +92,12 @@ extern struct cpumask __cpu_possible_mask;
extern struct cpumask __cpu_online_mask;
extern struct cpumask __cpu_present_mask;
extern struct cpumask __cpu_active_mask;
+extern struct cpumask __cpu_kthread_mask;
#define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
#define cpu_online_mask ((const struct cpumask *)&__cpu_online_mask)
#define cpu_present_mask ((const struct cpumask *)&__cpu_present_mask)
#define cpu_active_mask ((const struct cpumask *)&__cpu_active_mask)
+#define cpu_kthread_mask ((const struct cpumask *)&__cpu_kthread_mask)
extern atomic_t __num_online_cpus;
diff --git a/init/main.c b/init/main.c
index d9d914111251..951410241ef4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1527,6 +1527,8 @@ static noinline void __init kernel_init_freeable(void)
do_basic_setup();
+ set_cpus_allowed_ptr(current, cpu_kthread_mask);
+
kunit_run_all_tests();
console_on_rootfs();
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2b8d7a5db383..b5dbec330189 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2449,6 +2449,25 @@ EXPORT_SYMBOL(__cpu_active_mask);
atomic_t __num_online_cpus __read_mostly;
EXPORT_SYMBOL(__num_online_cpus);
+struct cpumask __cpu_kthread_mask __read_mostly
+ = {CPU_BITS_ALL};
+EXPORT_SYMBOL(__cpu_kthread_mask);
+
+static int __init kthread_setup(char *str)
+{
+ struct cpumask tmp_mask;
+ int err;
+
+ err = cpulist_parse(str, &tmp_mask);
+ if (!err)
+ cpumask_copy(&__cpu_kthread_mask, &tmp_mask);
+ else
+ pr_err("Cannot parse 'kthread_cpus=%s'; error %d\n", str, err);
+
+ return 1;
+}
+__setup("kthread_cpus=", kthread_setup);
+
void init_cpu_present(const struct cpumask *src)
{
cpumask_copy(&__cpu_present_mask, src);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5edf7e19ab26..7c3924752999 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -384,8 +384,7 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
* The kernel thread should not inherit these properties.
*/
sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
- set_cpus_allowed_ptr(task,
- housekeeping_cpumask(HK_FLAG_KTHREAD));
+ set_cpus_allowed_ptr(task, cpu_kthread_mask);
}
kfree(create);
return task;
@@ -634,7 +633,7 @@ int kthreadd(void *unused)
/* Setup a clean context for our children to inherit. */
set_task_comm(tsk, "kthreadd");
ignore_signals(tsk);
- set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_FLAG_KTHREAD));
+ set_cpus_allowed_ptr(tsk, cpu_kthread_mask);
set_mems_allowed(node_states[N_MEMORY]);
current->flags |= PF_NOFREEZE;
diff --git a/kernel/umh.c b/kernel/umh.c
index 3f646613a9d3..e5027cee43f7 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -80,6 +80,9 @@ static int call_usermodehelper_exec_async(void *data)
*/
current->fs->umask = 0022;
+ /* We can run only where init is allowed to run. */
+ set_cpus_allowed_ptr(current, cpu_kthread_mask);
+
/*
* Our parent (unbound workqueue) runs with elevated scheduling
* priority. Avoid propagating that into the userspace child.
--
2.31.1

View File

@@ -0,0 +1,72 @@
From 286f3a8af7f620dac84f20de5f7ad6542f447ace Mon Sep 17 00:00:00 2001
From: Chris Friesen <chris.friesen@windriver.com>
Date: Tue, 24 Nov 2015 16:27:29 -0500
Subject: [PATCH 04/10] Affine irqs and workqueues with kthread_cpus
If the kthread_cpus boot arg is set it means we want to affine
kernel threads to the specified CPU mask as much as possible
in order to avoid doing work on other CPUs.
In this commit we extend the meaning of that boot arg to also
apply to the CPU affinity of unbound and ordered workqueues.
We also use the kthread_cpus value to determine the default irq
affinity. Specifically, as long as the previously-calculated
irq affinity intersects with the kthread_cpus affinity then we'll
use the intersection of the two as the default irq affinity.
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
[VT: replacing spaces with tabs. Performed tests]
Signed-off-by: Vu Tran <vu.tran@windriver.com>
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Zhang Zhiguo <zhangzhg@neusoft.com>
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
kernel/irq/manage.c | 7 +++++++
kernel/workqueue.c | 4 ++++
2 files changed, 11 insertions(+)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 79dc02b956dc..420b5ce0bf89 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -515,6 +515,13 @@ int irq_setup_affinity(struct irq_desc *desc)
if (cpumask_intersects(&mask, nodemask))
cpumask_and(&mask, &mask, nodemask);
}
+
+ /* This will narrow down the affinity further if we've specified
+ * a reduced cpu_kthread_mask in the boot args.
+ */
+ if (cpumask_intersects(&mask, cpu_kthread_mask))
+ cpumask_and(&mask, &mask, cpu_kthread_mask);
+
ret = irq_do_set_affinity(&desc->irq_data, &mask, false);
raw_spin_unlock(&mask_lock);
return ret;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1e2ca744dadb..b854874d0518 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5956,6 +5956,8 @@ void __init workqueue_init_early(void)
BUG_ON(!(attrs = alloc_workqueue_attrs()));
attrs->nice = std_nice[i];
+ /* If we've specified a kthread mask apply it here too. */
+ cpumask_copy(attrs->cpumask, cpu_kthread_mask);
unbound_std_wq_attrs[i] = attrs;
/*
@@ -5966,6 +5968,8 @@ void __init workqueue_init_early(void)
BUG_ON(!(attrs = alloc_workqueue_attrs()));
attrs->nice = std_nice[i];
attrs->no_numa = true;
+ /* If we've specified a kthread mask apply it here too. */
+ cpumask_copy(attrs->cpumask, cpu_kthread_mask);
ordered_wq_attrs[i] = attrs;
}
--
2.29.2

View File

@@ -0,0 +1,36 @@
From 82e56b6b7c05eebc589b37e96ed3b0a44d6cdef7 Mon Sep 17 00:00:00 2001
From: Chris Friesen <chris.friesen@windriver.com>
Date: Thu, 12 May 2016 18:00:00 -0400
Subject: [PATCH 05/10] Make kernel start eth devices at offset
In order to avoid naming collisions, we want to make the kernel
start naming its "ethX" devices at eth1000 instead of eth0. This
will let us rename to a range starting at eth0.
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Zhang Zhiguo <zhangzhg@neusoft.com>
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
net/core/dev.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 62ff7121b22d..e63fe7662c73 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1218,6 +1218,12 @@ static int __dev_alloc_name(struct net *net, const char *name, char *buf)
set_bit(i, inuse);
}
+ /* STX extension, want kernel to start at eth1000 */
+ if (strcmp(name, "eth%d") == 0) {
+ for (i=0; i < 1000; i++)
+ set_bit(i, inuse);
+ }
+
i = find_first_zero_bit(inuse, max_netdevices);
free_page((unsigned long) inuse);
}
--
2.29.2

View File

@@ -0,0 +1,125 @@
From c35ee3034b61e72bf9fef888a3c4702049a77bcc Mon Sep 17 00:00:00 2001
From: Matt Peters <matt.peters@windriver.com>
Date: Mon, 30 May 2016 10:51:02 -0400
Subject: [PATCH] intel-iommu: allow ignoring Ethernet device RMRR with IOMMU
passthrough
Some BIOS's are reporting DMAR RMRR entries for Ethernet devices
which is causing problems when PCI passthrough is enabled. These
devices should be able to use the static identity map since the
host should not be enforcing specific address ranges when IOMMU
passthrough is enabled.
Originally-by: Matt Peters <matt.peters@windriver.com>
[PG: Added bootarg wrapper and documentation entries.]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Nam Ninh <nam.ninh@windriver.com>
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Zhang Zhiguo <zhangzhg@neusoft.com>
Signed-off-by: Dongqi Chen <chen.dq@neusoft.com>
[lz: Adapted the patch for context changes.]
Signed-off-by: Li Zhou <li.zhou@windriver.com>
[jp: fix warning: this 'else' clause does not guard]
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
.../admin-guide/kernel-parameters.txt | 5 +++++
Documentation/x86/intel-iommu.rst | 18 +++++++++++++++
drivers/iommu/intel/iommu.c | 22 ++++++++++++++++++-
3 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 552f1db5b9d7..e862aabe6255 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1861,6 +1861,11 @@
than 32-bit addressing. The default is to look
for translation below 32-bit and if not available
then look in the higher range.
+ eth_no_rmrr [Default Off]
+ With this option provided, the kernel will ignore
+ any specified RMRR regions specified by the BIOS
+ for PCI ethernet devices. Confirm with your hardware
+ vendor the RMRR regions are indeed invalid first.
strict [Default Off]
With this option on every unmap_single operation will
result in a hardware IOTLB flush operation as opposed
diff --git a/Documentation/x86/intel-iommu.rst b/Documentation/x86/intel-iommu.rst
index 099f13d51d5f..18e6a8d8b1ee 100644
--- a/Documentation/x86/intel-iommu.rst
+++ b/Documentation/x86/intel-iommu.rst
@@ -33,6 +33,24 @@ regions will fail. Hence BIOS uses RMRR to specify these regions along with
devices that need to access these regions. OS is expected to setup
unity mappings for these regions for these devices to access these regions.
+RMRR for other devices?
+-----------------------
+
+There are reports of BIOS out there that indicate RMRR regions for things
+like ethernet devices. As per mainline commit c875d2c1b8083 ("iommu/vt-d:
+Exclude devices using RMRRs from IOMMU API domains") such a device is
+"fundamentally incompatible" with the IOMMU API and "we must prevent such
+devices from being used by the IOMMU API." However, in the event that
+the RMRR indicated by the BIOS is assumed to be just a reporting error,
+there is an additional iommu boot arg that can be used to ignore RMRR
+settings for ethernet, i.e. "intel_iommu=on,eth_no_rmrr iommu=pt".
+Note that iommu=pt is required in order to eth_no_rmrr to have effect.
+
+If you use this setting, you should consult with your hardware vendor to
+confirm that it is just a reporting error, and that it truly is not
+actively using any DMA to/from RMRR, as otherwise system instability
+may result.
+
How is IOVA generated?
----------------------
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7e3db4c0324d..16ebba2723cb 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -354,6 +354,7 @@ static int dmar_map_gfx = 1;
static int dmar_forcedac;
static int intel_iommu_strict;
static int intel_iommu_superpage = 1;
+static int intel_iommu_ethrmrr = 1;
static int iommu_identity_mapping;
static int intel_no_bounce;
static int iommu_skip_te_disable;
@@ -448,6 +449,15 @@ static int __init intel_iommu_setup(char *str)
} else if (!strncmp(str, "forcedac", 8)) {
pr_info("Forcing DAC for PCI devices\n");
dmar_forcedac = 1;
+ } else if (!strncmp(str, "eth_no_rmrr", 11)) {
+ if (!iommu_default_passthrough()) {
+ printk(KERN_WARNING
+ "Intel-IOMMU: error - eth_no_rmrr requires iommu=pt\n");
+ } else {
+ printk(KERN_INFO
+ "Intel-IOMMU: ignoring ethernet RMRR values\n");
+ intel_iommu_ethrmrr = 0;
+ }
} else if (!strncmp(str, "strict", 6)) {
pr_info("Disable batched IOTLB flush\n");
intel_iommu_strict = 1;
@@ -2880,8 +2890,18 @@ static bool device_rmrr_is_relaxable(struct device *dev)
pdev = to_pci_dev(dev);
if (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
return true;
- else
+ else {
+ /* As a temporary workaround for issues seen on ProLiant DL380p,
+ * allow the operator to ignore the RMRR settings for ethernet
+ * devices. Ideally the end user should contact their vendor
+ * regarding why there are RMRR, as per mainline c875d2c1b8083
+ * ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains")
+ * it seems that these make no sense at all.
+ */
+ if ((pdev->class >> 8) == PCI_CLASS_NETWORK_ETHERNET && !intel_iommu_ethrmrr)
+ return true;
return false;
+ }
}
/*
--
2.29.2

View File

@@ -0,0 +1,27 @@
From 87e7e3bd136b05d9ae105fe87879d1fe6730ee18 Mon Sep 17 00:00:00 2001
From: Jim Somerville <Jim.Somerville@windriver.com>
Date: Tue, 6 Mar 2018 12:54:40 -0500
Subject: [PATCH 06/10] turn off write same in smartqpi driver
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Zhang Zhiguo <zhangzhg@neusoft.com>
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
drivers/scsi/smartpqi/smartpqi_init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 9d0229656681..af46f3a024da 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -6571,6 +6571,7 @@ static struct scsi_host_template pqi_driver_template = {
.map_queues = pqi_map_queues,
.sdev_attrs = pqi_sdev_attrs,
.shost_attrs = pqi_shost_attrs,
+ .no_write_same = 1,
};
static int pqi_register_scsi(struct pqi_ctrl_info *ctrl_info)
--
2.29.2

View File

@@ -0,0 +1,79 @@
From 4d58a9a618827e4410e3828b4deef5832a8576db Mon Sep 17 00:00:00 2001
From: Jim Somerville <Jim.Somerville@windriver.com>
Date: Wed, 29 Jan 2020 14:19:22 -0500
Subject: [PATCH 07/10] Allow dmar quirks for broken bioses
Problem:
Broken bios creates inaccurate DMAR tables,
reporting some bridges as having endpoint types.
This causes IOMMU initialization to bail
out early with an error code, the result of
which is vfio not working correctly.
This is seen on some Skylake based Wolfpass
server platforms with up-to-date bios installed.
Solution:
Instead of just bailing out of IOMMU
initialization when such a condition is found,
we report it and continue. The IOMMU ends
up successfully initialized anyway. We do this
only on platforms that have the Skylake bridges
where this issue has been seen.
This change is inspired by a similar one posted by
Lu Baolu of Intel Corp to lkml
https://lkml.org/lkml/2019/12/24/15
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
drivers/iommu/intel/dmar.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 02e7c10a4224..7d423ac36a44 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -66,6 +66,26 @@ static void free_iommu(struct intel_iommu *iommu);
extern const struct iommu_ops intel_iommu_ops;
+static int scope_mismatch_quirk;
+static void quirk_dmar_scope_mismatch(struct pci_dev *dev)
+{
+ pci_info(dev, "scope mismatch ignored\n");
+ scope_mismatch_quirk = 1;
+}
+
+/*
+ * We expect devices with endpoint scope to have normal PCI
+ * headers, and devices with bridge scope to have bridge PCI
+ * headers. However some PCI devices may be listed in the
+ * DMAR table with bridge scope, even though they have a
+ * normal PCI header and vice versa. We don't declare a
+ * scope mismatch for the special cases below, even though
+ * the bios creates broken tables.
+ */
+/* Sky Lake-E PCI Express Root Port A */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2030,
+ quirk_dmar_scope_mismatch);
+
static void dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
{
/*
@@ -255,7 +275,10 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
info->dev->class >> 16 != PCI_BASE_CLASS_BRIDGE))) {
pr_warn("Device scope type does not match for %s\n",
pci_name(info->dev));
- return -EINVAL;
+ if (!scope_mismatch_quirk)
+ return -EINVAL;
+ else
+ pr_warn("but continuing anyway\n");
}
for_each_dev_scope(devices, devices_cnt, i, tmp)
--
2.29.2

View File

@@ -0,0 +1,100 @@
From 451738188482a34603a184e868893f407998bdd4 Mon Sep 17 00:00:00 2001
From: Nayna Jain <nayna@linux.vnet.ibm.com>
Date: Fri, 10 Nov 2017 17:16:35 -0500
Subject: [PATCH 08/10] tpm: ignore burstcount to improve tpm_tis send()
performance
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The TPM burstcount status indicates the number of bytes that can
be sent to the TPM without causing bus wait states.  Effectively,
it is the number of empty bytes in the command FIFO.
This patch optimizes the tpm_tis_send_data() function by checking
the burstcount only once. And if the burstcount is valid, it writes
all the bytes at once, permitting wait state.
After this change, performance on a TPM 1.2 with an 8 byte
burstcount for 1000 extends improved from ~41sec to ~14sec.
Suggested-by: Ken Goldman <kgold@linux.vnet.ibm.com> in
conjunction with the TPM Device Driver work group.
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Dongqi Chen <chen.dq@neusoft.com>
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
drivers/char/tpm/tpm_tis_core.c | 43 ++++++++++++---------------------
1 file changed, 15 insertions(+), 28 deletions(-)
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index a2e0395cbe61..2c69fde1e4e5 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -330,7 +330,6 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
{
struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
int rc, status, burstcnt;
- size_t count = 0;
bool itpm = priv->flags & TPM_TIS_ITPM_WORKAROUND;
status = tpm_tis_status(chip);
@@ -343,36 +342,24 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
goto out_err;
}
}
-
- while (count < len - 1) {
- burstcnt = get_burstcount(chip);
- if (burstcnt < 0) {
- dev_err(&chip->dev, "Unable to read burstcount\n");
- rc = burstcnt;
- goto out_err;
- }
- burstcnt = min_t(int, burstcnt, len - count - 1);
- rc = tpm_tis_write_bytes(priv, TPM_DATA_FIFO(priv->locality),
- burstcnt, buf + count);
- if (rc < 0)
- goto out_err;
-
- count += burstcnt;
-
- if (wait_for_tpm_stat(chip, TPM_STS_VALID, chip->timeout_c,
- &priv->int_queue, false) < 0) {
- rc = -ETIME;
- goto out_err;
- }
- status = tpm_tis_status(chip);
- if (!itpm && (status & TPM_STS_DATA_EXPECT) == 0) {
- rc = -EIO;
- goto out_err;
- }
+ /*
+ * Get the initial burstcount to ensure TPM is ready to
+ * accept data, even when waiting for burstcount is disabled.
+ */
+ burstcnt = get_burstcount(chip);
+ if (burstcnt < 0) {
+ dev_err(&chip->dev, "Unable to read burstcount\n");
+ rc = burstcnt;
+ goto out_err;
}
+ rc = tpm_tis_write_bytes(priv, TPM_DATA_FIFO(priv->locality),
+ len -1, buf);
+ if (rc < 0)
+ goto out_err;
+
/* write last byte */
- rc = tpm_tis_write8(priv, TPM_DATA_FIFO(priv->locality), buf[count]);
+ rc = tpm_tis_write8(priv, TPM_DATA_FIFO(priv->locality), buf[len-1]);
if (rc < 0)
goto out_err;
--
2.29.2

View File

@@ -0,0 +1,395 @@
From b377b8aed56d93aef67f7dbad01d026bacffedb1 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 14 Sep 2021 01:07:57 +0200
Subject: [PATCH] bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode
Fix cgroup v1 interference when non-root cgroup v2 BPF programs are used.
Back in the days, commit bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
embedded per-socket cgroup information into sock->sk_cgrp_data and in order
to save 8 bytes in struct sock made both mutually exclusive, that is, when
cgroup v1 socket tagging (e.g. net_cls/net_prio) is used, then cgroup v2
falls back to the root cgroup in sock_cgroup_ptr() (&cgrp_dfl_root.cgrp).
The assumption made was "there is no reason to mix the two and this is in line
with how legacy and v2 compatibility is handled" as stated in bd1060a1d671.
However, with Kubernetes more widely supporting cgroups v2 as well nowadays,
this assumption no longer holds, and the possibility of the v1/v2 mixed mode
with the v2 root fallback being hit becomes a real security issue.
Many of the cgroup v2 BPF programs are also used for policy enforcement, just
to pick _one_ example, that is, to programmatically deny socket related system
calls like connect(2) or bind(2). A v2 root fallback would implicitly cause
a policy bypass for the affected Pods.
In production environments, we have recently seen this case due to various
circumstances: i) a different 3rd party agent and/or ii) a container runtime
such as [0] in the user's environment configuring legacy cgroup v1 net_cls
tags, which triggered implicitly mentioned root fallback. Another case is
Kubernetes projects like kind [1] which create Kubernetes nodes in a container
and also add cgroup namespaces to the mix, meaning programs which are attached
to the cgroup v2 root of the cgroup namespace get attached to a non-root
cgroup v2 path from init namespace point of view. And the latter's root is
out of reach for agents on a kind Kubernetes node to configure. Meaning, any
entity on the node setting cgroup v1 net_cls tag will trigger the bypass
despite cgroup v2 BPF programs attached to the namespace root.
Generally, this mutual exclusiveness does not hold anymore in today's user
environments and makes cgroup v2 usage from BPF side fragile and unreliable.
This fix adds proper struct cgroup pointer for the cgroup v2 case to struct
sock_cgroup_data in order to address these issues; this implicitly also fixes
the tradeoffs being made back then with regards to races and refcount leaks
as stated in bd1060a1d671, and removes the fallback, so that cgroup v2 BPF
programs always operate as expected.
[0] https://github.com/nestybox/sysbox/
[1] https://kind.sigs.k8s.io/
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-1-daniel@iogearbox.net
(cherry picked from commit 8520e224f547cd070c7c8f97b1fc6d58cff7ccaa)
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
---
include/linux/cgroup-defs.h | 107 +++++++++--------------------------
include/linux/cgroup.h | 22 +------
kernel/cgroup/cgroup.c | 50 ++++------------
net/core/netclassid_cgroup.c | 7 +--
net/core/netprio_cgroup.c | 10 +---
5 files changed, 41 insertions(+), 155 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index fee0b5547cd0..3e406626168a 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -763,107 +763,54 @@ static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) {}
* sock_cgroup_data is embedded at sock->sk_cgrp_data and contains
* per-socket cgroup information except for memcg association.
*
- * On legacy hierarchies, net_prio and net_cls controllers directly set
- * attributes on each sock which can then be tested by the network layer.
- * On the default hierarchy, each sock is associated with the cgroup it was
- * created in and the networking layer can match the cgroup directly.
- *
- * To avoid carrying all three cgroup related fields separately in sock,
- * sock_cgroup_data overloads (prioidx, classid) and the cgroup pointer.
- * On boot, sock_cgroup_data records the cgroup that the sock was created
- * in so that cgroup2 matches can be made; however, once either net_prio or
- * net_cls starts being used, the area is overriden to carry prioidx and/or
- * classid. The two modes are distinguished by whether the lowest bit is
- * set. Clear bit indicates cgroup pointer while set bit prioidx and
- * classid.
- *
- * While userland may start using net_prio or net_cls at any time, once
- * either is used, cgroup2 matching no longer works. There is no reason to
- * mix the two and this is in line with how legacy and v2 compatibility is
- * handled. On mode switch, cgroup references which are already being
- * pointed to by socks may be leaked. While this can be remedied by adding
- * synchronization around sock_cgroup_data, given that the number of leaked
- * cgroups is bound and highly unlikely to be high, this seems to be the
- * better trade-off.
+ * On legacy hierarchies, net_prio and net_cls controllers directly
+ * set attributes on each sock which can then be tested by the network
+ * layer. On the default hierarchy, each sock is associated with the
+ * cgroup it was created in and the networking layer can match the
+ * cgroup directly.
*/
struct sock_cgroup_data {
- union {
-#ifdef __LITTLE_ENDIAN
- struct {
- u8 is_data : 1;
- u8 no_refcnt : 1;
- u8 unused : 6;
- u8 padding;
- u16 prioidx;
- u32 classid;
- } __packed;
-#else
- struct {
- u32 classid;
- u16 prioidx;
- u8 padding;
- u8 unused : 6;
- u8 no_refcnt : 1;
- u8 is_data : 1;
- } __packed;
+ struct cgroup *cgroup; /* v2 */
+#ifdef CONFIG_CGROUP_NET_CLASSID
+ u32 classid; /* v1 */
+#endif
+#ifdef CONFIG_CGROUP_NET_PRIO
+ u16 prioidx; /* v1 */
#endif
- u64 val;
- };
};
-/*
- * There's a theoretical window where the following accessors race with
- * updaters and return part of the previous pointer as the prioidx or
- * classid. Such races are short-lived and the result isn't critical.
- */
static inline u16 sock_cgroup_prioidx(const struct sock_cgroup_data *skcd)
{
- /* fallback to 1 which is always the ID of the root cgroup */
- return (skcd->is_data & 1) ? skcd->prioidx : 1;
+#ifdef CONFIG_CGROUP_NET_PRIO
+ return READ_ONCE(skcd->prioidx);
+#else
+ return 1;
+#endif
}
static inline u32 sock_cgroup_classid(const struct sock_cgroup_data *skcd)
{
- /* fallback to 0 which is the unconfigured default classid */
- return (skcd->is_data & 1) ? skcd->classid : 0;
+#ifdef CONFIG_CGROUP_NET_CLASSID
+ return READ_ONCE(skcd->classid);
+#else
+ return 0;
+#endif
}
-/*
- * If invoked concurrently, the updaters may clobber each other. The
- * caller is responsible for synchronization.
- */
static inline void sock_cgroup_set_prioidx(struct sock_cgroup_data *skcd,
u16 prioidx)
{
- struct sock_cgroup_data skcd_buf = {{ .val = READ_ONCE(skcd->val) }};
-
- if (sock_cgroup_prioidx(&skcd_buf) == prioidx)
- return;
-
- if (!(skcd_buf.is_data & 1)) {
- skcd_buf.val = 0;
- skcd_buf.is_data = 1;
- }
-
- skcd_buf.prioidx = prioidx;
- WRITE_ONCE(skcd->val, skcd_buf.val); /* see sock_cgroup_ptr() */
+#ifdef CONFIG_CGROUP_NET_PRIO
+ WRITE_ONCE(skcd->prioidx, prioidx);
+#endif
}
static inline void sock_cgroup_set_classid(struct sock_cgroup_data *skcd,
u32 classid)
{
- struct sock_cgroup_data skcd_buf = {{ .val = READ_ONCE(skcd->val) }};
-
- if (sock_cgroup_classid(&skcd_buf) == classid)
- return;
-
- if (!(skcd_buf.is_data & 1)) {
- skcd_buf.val = 0;
- skcd_buf.is_data = 1;
- }
-
- skcd_buf.classid = classid;
- WRITE_ONCE(skcd->val, skcd_buf.val); /* see sock_cgroup_ptr() */
+#ifdef CONFIG_CGROUP_NET_CLASSID
+ WRITE_ONCE(skcd->classid, classid);
+#endif
}
#else /* CONFIG_SOCK_CGROUP_DATA */
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 618838c48313..9c88b7da3245 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -816,33 +816,13 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
*/
#ifdef CONFIG_SOCK_CGROUP_DATA
-#if defined(CONFIG_CGROUP_NET_PRIO) || defined(CONFIG_CGROUP_NET_CLASSID)
-extern spinlock_t cgroup_sk_update_lock;
-#endif
-
-void cgroup_sk_alloc_disable(void);
void cgroup_sk_alloc(struct sock_cgroup_data *skcd);
void cgroup_sk_clone(struct sock_cgroup_data *skcd);
void cgroup_sk_free(struct sock_cgroup_data *skcd);
static inline struct cgroup *sock_cgroup_ptr(struct sock_cgroup_data *skcd)
{
-#if defined(CONFIG_CGROUP_NET_PRIO) || defined(CONFIG_CGROUP_NET_CLASSID)
- unsigned long v;
-
- /*
- * @skcd->val is 64bit but the following is safe on 32bit too as we
- * just need the lower ulong to be written and read atomically.
- */
- v = READ_ONCE(skcd->val);
-
- if (v & 3)
- return &cgrp_dfl_root.cgrp;
-
- return (struct cgroup *)(unsigned long)v ?: &cgrp_dfl_root.cgrp;
-#else
- return (struct cgroup *)(unsigned long)skcd->val;
-#endif
+ return skcd->cgroup;
}
#else /* CONFIG_CGROUP_DATA */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c8b811e039cc..e065307f2346 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6419,74 +6419,44 @@ int cgroup_parse_float(const char *input, unsigned dec_shift, s64 *v)
*/
#ifdef CONFIG_SOCK_CGROUP_DATA
-#if defined(CONFIG_CGROUP_NET_PRIO) || defined(CONFIG_CGROUP_NET_CLASSID)
-
-DEFINE_SPINLOCK(cgroup_sk_update_lock);
-static bool cgroup_sk_alloc_disabled __read_mostly;
-
-void cgroup_sk_alloc_disable(void)
-{
- if (cgroup_sk_alloc_disabled)
- return;
- pr_info("cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation\n");
- cgroup_sk_alloc_disabled = true;
-}
-
-#else
-
-#define cgroup_sk_alloc_disabled false
-
-#endif
-
void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
{
- if (cgroup_sk_alloc_disabled) {
- skcd->no_refcnt = 1;
- return;
- }
-
/* Don't associate the sock with unrelated interrupted task's cgroup. */
if (in_interrupt())
return;
rcu_read_lock();
-
while (true) {
struct css_set *cset;
cset = task_css_set(current);
if (likely(cgroup_tryget(cset->dfl_cgrp))) {
- skcd->val = (unsigned long)cset->dfl_cgrp;
+ skcd->cgroup = cset->dfl_cgrp;
cgroup_bpf_get(cset->dfl_cgrp);
break;
}
cpu_relax();
}
-
rcu_read_unlock();
}
void cgroup_sk_clone(struct sock_cgroup_data *skcd)
{
- if (skcd->val) {
- if (skcd->no_refcnt)
- return;
- /*
- * We might be cloning a socket which is left in an empty
- * cgroup and the cgroup might have already been rmdir'd.
- * Don't use cgroup_get_live().
- */
- cgroup_get(sock_cgroup_ptr(skcd));
- cgroup_bpf_get(sock_cgroup_ptr(skcd));
- }
+ struct cgroup *cgrp = sock_cgroup_ptr(skcd);
+
+ /*
+ * We might be cloning a socket which is left in an empty
+ * cgroup and the cgroup might have already been rmdir'd.
+ * Don't use cgroup_get_live().
+ */
+ cgroup_get(cgrp);
+ cgroup_bpf_get(cgrp);
}
void cgroup_sk_free(struct sock_cgroup_data *skcd)
{
struct cgroup *cgrp = sock_cgroup_ptr(skcd);
- if (skcd->no_refcnt)
- return;
cgroup_bpf_put(cgrp);
cgroup_put(cgrp);
}
diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 41b24cd31562..b6de5ee22391 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -72,11 +72,8 @@ static int update_classid_sock(const void *v, struct file *file, unsigned n)
struct update_classid_context *ctx = (void *)v;
struct socket *sock = sock_from_file(file, &err);
- if (sock) {
- spin_lock(&cgroup_sk_update_lock);
+ if (sock)
sock_cgroup_set_classid(&sock->sk->sk_cgrp_data, ctx->classid);
- spin_unlock(&cgroup_sk_update_lock);
- }
if (--ctx->batch == 0) {
ctx->batch = UPDATE_CLASSID_BATCH;
return n + 1;
@@ -122,8 +119,6 @@ static int write_classid(struct cgroup_subsys_state *css, struct cftype *cft,
struct css_task_iter it;
struct task_struct *p;
- cgroup_sk_alloc_disable();
-
cs->classid = (u32)value;
css_task_iter_start(css, 0, &it);
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 9bd4cab7d510..d4c71e382a13 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -207,8 +207,6 @@ static ssize_t write_priomap(struct kernfs_open_file *of,
if (!dev)
return -ENODEV;
- cgroup_sk_alloc_disable();
-
rtnl_lock();
ret = netprio_set_prio(of_css(of), dev, prio);
@@ -222,12 +220,10 @@ static int update_netprio(const void *v, struct file *file, unsigned n)
{
int err;
struct socket *sock = sock_from_file(file, &err);
- if (sock) {
- spin_lock(&cgroup_sk_update_lock);
+
+ if (sock)
sock_cgroup_set_prioidx(&sock->sk->sk_cgrp_data,
(unsigned long)v);
- spin_unlock(&cgroup_sk_update_lock);
- }
return 0;
}
@@ -236,8 +232,6 @@ static void net_prio_attach(struct cgroup_taskset *tset)
struct task_struct *p;
struct cgroup_subsys_state *css;
- cgroup_sk_alloc_disable();
-
cgroup_taskset_for_each(p, css, tset) {
void *v = (void *)(unsigned long)css->id;
--
2.29.2

View File

@@ -0,0 +1,41 @@
From 2a32d5bc7e385fbf40f22cc413354e17a24d4de9 Mon Sep 17 00:00:00 2001
From: Jiping Ma <jiping.ma2@windriver.com>
Date: Sun, 10 Oct 2021 18:56:26 -0700
Subject: [PATCH] scsi: smartpqi: Enable sas_address sysfs for SATA device
type.
We met the issue DM complains that it can't find the disk specified
in the deployment config file after we updated the Linux kernel to 5.10.
The error is "failed to find disk for path /dev/disk/by-path/
pci-0000:3b:00.0-sas-0x31402ec001d92983-lun-0"
This happens because device type SATA is excluded from being
processed with the function pqi_is_device_with_sas_address.
which causes all SATA type disk drives to appear the same, having
zeroes in the lun name. /dev/disk/by-path/
pci-0000:3b:00.0-sas-0x0000000000000000-lun-0
We can add type SA_DEVICE_TYPE_SATA to class device_with_sas_address,
since it will also get the sas_address from wwid. and works transparently
with the old kernel without gaps.
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
drivers/scsi/smartpqi/smartpqi_init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index ecb2af3f43ca..df16e0a27a41 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -2101,6 +2101,7 @@ static inline void pqi_mask_device(u8 *scsi3addr)
static inline bool pqi_is_device_with_sas_address(struct pqi_scsi_dev *device)
{
switch (device->device_type) {
+ case SA_DEVICE_TYPE_SATA:
case SA_DEVICE_TYPE_SAS:
case SA_DEVICE_TYPE_EXPANDER_SMP:
case SA_DEVICE_TYPE_SES:
--
2.31.1

View File

@@ -0,0 +1,11 @@
0001-Notification-of-death-of-arbitrary-processes.patch
0002-PCI-Add-ACS-quirk-for-Intel-Fortville-NICs.patch
0003-affine-compute-kernel-threads.patch
0004-Affine-irqs-and-workqueues-with-kthread_cpus.patch
0005-Make-kernel-start-eth-devices-at-offset.patch
0006-intel-iommu-allow-ignoring-Ethernet-device-RMRR-with.patch
0007-turn-off-write-same-in-smartqpi-driver.patch
0008-Allow-dmar-quirks-for-broken-bioses.patch
0009-tpm-ignore-burstcount-to-improve-tpm_tis-send-perfor.patch
0010-bpf-cgroups-Fix-cgroup-v2-fallback-on-v1-v2-mixed-mo.patch
0011-scsi-smartpqi-Enable-sas_address-sysfs-for-SATA-dev.patch