[PW_SID:977217] [RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type #590

Closed
linux-riscv-bot wants to merge 2 commits into workflow__riscv__fixes from pw977217

@linux-riscv-bot

PR for series 977217 applied to workflow__riscv__fixes

Name: [RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type
URL: https://patchwork.kernel.org/project/linux-riscv/list/?series=977217
Version: 1

Linux RISC-V bot and others added 2 commits June 27, 2025 17:47
…n type

On an x86 machine, when cpu 0 is isolated with "isolcpus=", on initiating
suspend to memory, a warning is triggered, followed by a kernel crash. This is
on a defconfig + CONFIG_ENERGY_MODEL kernel:

$ cat /proc/version
Linux version 6.16.0-rc4 (shashank@machine) (gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #56 SMP PREEMPT_DYNAMIC Mon Jun 30 16:27:42 JST 2025
$ cat /proc/cmdline
kernel-6.16-rc4 console=tty0 initrd=ramdisk.cpio.lz4 console=ttyS4,115200n8 no_console_suspend ignore_loglevel isolcpus=0
$ echo mem > /sys/power/state
[  124.899083] PM: suspend entry (deep)
    <snip>
[  125.169816] smpboot: CPU 2 is now offline
[  125.174167] ------------[ cut here ]------------
[  125.178838] WARNING: CPU: 1 PID: 20 at kernel/sched/topology.c:2459 build_sched_domains+0x1246/0x1550
[  125.188117] Modules linked in:
[  125.191232] CPU: 1 UID: 0 PID: 20 Comm: cpuhp/1 Tainted: G S                  6.16.0-rc4 #56 PREEMPT(voluntary)
[  125.201453] Tainted: [S]=CPU_OUT_OF_SPEC
    <snip>
[  125.303753] Call Trace:
[  125.306248]  <TASK>
[  125.308394]  ? cpu_attach_domain+0x3f1/0x730
[  125.312710]  ? __kmalloc_cache_noprof+0x26a/0x300
[  125.317465]  partition_sched_domains+0x294/0x7f0
[  125.322136]  cpuset_reset_sched_domains+0x1e/0x30
[  125.326893]  sched_cpu_deactivate+0x11d/0x160
[  125.331298]  ? __pfx_sched_cpu_deactivate+0x10/0x10
[  125.336225]  cpuhp_invoke_callback+0x107/0x470
[  125.340714]  ? __pfx_smpboot_thread_fn+0x10/0x10
[  125.345385]  cpuhp_thread_fun+0xdb/0x160
[  125.349352]  smpboot_thread_fn+0xeb/0x220
[  125.353411]  kthread+0xf3/0x1f0
[  125.356600]  ? __pfx_kthread+0x10/0x10
[  125.360402]  ? __pfx_kthread+0x10/0x10
[  125.364204]  ret_from_fork+0x7d/0xd0
[  125.367832]  ? __pfx_kthread+0x10/0x10
[  125.371634]  ret_from_fork_asm+0x1a/0x30
[  125.375614]  </TASK>
[  125.377848] ---[ end trace 0000000000000000 ]---
[  125.382511] BUG: unable to handle page fault for address: 0000000087520483
[  125.389436] #PF: supervisor read access in kernel mode
[  125.394613] #PF: error_code(0x0000) - not-present page
[  125.399800] PGD 0 P4D 0
[  125.402374] Oops: Oops: 0000 [#1] SMP NOPTI
[  125.406601] CPU: 1 UID: 0 PID: 20 Comm: cpuhp/1 Tainted: G S      W           6.16.0-rc4 #56 PREEMPT(voluntary)
[  125.416819] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
    <snip>
[  125.430828] RIP: 0010:partition_sched_domains+0x36d/0x7f0
[  125.436265] Code: 02 00 48 8b 4d 00 41 bc 01 00 00 00 4c 89 c0 74 a0 b8 40 00 00 00 48 85 c9 74 05 f3 48 0f bc c1 48 98 48 8b 04 c5 e0 bb 85 86 <4e> 8b bc 30 c0 0a 00 00 8b 05 f5 8f 75 01 85 c0 0f 84 090
[  125.455082] RSP: 0018:ffffb185001dfd90 EFLAGS: 00010246
[  125.460352] RAX: 0000000100000003 RBX: ffff98ac9cae6cd0 RCX: 0000000000000000
[  125.467529] RDX: 0000000000000000 RSI: ffff98ac80bf1ed8 RDI: 0000000000000040
[  125.474715] RBP: ffff98ac9cae6cc8 R08: ffff98ac80bf1ed0 R09: fffffffffffffffe
[  125.481894] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[  125.489070] R13: 0000000000000001 R14: ffffffff8751f9c0 R15: 0000000000000004
[  125.496248] FS:  0000000000000000(0000) GS:ffff98b068749000(0000) knlGS:0000000000000000
[  125.504379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  125.510166] CR2: 0000000087520483 CR3: 0000000122f01004 CR4: 0000000000f70ef0
[  125.517351] PKRU: 55555554
[  125.520097] Call Trace:
[  125.522590]  <TASK>
[  125.524733]  cpuset_reset_sched_domains+0x1e/0x30
[  125.529484]  sched_cpu_deactivate+0x11d/0x160
[  125.533894]  ? __pfx_sched_cpu_deactivate+0x10/0x10
[  125.538814]  cpuhp_invoke_callback+0x107/0x470
[  125.543305]  ? __pfx_smpboot_thread_fn+0x10/0x10
[  125.547971]  cpuhp_thread_fun+0xdb/0x160
[  125.551933]  smpboot_thread_fn+0xeb/0x220
[  125.555991]  kthread+0xf3/0x1f0
[  125.559174]  ? __pfx_kthread+0x10/0x10
[  125.562966]  ? __pfx_kthread+0x10/0x10
[  125.566758]  ret_from_fork+0x7d/0xd0
[  125.570383]  ? __pfx_kthread+0x10/0x10
[  125.574184]  ret_from_fork_asm+0x1a/0x30
[  125.578149]  </TASK>
[  125.580382] Modules linked in:
[  125.583485] CR2: 0000000087520483
[  125.586853] ---[ end trace 0000000000000000 ]---
[  125.591507] RIP: 0010:partition_sched_domains+0x36d/0x7f0
[  125.596954] Code: 02 00 48 8b 4d 00 41 bc 01 00 00 00 4c 89 c0 74 a0 b8 40 00 00 00 48 85 c9 74 05 f3 48 0f bc c1 48 98 48 8b 04 c5 e0 bb 85 86 <4e> 8b bc 30 c0 0a 00 00 8b 05 f5 8f 75 01 85 c0 0f 84 090
[  125.615763] RSP: 0018:ffffb185001dfd90 EFLAGS: 00010246
[  125.621032] RAX: 0000000100000003 RBX: ffff98ac9cae6cd0 RCX: 0000000000000000
[  125.628211] RDX: 0000000000000000 RSI: ffff98ac80bf1ed8 RDI: 0000000000000040
[  125.635390] RBP: ffff98ac9cae6cc8 R08: ffff98ac80bf1ed0 R09: fffffffffffffffe
[  125.642568] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[  125.649745] R13: 0000000000000001 R14: ffffffff8751f9c0 R15: 0000000000000004
[  125.656923] FS:  0000000000000000(0000) GS:ffff98b068749000(0000) knlGS:0000000000000000
[  125.665054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  125.670849] CR2: 0000000087520483 CR3: 0000000122f01004 CR4: 0000000000f70ef0
[  125.678026] PKRU: 55555554
[  125.680773] note: cpuhp/1[20] exited with irqs disabled

This happens because, when the last secondary cpu (cpu 1) is offlined,
build_sched_domains() is passed an empty cpumask, since the only remaining
cpu (cpu 0) is isolated. It warns and fails, after which an attempt is made
to build perf domains, which crashes the kernel. The same problem occurs
during cpu hotplug, but that was fixed by
commit 38685e2 ("cpu/hotplug: Don't offline the last non-isolated CPU").
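
The failure mode described above can be sketched in userspace. This is a minimal model, not kernel code: `cpumask_model_t`, `cpu_bit()`, `domain_span()` and `cpu_emptying_domain_span()` are hypothetical names introduced only for illustration, a cpumask is modelled as a 64-bit word, and secondary CPUs are assumed to go down from the highest number to the lowest, matching the order seen in the log (CPU 2 offline before CPU 1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a cpumask: bit n set means CPU n. */
typedef uint64_t cpumask_model_t;

static cpumask_model_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (cpumask_model_t)1 << cpu;
}

/* CPUs still eligible for scheduler domains: active and not isolated. */
static cpumask_model_t domain_span(cpumask_model_t active,
				   cpumask_model_t isolated)
{
	return active & ~isolated;
}

/*
 * Take every CPU except `primary` down, highest number first (an
 * assumption that mirrors the log), and return the CPU whose removal
 * leaves an empty domain span. Returns -1 if the span never empties.
 */
static int cpu_emptying_domain_span(int nr_cpus, int primary,
				    cpumask_model_t isolated)
{
	cpumask_model_t active = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++)
		active |= cpu_bit(cpu);

	for (int cpu = nr_cpus - 1; cpu >= 0; cpu--) {
		if (cpu == primary)
			continue;
		active &= ~cpu_bit(cpu);
		if (domain_span(active, isolated) == 0)
			return cpu;
	}
	return -1;
}
```

In this model, four CPUs with CPU 0 isolated and CPU 0 kept as primary give `cpu_emptying_domain_span(4, 0, cpu_bit(0)) == 1`: the span empties when CPU 1 goes down. Keeping CPU 1 as primary instead gives -1, because the span never empties.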

Fix this by ensuring that the primary cpu, i.e. the last cpu left standing, is
a housekeeping cpu of domain type (HK_TYPE_DOMAIN), so that
build_sched_domains() is not passed an empty cpumask.
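
The selection idea can be sketched as a small userspace model. It is not the patch itself: the checkpatch output on this page shows only that the change calls `cpumask_first_and_and()` on the online, HK_TYPE_TIMER and HK_TYPE_DOMAIN masks, and `cpumask_first_and()` on the online and HK_TYPE_DOMAIN masks. Treating the second call as a fallback for the first is an assumption, and `cpumask_model_t`, `cpu_bit()`, `first_cpu()`, `pick_primary()` and `NO_CPU` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a cpumask: bit n set means CPU n. */
typedef uint64_t cpumask_model_t;

#define NO_CPU (-1)

static cpumask_model_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (cpumask_model_t)1 << cpu;
}

/* Lowest-numbered CPU in the mask, or NO_CPU if the mask is empty. */
static int first_cpu(cpumask_model_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & cpu_bit(cpu))
			return cpu;
	return NO_CPU;
}

/*
 * Prefer an online CPU that is housekeeping for both timers and
 * domains. Assumption: if there is none, fall back to any online
 * CPU of domain type. Returns NO_CPU when neither exists.
 */
static int pick_primary(cpumask_model_t online, cpumask_model_t hk_timer,
			cpumask_model_t hk_domain)
{
	int cpu = first_cpu(online & hk_timer & hk_domain);

	if (cpu == NO_CPU)
		cpu = first_cpu(online & hk_domain);
	return cpu;
}
```

With four CPUs online and `isolcpus=0` modelled as `hk_domain = 0xE`, `pick_primary(0xF, 0xF, 0xE)` selects CPU 1 rather than CPU 0, so the last CPU left standing is still eligible for a scheduler domain.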

Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
build-rv32-defconfig
Desc: Builds riscv32 defconfig
Duration: 101.89 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
build-rv64-clang-allmodconfig
Desc: Builds riscv64 allmodconfig with Clang, and checks for errors and added warnings
Duration: 1012.27 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
build-rv64-gcc-allmodconfig
Desc: Builds riscv64 allmodconfig with GCC, and checks for errors and added warnings
Duration: 1477.21 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
build-rv64-nommu-k210-defconfig
Desc: Builds riscv64 defconfig with NOMMU for K210
Duration: 20.28 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
build-rv64-nommu-k210-virt
Desc: Builds riscv64 defconfig with NOMMU for the virt platform
Duration: 21.74 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
checkpatch
Desc: Runs checkpatch.pl on the patch
Duration: 1.77 seconds
Result: WARNING
Output:

CHECK: Alignment should match open parenthesis
#125: FILE: kernel/cpu.c:1906:
+		primary = cpumask_first_and_and(cpu_online_mask,
+								housekeeping_cpumask(HK_TYPE_TIMER),

WARNING: line length of 102 exceeds 100 columns
#126: FILE: kernel/cpu.c:1907:
+								housekeeping_cpumask(HK_TYPE_DOMAIN));

WARNING: line length of 102 exceeds 100 columns
#137: FILE: kernel/cpu.c:1916:
+								housekeeping_cpumask(HK_TYPE_DOMAIN));

CHECK: Alignment should match open parenthesis
#137: FILE: kernel/cpu.c:1916:
+			primary = cpumask_first_and(cpu_online_mask,
+								housekeeping_cpumask(HK_TYPE_DOMAIN));

WARNING: The commit message has 'Call Trace:', perhaps it also needs a 'Fixes:' tag?

total: 0 errors, 3 warnings, 2 checks, 40 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

Commit 0fe65c8bb028 ("kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type") has style problems, please review.

NOTE: Ignored message types: ALLOC_SIZEOF_STRUCT CAMELCASE COMMIT_LOG_LONG_LINE GIT_COMMIT_ID MACRO_ARG_REUSE NO_AUTHOR_SIGN_OFF

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
dtb-warn-rv64
Desc: Checks for Device Tree warnings/errors
Duration: 71.52 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
header-inline
Desc: Detects static functions without inline keyword in header files
Duration: 0.24 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
kdoc
Desc: Detects kdoc errors
Duration: 0.91 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
module-param
Desc: Detect module_param changes
Duration: 0.26 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
verify-fixes
Desc: Verifies that the Fixes: tags exist
Duration: 0.23 seconds
Result: PASS

@linux-riscv-bot
Author

Patch 1: "[RFC] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type"
verify-signedoff
Desc: Verifies that Signed-off-by: tags are correct
Duration: 1.20 seconds
Result: ERROR
Output:

Commit 0fe65c8bb028 ("kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type")
	author Signed-off-by missing
	author email:    shashank.mahadasyam@sony.com
	committer email: linux.riscv.bot@gmail.com
	Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

Errors in tree with Signed-off-by, please fix!


@linux-riscv-bot force-pushed the workflow__riscv__fixes branch from a7cb30d to d776861 on July 2, 2025 18:46
@linux-riscv-bot deleted the pw977217 branch on July 8, 2025 01:07