[PW_SID:1092456] arm64/riscv: Add support for crashkernel CMA reservation #1901
linux-riscv-bot wants to merge 16 commits into
As done for arm64 in commit 944a45a ("arm64: kdump: Reimplement crashkernel=X") and commit 4831be7 ("arm64/kexec: Fix missing extra range for crashkres_low."), riscv should have excluded the "crashk_low_res" reserved ranges from the crash kernel memory when it implemented crashkernel=X,[high,low], to prevent them from being exported through /proc/vmcore. The exclusion needs an extra crash_mem range.

Tested on qemu with crashkernel=4G, using the kexec in [1] mentioned in [2]. The second kernel starts normally.

  # dmesg | grep crash
  [ 0.000000] crashkernel low memory reserved: 0xf8000000 - 0x100000000 (128 MB)
  [ 0.000000] crashkernel reserved: 0x000000017fe00000 - 0x000000027fe00000 (4096 MB)

Cc: Guo Ren <guoren@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
[1]: https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2
[2]: https://lore.kernel.org/all/20230726175000.2536220-1-chenjiahao16@huawei.com/
Fixes: 5882e5a ("riscv: kdump: Implement crashkernel=X,[high,low]")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

In get_crash_memory_ranges(), if crash_exclude_mem_range() fails after realloc_mem_ranges() has successfully allocated the cmem memory, the function just returns an error and leaves cmem pointing to the allocated memory. The caller, update_crash_elfcorehdr(), does not free it either, which causes a memory leak. Use goto out to free the cmem.

Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 849599b ("powerpc/crash: add crash memory hotplug support")
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

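The leak described above follows a common shape: a callee hands back a buffer through an out-pointer and then fails, and the caller returns early instead of reaching its cleanup label. A minimal user-space sketch of the fixed shape is below. All names (`mem_ranges`, `get_ranges`, `update_headers`, `live_allocs`) are hypothetical stand-ins rather than the kernel functions, and `free()` stands in for whatever the kernel code uses to release cmem.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct range { unsigned long long start, end; };

/* Hypothetical stand-in for the kernel's crash memory range buffer. */
struct mem_ranges {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Counts outstanding buffers so a leak is observable in this sketch. */
static int live_allocs;

static struct mem_ranges *ranges_alloc(unsigned int max)
{
	struct mem_ranges *m = calloc(1, sizeof(*m) + max * sizeof(m->ranges[0]));

	if (m) {
		m->max_nr_ranges = max;
		live_allocs++;
	}
	return m;
}

static void ranges_free(struct mem_ranges *m)
{
	if (!m)
		return;
	free(m);
	live_allocs--;
	assert(live_allocs >= 0);
}

/* Hypothetical exclusion step that the caller can force to fail. */
static int exclude_step(struct mem_ranges *m, int fail)
{
	(void)m;
	return fail ? -ENOMEM : 0;
}

/* On failure, *out may still point at an allocated buffer. */
static int get_ranges(struct mem_ranges **out, int fail)
{
	*out = ranges_alloc(8);
	if (!*out)
		return -ENOMEM;
	return exclude_step(*out, fail);
}

int update_headers(int fail)
{
	struct mem_ranges *m = NULL;
	int ret;

	ret = get_ranges(&m, fail);
	if (ret)
		goto out;	/* a plain "return ret" here would leak m */

	/* ... the real code would rebuild the ELF core header from m ... */
out:
	ranges_free(m);
	return ret;
}
```

Calling `update_headers(1)` exercises the failure path; `live_allocs` returns to zero because the error path shares the cleanup label with the success path.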
There is a race condition between the kexec_load() system call (crash kernel loading path) and memory hotplug operations that can lead to a buffer overflow and a potential kernel crash.

During prepare_elf_headers(), the following steps occur:
1. get_nr_ram_ranges_callback() queries the current System RAM memory ranges
2. A buffer is allocated based on the queried count
3. prepare_elf64_ram_headers_callback() populates ranges from memblock

If memory hotplug occurs between step 1 and step 3, the number of ranges can increase, causing an out-of-bounds write when populating cmem->ranges[].

This happens because kexec_load() uses kexec_trylock (atomic_t) while memory hotplug uses device_hotplug_lock (mutex), so they don't serialize with each other.

Since x86 supports crash hotplug, any data inconsistency caused by a race during the initial load will be corrected by the subsequent hotplug update. However, we must prevent a buffer overflow if the number of memory regions increases between the two passes.

Add a bounds check in prepare_elf64_ram_headers_callback() to ensure that the number of populated ranges does not exceed the allocated maximum.

Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 8d5f894 ("x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

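The count-then-populate pattern in the steps above stays safe only if the populate pass checks the capacity chosen by the count pass. The sketch below shows such a check in user-space C. `crash_mem_sketch`, `cmem_alloc` and `cmem_add_range` are hypothetical names, and returning `-ENOMEM` on overflow is an assumption made for illustration, not necessarily the error code the patches use.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct range { unsigned long long start, end; };

/* Hypothetical stand-in for struct crash_mem: capacity plus fill level. */
struct crash_mem_sketch {
	unsigned int max_nr_ranges;	/* fixed when the buffer is allocated */
	unsigned int nr_ranges;		/* grows during the populate pass */
	struct range ranges[];
};

struct crash_mem_sketch *cmem_alloc(unsigned int max)
{
	struct crash_mem_sketch *c;

	c = calloc(1, sizeof(*c) + max * sizeof(c->ranges[0]));
	if (c)
		c->max_nr_ranges = max;
	return c;
}

/*
 * Append one range. If more ranges show up than were counted earlier
 * (for example because memory was hot-added in between), refuse to
 * write past the end of the buffer.
 */
int cmem_add_range(struct crash_mem_sketch *c,
		   unsigned long long start, unsigned long long end)
{
	assert(c != NULL);

	if (c->nr_ranges >= c->max_nr_ranges)
		return -ENOMEM;	/* assumed error code for this sketch */

	c->ranges[c->nr_ranges].start = start;
	c->ranges[c->nr_ranges].end = end;
	c->nr_ranges++;
	return 0;
}
```

With a buffer sized for two ranges, a third call to `cmem_add_range()` fails cleanly and leaves `nr_ranges` at 2 instead of writing out of bounds.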
…rs()

There is a race condition between the kexec_load() system call (crash kernel loading path) and memory hotplug operations that can lead to a buffer overflow and a potential kernel crash.

During prepare_elf_headers(), the following steps occur:
1. The first for_each_mem_range() queries the current System RAM memory ranges
2. A buffer is allocated based on the queried count
3. The second for_each_mem_range() populates ranges from memblock

If memory hotplug occurs between step 1 and step 3, the number of ranges can increase, causing an out-of-bounds write when populating cmem->ranges[].

This happens because kexec_load() uses kexec_trylock (atomic_t) while memory hotplug uses device_hotplug_lock (mutex), so they don't serialize with each other.

Add explicit bounds checking to prevent out-of-bounds access.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Fixes: 3751e72 ("arm64: kexec_file: add crash dump support")
Closes: https://sashiko.dev/#/patchset/20260323072745.2481719-1-ruanjinjie%40huawei.com
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

…rs()

There is a race condition between the kexec_load() system call (crash kernel loading path) and memory hotplug operations that can lead to a buffer overflow and a potential kernel crash.

During prepare_elf_headers(), the following steps occur:
1. get_nr_ram_ranges_callback() queries the current System RAM memory ranges
2. A buffer is allocated based on the queried count
3. prepare_elf64_ram_headers_callback() populates ranges from memblock

If memory hotplug occurs between step 1 and step 3, the number of ranges can increase, causing an out-of-bounds write when populating cmem->ranges[].

This happens because kexec_load() uses kexec_trylock (atomic_t) while memory hotplug uses device_hotplug_lock (mutex), so they don't serialize with each other.

While this works today because RISC-V server hardware with hotplug support is still rare and most deployments use fixed memory configurations (e.g., the QEMU virt machine), it is technically fragile. So add bounds checking in prepare_elf64_ram_headers_callback() to prevent out-of-bounds (OOB) access.

No functional change for current RISC-V deployments, but this makes the code robust against future hotplug-capable platforms.

Cc: Paul Walmsley <pjw@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: songshuaishuai@tinylab.org
Cc: bjorn@rivosinc.com
Cc: leitao@debian.org
Fixes: 8acea45 ("RISC-V: Support for kexec_file on panic")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

There is a race condition between the kexec_load() system call (crash kernel loading path) and memory hotplug operations that can lead to a buffer overflow and a potential kernel crash.

During prepare_elf_headers(), the following steps occur:
1. The first for_each_mem_range() queries the current System RAM memory ranges
2. A buffer is allocated based on the queried count
3. The second for_each_mem_range() populates ranges from memblock

If memory hotplug occurs between step 1 and step 3, the number of ranges can increase, causing an out-of-bounds write when populating cmem->ranges[].

This happens because kexec_load() uses kexec_trylock (atomic_t) while memory hotplug uses device_hotplug_lock (mutex), so they don't serialize with each other.

Add bounds checking to prevent out-of-bounds access.

Cc: Youling Tang <tangyouling@kylinos.cn>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: stable@vger.kernel.org
Fixes: 1bcca86 ("LoongArch: Add crash dump support for kexec_file")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

During a memory hot-remove event, the elfcorehdr is rebuilt to exclude the removed memory. While updating the crash memory ranges for this operation, the crash memory ranges array can become unsorted. This happens because remove_mem_range() may split a memory range into two parts and append the higher-address part as a separate range at the end of the array.

So far, no issues have been observed due to the unsorted crash memory ranges. However, this could lead to problems once crash memory range removal is handled by generic code, as introduced in the upcoming patches in this series.

Currently, powerpc uses a platform-specific function, remove_mem_range(), to exclude hot-removed memory from the crash memory ranges. This function performs the same task as the generic crash_exclude_mem_range() in crash_core.c. The generic helper also ensures that the crash memory ranges remain sorted. So remove the redundant powerpc-specific implementation and instead call crash_exclude_mem_range_guarded() (which internally calls crash_exclude_mem_range()) to exclude the hot-removed memory ranges.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan he <bhe@redhat.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shivang Upadhyay <shivangu@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

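To make the sorting point concrete, here is a simplified user-space sketch of excluding a hole from a sorted range array while keeping it sorted: when the hole falls inside one range, the upper half is inserted directly after the lower half rather than appended at the end. This illustrates the idea only and is not the kernel's crash_exclude_mem_range(). The names `mem_ranges`, `exclude_range`, `is_sorted` and the fixed capacity `SKETCH_MAX_RANGES` are assumptions, and range ends are treated as inclusive.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_RANGES 8

struct range { unsigned long long start, end; };	/* end is inclusive */

struct mem_ranges {
	unsigned int nr_ranges;
	struct range ranges[SKETCH_MAX_RANGES];
};

/* Remove [mstart, mend] from a sorted, non-overlapping range array. */
int exclude_range(struct mem_ranges *m,
		  unsigned long long mstart, unsigned long long mend)
{
	unsigned int i = 0;

	assert(mstart <= mend);

	while (i < m->nr_ranges) {
		struct range *r = &m->ranges[i];
		unsigned int tail = m->nr_ranges - i - 1; /* entries after r */

		if (mstart > r->end) {		/* hole lies above this range */
			i++;
			continue;
		}
		if (mend < r->start)		/* sorted: nothing left to hit */
			break;

		if (mstart <= r->start && mend >= r->end) {
			/* Fully covered: drop r by shifting the tail down. */
			memmove(r, r + 1, tail * sizeof(*r));
			m->nr_ranges--;
			continue;		/* re-check the entry now at i */
		}
		if (mstart > r->start && mend < r->end) {
			/* Hole inside r: split, upper half goes right after. */
			if (m->nr_ranges >= SKETCH_MAX_RANGES)
				return -ENOMEM;
			memmove(r + 2, r + 1, tail * sizeof(*r));
			r[1].start = mend + 1;
			r[1].end = r->end;
			r->end = mstart - 1;
			m->nr_ranges++;
			break;
		}
		if (mstart <= r->start)
			r->start = mend + 1;	/* trim the head of r */
		else
			r->end = mstart - 1;	/* trim the tail of r */
		i++;
	}
	return 0;
}

/* Returns 1 if ranges are strictly ascending and non-overlapping. */
int is_sorted(const struct mem_ranges *m)
{
	for (unsigned int i = 1; i < m->nr_ranges; i++)
		if (m->ranges[i - 1].end >= m->ranges[i].start)
			return 0;
	return 1;
}
```

Note that the split case consumes one extra slot, which is why callers of this kind of helper need spare capacity in the array before excluding a region.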
The crash memory allocation and the exclusion of crashk_res, crashk_low_res and crashk_cma memory are almost identical across architectures. Handling them in the crash core eliminates a lot of duplication, so add a crash_prepare_headers() helper to handle them in common code.

To achieve this, three architecture-specific functions are introduced:
- arch_get_system_nr_ranges(). Pre-counts the maximum number of memory ranges.
- arch_crash_populate_cmem(). Collects the memory ranges and fills them into cmem.
- arch_crash_exclude_ranges(). Performs any additional architecture-specific crash memory range exclusion; defaults to empty.

Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

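The division of labour between common code and the three hooks can be sketched in user-space C as follows. This is a rough illustration of the count, allocate, populate, exclude order only. The hook table `arch_hooks`, the function `prepare_ranges`, the `slack` parameter (spare slots for ranges that get split by exclusions) and the demo hooks are all hypothetical; the series itself uses per-architecture functions, and their real signatures are not shown in this page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct range { unsigned long long start, end; };

struct mem_ranges {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Hypothetical hook table standing in for the arch_* functions. */
struct arch_hooks {
	unsigned int (*nr_ranges)(void);		/* pre-count step */
	int (*populate)(struct mem_ranges *m);		/* fill step */
	int (*exclude_extra)(struct mem_ranges *m);	/* optional, may be NULL */
};

/* Common-code side: count, allocate with slack, populate, exclude. */
struct mem_ranges *prepare_ranges(const struct arch_hooks *h,
				  unsigned int slack, int *err)
{
	struct mem_ranges *m;
	unsigned int max;

	assert(h && h->nr_ranges && h->populate && err);

	max = h->nr_ranges() + slack;
	m = calloc(1, sizeof(*m) + max * sizeof(m->ranges[0]));
	if (!m) {
		*err = -ENOMEM;
		return NULL;
	}
	m->max_nr_ranges = max;

	*err = h->populate(m);
	/* The common exclusion of the crash kernel regions would run here. */
	if (!*err && h->exclude_extra)
		*err = h->exclude_extra(m);
	if (*err) {
		free(m);
		return NULL;
	}
	return m;
}

/* Demo "architecture": two fixed RAM ranges and no extra exclusion. */
static unsigned int demo_nr_ranges(void)
{
	return 2;
}

static int demo_populate(struct mem_ranges *m)
{
	static const struct range ram[] = {
		{ 0x80000000ULL, 0x8fffffffULL },
		{ 0x100000000ULL, 0x17fffffffULL },
	};

	for (size_t i = 0; i < sizeof(ram) / sizeof(ram[0]); i++) {
		if (m->nr_ranges >= m->max_nr_ranges)
			return -ENOMEM;	/* bounds check, as in the fixes above */
		m->ranges[m->nr_ranges++] = ram[i];
	}
	return 0;
}

const struct arch_hooks demo_hooks = { demo_nr_ranges, demo_populate, NULL };
```

A caller would invoke `prepare_ranges(&demo_hooks, 2, &err)` and free the result when done. Leaving `exclude_extra` as NULL mirrors the "defaulting to empty" behaviour described for the third hook.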
Use the newly introduced crash_prepare_headers() function to replace the existing prepare_elf_headers(), so that cmem is allocated and crash kernel memory is excluded in the crash core, which reduces code duplication.

Only the following two architecture functions need to be implemented:
- arch_get_system_nr_ranges(). Use for_each_mem_range() to traverse and pre-count the maximum number of memory ranges.
- arch_crash_populate_cmem(). Use for_each_mem_range() to traverse and collect the memory ranges and fill them into cmem.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

Use the newly introduced crash_prepare_headers() function to replace the existing prepare_elf_headers(), so that cmem is allocated and crash kernel memory is excluded in the crash core, which reduces code duplication.

Only the following three architecture functions need to be implemented:
- arch_get_system_nr_ranges(). Call get_nr_ram_ranges_callback() to pre-count the maximum number of memory ranges.
- arch_crash_populate_cmem(). Use prepare_elf64_ram_headers_callback() to collect the memory ranges and fill them into cmem.
- arch_crash_exclude_ranges(). Exclude the low 1M for x86.

Also remove the unused "nr_mem_ranges" in arch_crash_handle_hotplug_event().

Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

Use the newly introduced crash_prepare_headers() function to replace the existing prepare_elf_headers(), so that cmem is allocated and crash kernel memory is excluded in the crash core, which reduces code duplication.

Only the following two architecture functions need to be implemented:
- arch_get_system_nr_ranges(). Call get_nr_ram_ranges_callback() to pre-count the maximum number of memory ranges.
- arch_crash_populate_cmem(). Use prepare_elf64_ram_headers_callback() to collect the memory ranges and fill them into cmem.

Cc: Paul Walmsley <pjw@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

Use the newly introduced crash_prepare_headers() function to replace the existing prepare_elf_headers(), so that cmem is allocated and crash kernel memory is excluded in the crash core, which reduces code duplication.

Only the following two architecture functions need to be implemented:
- arch_get_system_nr_ranges(). Use for_each_mem_range() to traverse and pre-count the maximum number of memory ranges.
- arch_crash_populate_cmem(). Use for_each_mem_range() to traverse and collect the memory ranges and fill them into cmem.

Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Youling Tang <tangyouling@kylinos.cn>
Cc: Baoquan He <bhe@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

The exclusion of crashk_res and crashk_cma memory on powerpc is almost identical to the generic crash_exclude_core_ranges().

Introduce an architecture-specific arch_crash_exclude_mem_range() function whose default implementation is crash_exclude_mem_range(), and use crash_exclude_mem_range_guarded() as powerpc's own implementation. With that in place, the generic crash_exclude_core_ranges() helper can be reused on powerpc.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shivang Upadhyay <shivangu@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

Commit 35c18f2 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab47551 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation.

Crash kernel memory reservation wastes production resources if too large, risks kdump failure if too small, and faces allocation difficulties on fragmented systems due to contiguous block constraints. The new CMA-based crashkernel reservation scheme splits the "large fixed reservation" into a "small fixed region + large CMA dynamic region": the CMA memory is available to userspace during normal operation to avoid waste, and is reclaimed for kdump upon crash, saving memory while improving reliability.

So extend crashkernel CMA reservation support to arm64. The following changes are made to enable CMA reservation:
- Parse and obtain the CMA reservation size along with other crashkernel parameters.
- Call reserve_crashkernel_cma() to allocate the CMA region for kdump.
- Include the CMA-reserved ranges for the kdump kernel to use.
- Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore, which is already done in the crash core.

Update kernel-parameters.txt to document CMA support for crashkernel on the arm64 architecture.

Tested-by: Breno Leitao <leitao@debian.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

Commit 35c18f2 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab47551 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. This allows the kernel to dynamically allocate contiguous memory for crash dumping when needed, rather than permanently reserving a fixed region at boot time.

So extend crashkernel CMA reservation support to riscv. The following changes are made to enable CMA reservation:
- Parse and obtain the CMA reservation size along with other crashkernel parameters.
- Call reserve_crashkernel_cma() to allocate the CMA region for kdump.
- Include the CMA-reserved ranges for the kdump kernel to use, which was already done in of_kexec_alloc_and_setup_fdt().
- Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore, which was already done in the crash core.

Update kernel-parameters.txt to document CMA support for crashkernel on the riscv architecture.

Cc: Paul Walmsley <pjw@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Paul Walmsley <pjw@kernel.org> # arch/riscv
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>

Patch 1: "[v13,01/15] riscv: kexec_file: Fix crashk_low_res not exclude bug"
Patch 2: "[v13,02/15] powerpc/crash: Fix possible memory leak in update_crash_elfcorehdr()"
Patch 13: "[v13,13/15] crash: Use crash_exclude_core_ranges() on powerpc"
Patch 14: "[v13,14/15] arm64: kexec: Add support for crashkernel CMA reservation"
Patch 15: "[v13,15/15] riscv: kexec: Add support for crashkernel CMA reservation"
2d4fcdd to cd9d421
PR for series 1092456 applied to workflow__riscv__fixes
Name: arm64/riscv: Add support for crashkernel CMA reservation
URL: https://patchwork.kernel.org/project/linux-riscv/list/?series=1092456
Version: 13