[PW_SID:1103915] arm64/riscv: Add support for crashkernel CMA reservation#2038
linux-riscv-bot wants to merge 23 commits
As done for arm64 in commit 944a45a ("arm64: kdump: Reimplement crashkernel=X") and commit 4831be7 ("arm64/kexec: Fix missing extra range for crashkres_low."), riscv should have excluded the "crashk_low_res" reserved range from the crash kernel memory when crashkernel=X,[high,low] was implemented, so that the range is not exported through /proc/vmcore. The exclusion needs one extra crash_mem range.

Tested on qemu with crashkernel=4G, using the kexec-tools tree in [1] that is mentioned in [2]. The second kernel starts normally.

  # dmesg | grep crash
  [ 0.000000] crashkernel low memory reserved: 0xf8000000 - 0x100000000 (128 MB)
  [ 0.000000] crashkernel reserved: 0x000000017fe00000 - 0x000000027fe00000 (4096 MB)

Cc: Guo Ren <guoren@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
[1]: https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2
[2]: https://lore.kernel.org/all/20230726175000.2536220-1-chenjiahao16@huawei.com/
Fixes: 5882e5a ("riscv: kdump: Implement crashkernel=X,[high,low]")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
In get_crash_memory_ranges(), if crash_exclude_mem_range() fails after realloc_mem_ranges() has successfully allocated cmem, the function returns an error but leaves cmem pointing to the allocated memory. The caller, update_crash_elfcorehdr(), does not free it either, which causes a memory leak. Jump to the out label so that cmem is freed on this path.

Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 849599b ("powerpc/crash: add crash memory hotplug support")
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
…ize_ppc64() A static Sashiko AI review identified a potential NULL pointer dereference in kexec_extra_fdt_size_ppc64(). When get_reserved_memory_ranges() successfully returns 0 on platforms without any reserved memory regions, the allocated 'rmem' pointer remains NULL. Passing this unallocated pointer directly to kexec_extra_fdt_size_ppc64() leads to a kernel panic when evaluating 'rmem->nr_ranges'. Fix this by adding a defensive NULL pointer check at the beginning of kexec_extra_fdt_size_ppc64(), returning 0 extra space immediately if no reserved memory structure exists. Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@vger.kernel.org Fixes: 0d3ff06 ("powerpc/kexec_file: fix extra size calculation for kexec FDT") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
…ges()

Sashiko AI review pointed out the following issue.

The __merge_memory_ranges() function incorrectly handles overlapping memory ranges when merging them. Although sort_memory_ranges() sorts all ranges by their start address in ascending order beforehand, the merge logic remains defective in two ways:

1. It compares the current range's start against the previous element (i-1) instead of the running target index (idx).
2. It unconditionally overwrites 'ranges[idx].end' with 'ranges[i].end'.

This logic flaw leads to critical memory truncation when a larger memory range completely subsumes subsequent smaller ranges. For example, consider a sorted input array with three ranges:

  Range A (idx=0): [0x1000 - 0x9000]
  Range B (i=1):   [0x2000 - 0x5000] (completely inside Range A)
  Range C (i=2):   [0x6000 - 0x8000] (completely inside Range A)

1. When i=1 (Range B): ranges[1].start (0x2000) <= ranges[0].end + 1 (0x9001) is TRUE. The code executes ranges[0].end = ranges[1].end, which erroneously shrinks Range A's end from 0x9000 down to 0x5000.
2. When i=2 (Range C): ranges[2].start (0x6000) <= ranges[1].end + 1 (0x5001) is FALSE. The code falls into the else block, creating a broken new range.

As a result, valid memory fragments [0x5001 - 0x5fff] and [0x8001 - 0x9000] are completely lost from the kexec exclude lists, potentially allowing the crash kernel to overwrite active memory, causing data corruption or crashes.

Fix this by ensuring the start of the current range is compared against the end of the active merged range (idx), and use max() to safely prevent the outer boundary from being truncated.

Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Fixes: 180adfc ("powerpc/kexec_file: Add helper functions for getting memory ranges")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
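The corrected merge rule described in this commit message can be illustrated with a small userspace sketch. This is not the kernel patch itself: `struct range` is redeclared locally, and `merge_sorted_ranges()` is a hypothetical name chosen for the illustration. It assumes the input is already sorted by start address and that no range ends at the maximum representable address (so `end + 1` does not wrap).

```c
#include <stddef.h>

/* Local stand-in for the kernel's range type; inclusive [start, end]. */
struct range {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Merge overlapping or adjacent ranges in place and return the new count.
 * Assumes r[] is sorted by start in ascending order.
 */
static size_t merge_sorted_ranges(struct range *r, size_t n)
{
	size_t i, idx = 0;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		/* Compare against the active merged range (idx), not r[i - 1]. */
		if (r[i].start <= r[idx].end + 1) {
			/* Only ever grow the end; never shrink it. */
			if (r[i].end > r[idx].end)
				r[idx].end = r[i].end;
		} else {
			idx++;
			r[idx] = r[i];
		}
	}

	return idx + 1;
}
```

With the three ranges from the example above (A, B, C), this sketch returns a single range [0x1000 - 0x9000], so neither [0x5001 - 0x5fff] nor [0x8001 - 0x9000] is dropped.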
During a memory hot-remove event, the elfcorehdr is rebuilt to exclude the removed memory. While updating the crash memory ranges for this operation, the crash memory ranges array can become unsorted. This happens because remove_mem_range() may split a memory range into two parts and append the higher-address part as a separate range at the end of the array. So far, no issues have been observed due to the unsorted crash memory ranges. However, this could lead to problems once crash memory range removal is handled by generic code, as introduced in the upcoming patches in this series. Currently, powerpc uses a platform-specific function, remove_mem_range(), to exclude hot-removed memory from the crash memory ranges. This function performs the same task as the generic crash_exclude_mem_range() in crash_core.c. The generic helper also ensures that the crash memory ranges remain sorted. So remove the redundant powerpc-specific implementation and instead call crash_exclude_mem_range_guarded() (which internally calls crash_exclude_mem_range()) to exclude the hot-removed memory ranges. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan he <bhe@redhat.com> Cc: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Shivang Upadhyay <shivangu@linux.ibm.com> Cc: linux-kernel@vger.kernel.org Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The generic kimage_free_cma() relies on `image->nr_segments` to iterate and free allocated CMA pages. However, during architecture-specific segment placement retry loops (e.g., arm64's image_load()), a mid-way failure will truncate `image->nr_segments` back to its initial value. This truncation permanently hides any CMA pages allocated outside the new boundary from global cleanup, causing silent background memory leaks. To allow architecture-specific loaders to execute fine-grained memory reclamation before truncation occurs, extract the single-pass CMA release logic into a dedicated and exported helper: void kexec_free_segment_cma(struct kimage *image, unsigned long idx); Refactor the main kimage_free_cma() to invoke this helper sequentially to maintain backward compatibility while expanding single-slot flexibility. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
…loops

Sashiko AI code review pointed out that, during the arm64 kexec image placement retry loop in image_load(), the loader repeatedly attempts to find a suitable memory hole for the kernel and its associated segments (initrd, dtb, etc.). When a placement attempt fails midway, the core framework rolls `image->nr_segments` back to its initial value to logically discard the failed segments.

However, this truncation causes a memory leak. Any CMA pages successfully allocated via kexec_add_buffer() during the failed attempt are recorded in the `image->segment_cma` array. Since the subsequent global kimage_free_cma() cleanup only iterates up to the truncated (smaller) `nr_segments` boundary, the CMA pages outside the new boundary are orphaned and permanently leaked.

Fix this by using the newly introduced generic kexec_free_segment_cma() helper to reclaim memory before any truncation occurs:

1. In image_load(), explicitly invoke kexec_free_segment_cma() to release the CMA buffer allocated for the current failed kernel segment before decrementing `image->nr_segments`.
2. In the error path of load_other_segments(), iterate backward from the failed segment index down to `orig_segments`, freeing each orphaned CMA segment allocation before restoring the initial segment count.

This guarantees that all temporary CMA pages allocated during placement failures are returned to the contiguous memory allocator on every retry path.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Breno Leitao <leitao@debian.org> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Kees Cook <kees@kernel.org> Cc: "Rob Herring (Arm)" <robh@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Alexander Graf <graf@amazon.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: stable@vger.kernel.org Fixes: 07d2490 ("kexec: enable CMA based contiguous allocation") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
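The "free before truncating" ordering described in this commit can be modelled in userspace as follows. This is only a sketch under stated assumptions: `struct image_model`, `MAX_SEGMENTS`, `free_segment_cma()`, `rollback_segments()` and `count_live()` are hypothetical stand-ins, and `malloc()`/`free()` take the place of CMA allocation and release. It does not reproduce the real `struct kimage` or `kexec_free_segment_cma()`.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_SEGMENTS 16 /* hypothetical bound for this model */

/* Minimal model of an image that tracks one allocation per segment. */
struct image_model {
	size_t nr_segments;
	void *segment_cma[MAX_SEGMENTS];
};

/* Release one slot and clear it, so a later cleanup pass cannot free it twice. */
static void free_segment_cma(struct image_model *image, size_t idx)
{
	free(image->segment_cma[idx]);
	image->segment_cma[idx] = NULL;
}

/*
 * Roll back to orig_segments, walking backward from the last added
 * segment and freeing each slot before the count is reduced past it.
 */
static void rollback_segments(struct image_model *image, size_t orig_segments)
{
	while (image->nr_segments > orig_segments) {
		image->nr_segments--;
		free_segment_cma(image, image->nr_segments);
	}
}

/* Count slots that still hold an allocation (for demonstration only). */
static size_t count_live(const struct image_model *image)
{
	size_t i, n = 0;

	for (i = 0; i < MAX_SEGMENTS; i++)
		if (image->segment_cma[i])
			n++;
	return n;
}
```

If the count were simply reset to `orig_segments` without the loop, the slots above the new boundary would still hold allocations that a cleanup pass bounded by `nr_segments` never visits, which is the leak the commit describes.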
Sashiko AI code review pointed out a potential memory leak of image->elf_headers when load_other_segments() fails.

In the arm64 kexec_file load path, kexec_image.c runs a retry loop that calls kexec_add_buffer() to find a suitable location for the kernel segment. On each iteration, load_other_segments() is invoked to allocate and populate the other segments, such as the initrd, DTB, and ELF headers.

However, if a placement or allocation failure occurs later in load_other_segments() (e.g., when adding the initrd or DTB), execution jumps to the out_err label. This path restores image->nr_segments from orig_segments, but returns the error to the caller without freeing the previously allocated image->elf_headers vmalloc buffer. As a result, the retry loop in image_load() allocates new ELF headers on the next iteration and overwrites image->elf_headers, permanently leaking the buffers allocated in previous iterations.

To fix this, decouple the ELF header allocation from the retry loop. Since the contents and size of the ELF headers depend only on the host memory layout and do not change with the kernel's physical placement, move prepare_elf_headers() before the while retry loop in image_load(). If kexec_add_buffer() for the ELF headers fails, there is no need to vfree the headers at that point, because the error path frees `image->elf_headers` by calling arch_kimage_file_post_load_cleanup().

This removes the redundant allocation and deallocation during kexec placement retries and eliminates the use-after-free and memory leak risk. Also remove the prepare_elf_headers() call from load_other_segments() and have it reuse the single, pre-allocated image->elf_headers.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Huth <thuth@redhat.com> Cc: Breno Leitao <leitao@debian.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Benjamin Gwin <bgwin@google.com> Cc: stable@vger.kernel.org Fixes: 108aa50 ("arm64: kexec_file: try more regions if loading segments fails") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
A static memory safety review by Sashiko AI identified a high-severity Use-After-Free (UAF) and Double Free vulnerability in the dm-crypt keys handling path during arm64 kexec image placement retry loops. In crash_load_dm_crypt_keys(), when the segment allocation fails via kexec_add_buffer(), the error path invokes `kvfree((void *)kbuf.buffer)` to reclaim the keys buffer. However, the global pointer `keys_header` is left dangling with a stale address, creating an insecure memory trap. When the top-level loader image_load() retries the next available placement hole, crash_load_dm_crypt_keys() is re-entered. Since `is_dm_key_reused` is a read-only global configuration managed by user-space configfs, it cannot be mutated by the kernel. If it remains true, the loader skips build_keys_header() and blindly reuses the stale `keys_header` pointer for kbuf.buffer, triggering a severe Use-After-Free or a Null pointer dereference during kexec_add_buffer(). Alternatively, a new headers build can trigger a recursive Double Free inside build_keys_header(). Fix this by setting the global `keys_header` to NULL immediately after it is freed in the failure path. Concurrently, upgrade the header regeneration check to a composite condition: `if (!is_dm_key_reused || !keys_header)` This ensures that if a previous retry attempt wiped the buffer, the kernel will automatically and safely trigger a fresh header regeneration internally without modifying the user-configured `is_dm_key_reused` state flag, achieving absolute data consistency and memory safety across all retry paths. 
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Dave Young <ruirui.yang@linux.dev> Cc: stable@vger.kernel.org Fixes: e3a84be ("arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
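The two parts of this fix (clearing the global pointer after freeing it, and regenerating the header when the pointer is NULL even if reuse is requested) can be sketched in userspace. Everything here is a simplified model: `build_keys_header_model()` and `load_keys_model()` are hypothetical names, `malloc()`/`free()` stand in for the kernel allocators, and the `add_buffer_fails` parameter simulates a failing kexec_add_buffer() call.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static char *keys_header;     /* models the global header pointer */
static bool is_dm_key_reused; /* models the user-controlled reuse flag */
static int builds;            /* counts header regenerations, for demonstration */

/* Rebuild the header; free(NULL) is a no-op, so a cleared pointer is safe. */
static int build_keys_header_model(void)
{
	free(keys_header);
	keys_header = malloc(16);
	if (!keys_header)
		return -1;
	memset(keys_header, 0, 16);
	builds++;
	return 0;
}

static int load_keys_model(bool add_buffer_fails)
{
	/* Composite condition: rebuild if reuse is off OR no header exists. */
	if (!is_dm_key_reused || !keys_header) {
		if (build_keys_header_model())
			return -1;
	}

	if (add_buffer_fails) {
		/* Free and clear, so a retry never sees a stale pointer. */
		free(keys_header);
		keys_header = NULL;
		return -2;
	}

	return 0;
}
```

In this model, a failed attempt followed by a retry with reuse still enabled rebuilds the header instead of touching freed memory, and the reuse flag itself is never modified.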
… safety Introduce CRASH_HOTPLUG_SAFETY_PADDING to allocate extra slots for the crash memory ranges array, mitigating potential TOCTOU races caused by concurrent memory hotplug events. When CONFIG_MEMORY_HOTPLUG is disabled, the padding safely defaults to 0 as the memory layout remains static. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to Time-of-Use) race condition in prepare_elf_headers() between the initial pass that counts System RAM ranges and the second pass that populates them. If a memory hotplug event occurs between these two steps, the number of memory regions may increase, causing an out-of-bounds write to the cmem->ranges[] array. Fix this fundamentally by using `CRASH_HOTPLUG_SAFETY_PADDING`(128 slots) to expand the flexible array allocation ceiling upfront. This safely absorbs any concurrent memory region expansion. Concurrently, add a defensive boundary check inside the callback to return -EAGAIN on unexpected overrun, fully eradicating the overflow window and ensuring system stability. Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: stable@vger.kernel.org Fixes: 8d5f894 ("x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
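The combination of padded allocation and a defensive bounds check can be sketched as below. This is a userspace model, not the actual patch: `struct crash_mem_model`, `alloc_cmem()` and `add_range()` are hypothetical names, `SAFETY_PADDING` mirrors the 128 slots mentioned above, and `calloc()` stands in for the kernel allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SAFETY_PADDING 128 /* extra slots, as described in the commit message */

struct range {
	unsigned long long start;
	unsigned long long end;
};

/* Simplified model of a range array with a flexible array member. */
struct crash_mem_model {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* First pass result `counted` is padded before sizing the allocation. */
static struct crash_mem_model *alloc_cmem(unsigned int counted)
{
	unsigned int max = counted + SAFETY_PADDING;
	struct crash_mem_model *cmem;

	cmem = calloc(1, sizeof(*cmem) + (size_t)max * sizeof(cmem->ranges[0]));
	if (cmem)
		cmem->max_nr_ranges = max;
	return cmem;
}

/* Second pass callback: refuse to write past the allocated slots. */
static int add_range(struct crash_mem_model *cmem,
		     unsigned long long start, unsigned long long end)
{
	if (cmem->nr_ranges >= cmem->max_nr_ranges)
		return -EAGAIN;

	cmem->ranges[cmem->nr_ranges].start = start;
	cmem->ranges[cmem->nr_ranges].end = end;
	cmem->nr_ranges++;
	return 0;
}
```

The padding absorbs ranges that appear between the counting pass and the populating pass, while the check turns any remaining overrun into an error return rather than an out-of-bounds write.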
Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to Time-of-Use) race condition in prepare_elf_headers() between the initial pass that counts System RAM ranges and the second pass that populates them. If a memory hotplug event occurs between these two steps, the number of memory regions may increase, causing an out-of-bounds write to the cmem->ranges[] array. Fix this fundamentally by using `CRASH_HOTPLUG_SAFETY_PADDING` (128 slots) to expand the flexible array allocation ceiling upfront. This safely absorbs any concurrent memory region expansion. Concurrently, add a defensive boundary check to return -EAGAIN on unexpected overrun, fully eradicating the overflow window and ensuring system stability. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Fixes: 3751e72 ("arm64: kexec_file: add crash dump support") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to Time-of-Use) race condition in prepare_elf_headers() between the initial pass that counts System RAM ranges and the second pass that populates them. If a memory hotplug event occurs between these two steps, the number of memory regions may increase, causing an out-of-bounds write to the cmem->ranges[] array. Fix this fundamentally by using `CRASH_HOTPLUG_SAFETY_PADDING` (128 slots) to expand the flexible array allocation ceiling upfront. This safely absorbs any concurrent memory region expansion. Concurrently, add a defensive boundary check inside the callback to return -EAGAIN on unexpected overrun, fully eradicating the overflow window and ensuring system stability. Cc: Paul Walmsley <pjw@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: songshuaishuai@tinylab.org Cc: bjorn@rivosinc.com Cc: leitao@debian.org Fixes: 8acea45 ("RISC-V: Support for kexec_file on panic") Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
…adding Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to Time-of-Use) race condition in prepare_elf_headers() between the initial pass that counts System RAM ranges and the second pass that populates them. If a memory hotplug event occurs between these two steps, the number of memory regions may increase, causing an out-of-bounds write to the cmem->ranges[] array. Fix this fundamentally by using `CRASH_HOTPLUG_SAFETY_PADDING` (128 slots) to expand the flexible array allocation ceiling upfront. This safely absorbs any concurrent memory region expansion. Concurrently, add a defensive boundary check to return -EAGAIN on unexpected overrun, fully eradicating the overflow window and ensuring system stability. Cc: Youling Tang <tangyouling@kylinos.cn> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: WANG Xuerui <kernel@xen0n.name> Cc: stable@vger.kernel.org Fixes: 1bcca86 ("LoongArch: Add crash dump support for kexec_file") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The crash memory allocation and the exclusion of crashk_res, crashk_low_res and crashk_cma memory are almost identical across architectures. Handling them in the crash core eliminates a lot of duplication, so add a crash_prepare_headers() helper to handle them in common code.

To achieve this, three architecture-specific functions are introduced:

- arch_get_system_nr_ranges(): pre-counts the maximum number of memory ranges.
- arch_crash_populate_cmem(): collects the memory ranges and fills them into cmem.
- arch_crash_exclude_ranges(): excludes additional architecture-specific crash memory ranges; the default implementation is empty.

Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
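The division of labour between common code and the three architecture hooks can be sketched as a count, allocate, populate, exclude sequence. This is a userspace model with assumed signatures: the hook table `struct arch_hooks`, `prepare_headers_model()` and the `demo_*` functions are hypothetical, and the generic exclusion of crashk_res, crashk_low_res and crashk_cma is only indicated by a comment.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct range {
	unsigned long long start;
	unsigned long long end;
};

struct cmem_model {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Hypothetical hook table standing in for the three arch functions. */
struct arch_hooks {
	unsigned int (*get_nr_ranges)(void);     /* pre-count ranges */
	int (*populate)(struct cmem_model *cmem); /* fill cmem->ranges[] */
	int (*exclude)(struct cmem_model *cmem);  /* optional; may be NULL */
};

static int prepare_headers_model(const struct arch_hooks *ops,
				 struct cmem_model **out)
{
	unsigned int max = ops->get_nr_ranges();
	struct cmem_model *cmem;
	int ret;

	cmem = calloc(1, sizeof(*cmem) + (size_t)max * sizeof(cmem->ranges[0]));
	if (!cmem)
		return -ENOMEM;
	cmem->max_nr_ranges = max;

	ret = ops->populate(cmem);
	/* The common exclusion of crash kernel regions would run here. */
	if (!ret && ops->exclude)
		ret = ops->exclude(cmem);
	if (ret) {
		free(cmem);
		return ret;
	}

	*out = cmem;
	return 0;
}

/* Demo hooks describing two made-up RAM ranges. */
static unsigned int demo_nr_ranges(void)
{
	return 2;
}

static int demo_populate(struct cmem_model *cmem)
{
	if (cmem->max_nr_ranges < 2)
		return -EAGAIN;
	cmem->ranges[0] = (struct range){ 0x0, 0xfffff };
	cmem->ranges[1] = (struct range){ 0x100000, 0x7fffffff };
	cmem->nr_ranges = 2;
	return 0;
}
```

An architecture that needs no extra exclusion leaves the third hook unset, which corresponds to the empty default described above.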
Replace the existing prepare_elf_headers() with the newly introduced crash_prepare_headers(), so that cmem allocation and crash kernel memory exclusion are done in the crash core, which reduces code duplication.

Only the following two architecture functions need to be implemented:

- arch_get_system_nr_ranges(): use for_each_mem_range() to traverse and pre-count the maximum number of memory ranges.
- arch_crash_populate_cmem(): use for_each_mem_range() to traverse the memory ranges and fill them into cmem.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Replace the existing prepare_elf_headers() with the newly introduced crash_prepare_headers(), so that cmem allocation and crash kernel memory exclusion are done in the crash core, which reduces code duplication.

Only the following three architecture functions need to be implemented:

- arch_get_system_nr_ranges(): call get_nr_ram_ranges_callback() to pre-count the maximum number of memory ranges.
- arch_crash_populate_cmem(): use prepare_elf64_ram_headers_callback() to collect the memory ranges and fill them into cmem.
- arch_crash_exclude_ranges(): exclude the low 1M for x86.

Also remove the unused "nr_mem_ranges" in arch_crash_handle_hotplug_event().

Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Replace the existing prepare_elf_headers() with the newly introduced crash_prepare_headers(), so that cmem allocation and crash kernel memory exclusion are done in the crash core, which reduces code duplication.

Only the following two architecture functions need to be implemented:

- arch_get_system_nr_ranges(): call get_nr_ram_ranges_callback() to pre-count the maximum number of memory ranges.
- arch_crash_populate_cmem(): use prepare_elf64_ram_headers_callback() to collect the memory ranges and fill them into cmem.

Cc: Paul Walmsley <pjw@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
… code

Replace the existing prepare_elf_headers() with the newly introduced crash_prepare_headers(), so that cmem allocation and crash kernel memory exclusion are done in the crash core, which reduces code duplication.

Only the following two architecture functions need to be implemented:

- arch_get_system_nr_ranges(): use for_each_mem_range() to traverse and pre-count the maximum number of memory ranges.
- arch_crash_populate_cmem(): use for_each_mem_range() to traverse the memory ranges and fill them into cmem.

Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Youling Tang <tangyouling@kylinos.cn>
Cc: Baoquan He <bhe@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The exclusion of crashk_res and crashk_cma memory on powerpc is almost identical to the generic crash_exclude_core_ranges(). Introduce the architecture-specific arch_crash_exclude_mem_range() function, whose default implementation is crash_exclude_mem_range(), and use crash_exclude_mem_range_guarded() as powerpc's own implementation, so that the generic crash_exclude_core_ranges() helper can be reused.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shivang Upadhyay <shivangu@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Commit 35c18f2 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab47551 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. Crash kernel memory reservation wastes production resources if too large, risks kdump failure if too small, and faces allocation difficulties on fragmented systems due to contiguous block constraints. The new CMA-based crashkernel reservation scheme splits the "large fixed reservation" into a "small fixed region + large CMA dynamic region": the CMA memory is available to userspace during normal operation to avoid waste, and is reclaimed for kdump upon crash—saving memory while improving reliability. So extend crashkernel CMA reservation support to arm64. The following changes are made to enable CMA reservation: - Parse and obtain the CMA reservation size along with other crashkernel parameters. - Call reserve_crashkernel_cma() to allocate the CMA region for kdump. - Include the CMA-reserved ranges for kdump kernel to use. - Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore, which is already done in the crash core. Update kernel-parameters.txt to document CMA support for crashkernel on arm64 architecture. Tested-by: Breno Leitao <leitao@debian.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Commit 35c18f2 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab47551 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. This allows the kernel to dynamically allocate contiguous memory for crash dumping when needed, rather than permanently reserving a fixed region at boot time. So extend crashkernel CMA reservation support to riscv. The following changes are made to enable CMA reservation: - Parse and obtain the CMA reservation size along with other crashkernel parameters. - Call reserve_crashkernel_cma() to allocate the CMA region for kdump. - Include the CMA-reserved ranges for kdump kernel to use, which was already done in of_kexec_alloc_and_setup_fdt(). - Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore, which was already done in the crash core. Update kernel-parameters.txt to document CMA support for crashkernel on riscv architecture. Cc: Paul Walmsley <pjw@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Paul Walmsley <pjw@kernel.org> # arch/riscv Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Due to CPU/memory hotplug or online/offline events, the elfcorehdr (which
describes the CPUs and memory of the crashed kernel) of the kdump image
becomes outdated. Collecting a dump with an outdated elfcorehdr can
produce an inaccurate dump.

The current solution monitors CPU/memory add/remove events in userspace
using udev rules, and reloads the entire kdump image whenever CPU or
memory resources change. The kdump image includes the kernel, initrd,
elfcorehdr, FDT and purgatory. Given that only the elfcorehdr becomes
outdated on CPU/memory add/remove events, reloading the entire kdump image
is inefficient. More importantly, kdump remains inactive for a substantial
amount of time until the reload completes.

To address this, commit 2472627 ("crash: add generic infrastructure for
crash hotplug support") added a generic infrastructure that allows
architectures to selectively update kdump image components on CPU or
memory add/remove events within the kernel itself.

On a CPU or memory add/remove event, the generic crash hotplug event
handler, crash_handle_hotplug_event(), is triggered. It acquires the
locks needed to update the kdump image and invokes the
architecture-specific crash hotplug handler,
arch_crash_handle_hotplug_event(), to update the required kdump image
components.

[1] added support for virtual CPU hotplug in virtual machines on ARM64,
allowing vCPUs to be added or removed at runtime to meet Kubernetes
demands.

On ARM64, only memory add/remove events are handled, for these reasons:

1. Physical CPU hotplug:
   - Not supported on ARM64 hardware.

2. ACPI vCPU hotplug (KVM virtual machine):
   - vCPU hotplug is implemented as a static firmware policy where all
     possible vCPUs are pre-described in the MADT table at boot.
   - The vCPU status is updated automatically after vCPU hotplug.
   - No FDT or elfcorehdr update is needed.

3. vCPU hotplug in a virtual machine booted with a device tree:
   - The elfcorehdr is built using for_each_possible_cpu(), so it already
     includes all possible CPUs and does not need updates.

For memory add/remove events, the elfcorehdr is updated to reflect the
current memory layout.

This patch adds the ARCH_SUPPORTS_CRASH_HOTPLUG config option and
implements:
- arch_crash_hotplug_support(): Check if hotplug update is supported
- arch_crash_get_elfcorehdr_size(): Return elfcorehdr buffer size
- arch_crash_handle_hotplug_event(): Handle memory hotplug events

This follows the same approach as x86 commit ea53ad9 ("x86/crash: add x86
crash hotplug support"), powerpc commit b741092 ("powerpc/crash: add crash
CPU hotplug support") and commit 849599b ("powerpc/crash: add crash memory
hotplug support").

The test is based on the following QEMU version:
  https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2

Replace your '-smp' argument with something like:
 | -smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1

then feed the following to the QEMU monitor to hotplug a vCPU:
 | (qemu) device_add driver=host-arm-cpu,core-id=1,id=cpu1
 | (qemu) device_del cpu1

and feed the following to the QEMU monitor to hotplug memory:
 | (qemu) object_add memory-backend-ram,id=mem1,size=256M
 | (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
 | (qemu) device_del dimm1

The QEMU startup configuration is as follows:

  qemu-system-aarch64 \
    -M virt,gic-version=3,acpi=on,highmem=on \
    -enable-kvm \
    -cpu host \
    -kernel Image \
    -smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1 \
    -bios /usr/share/edk2/aarch64/QEMU_EFI.fd \
    -m 2G,slots=64,maxmem=16G \
    -nographic \
    -no-reboot \
    -device virtio-rng-pci \
    -append "root=/dev/vda rw console=ttyAMA0 kgdboc=ttyAMA0,115200 \
             earlycon acpi=on crashkernel=512M" \
    -drive if=none,file=images/rootfs.ext4,format=raw,id=hd0 \
    -device virtio-blk-device,drive=hd0

Two system calls, `kexec_file_load` and `kexec_load`, can be used to load
the kdump image. Only the kexec_file_load path has been tested so far.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Kees Cook <kees@kernel.org>
[1]: https://lore.kernel.org/all/20240529133446.28446-1-Jonathan.Cameron@huawei.com/
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
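The event filtering described in the commit message above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel code from the patch: the names `hp_action`, `mem_range`, `elfcorehdr_model`, `needs_elfcorehdr_update()` and `handle_hotplug_event()` are hypothetical, and `MAX_LOADS` is an arbitrary capacity standing in for the preallocated elfcorehdr buffer. It shows CPU events leaving the header untouched (it is already sized for all possible CPUs), while memory events rewrite the list of memory segments.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOADS 8 /* arbitrary capacity for this sketch */

/* Hypothetical event kinds, for illustration only. */
enum hp_action { HP_ADD_CPU, HP_REMOVE_CPU, HP_ADD_MEMORY, HP_REMOVE_MEMORY };

struct mem_range {
	unsigned long long start;
	unsigned long long end; /* inclusive */
};

/* Toy stand-in for the elfcorehdr: per-CPU notes plus memory segments. */
struct elfcorehdr_model {
	size_t nr_cpu_notes;               /* sized for all possible CPUs up front */
	size_t nr_loads;                   /* number of memory segments described */
	struct mem_range loads[MAX_LOADS];
	unsigned int rebuilds;             /* how many times the segments were rewritten */
};

/* Only memory events change what the header must describe. */
bool needs_elfcorehdr_update(enum hp_action action)
{
	return action == HP_ADD_MEMORY || action == HP_REMOVE_MEMORY;
}

/*
 * Returns 0 on success (including "nothing to do"), or -1 if the new
 * layout does not fit in the preallocated buffer.
 */
int handle_hotplug_event(struct elfcorehdr_model *hdr, enum hp_action action,
			 const struct mem_range *ranges, size_t nr_ranges)
{
	if (!needs_elfcorehdr_update(action))
		return 0; /* CPU event: header already covers all possible CPUs */
	if (nr_ranges > MAX_LOADS)
		return -1;
	for (size_t i = 0; i < nr_ranges; i++)
		hdr->loads[i] = ranges[i];
	hdr->nr_loads = nr_ranges;
	hdr->rebuilds++;
	return 0;
}
```

In this model a caller passes the current memory layout with every event; a CPU add/remove returns early without touching `loads`, which mirrors the reasoning given for not handling CPU events on ARM64.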
Patch 1: "[v15,01/23] riscv: kexec_file: Fix crashk_low_res not exclude bug" |
Patch 17: "[v15,17/23] x86: kexec_file: Use crash_prepare_headers() helper to simplify code" |
Patch 18: "[v15,18/23] riscv: kexec_file: Use crash_prepare_headers() helper to simplify code" |
Patch 19: "[v15,19/23] LoongArch: kexec_file: Use crash_prepare_headers() helper to simplify code" |
Patch 20: "[v15,20/23] powerpc/kexec_file: Use crash_exclude_core_ranges() helper" |
PR for series 1103915 applied to workflow__riscv__fixes
Name: arm64/riscv: Add support for crashkernel CMA reservation
URL: https://patchwork.kernel.org/project/linux-riscv/list/?series=1103915
Version: 15