[PW_SID:1059227] mm: Eliminate fake head pages from vmemmap optimization #1521
linux-riscv-bot wants to merge 18 commits into
Move MAX_FOLIO_ORDER definition from mm.h to mmzone.h. This is preparation for adding the vmemmap_tails array to struct zone, which requires MAX_FOLIO_ORDER to be available in mmzone.h. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Muchun Song <muchun.song@linux.dev> Acked-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Instead of passing down the head page and tail page index, pass the tail and head pages directly, as well as the order of the compound page. This is preparation for changing how the head position is encoded in the tail page. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (arm) <david@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
…d_info' The 'compound_head' field in the 'struct page' encodes whether the page is a tail and where to locate the head page. Bit 0 is set if the page is a tail, and the remaining bits in the field point to the head page. As preparation for changing how the field encodes information about the head page, rename the field to 'compound_info'. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (arm) <david@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
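The pointer-based encoding described above can be illustrated with a small userspace sketch. It is not the kernel's code: `struct demo_page` and both `demo_*` helpers are hypothetical stand-ins that model only the one field under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: only the field discussed here. */
struct demo_page {
	uintptr_t compound_info;
};

/* Mark @tail as a tail page: head pointer in the upper bits, bit 0 set. */
static void demo_set_compound_head(struct demo_page *tail,
				   struct demo_page *head)
{
	/* The struct is at least 2-byte aligned, so bit 0 of a pointer is free. */
	assert(((uintptr_t)head & 1) == 0);
	tail->compound_info = (uintptr_t)head | 1;
}

/* Return the head page for a tail page, or the page itself otherwise. */
static struct demo_page *demo_compound_head(struct demo_page *page)
{
	uintptr_t info = page->compound_info;

	if (info & 1)
		return (struct demo_page *)(info - 1);
	return page;
}
```

In this model every tail page stores a different value for each compound page, because the value is the address of that particular head.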
Move set_compound_head() and clear_compound_head() to be adjacent to the compound_head() function in page-flags.h. These functions encode and decode the same compound_info field, so keeping them together makes it easier to verify their logic is consistent, especially when the encoding changes. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (arm) <david@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The upcoming change to the HugeTLB vmemmap optimization (HVO) requires struct pages of the head page to be naturally aligned with regard to the folio size. Align vmemmap to the newly introduced MAX_FOLIO_VMEMMAP_ALIGN. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The upcoming change to the HugeTLB vmemmap optimization (HVO) requires struct pages of the head page to be naturally aligned with regard to the folio size. Align vmemmap to MAX_FOLIO_VMEMMAP_ALIGN. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
For tail pages, the kernel uses the 'compound_info' field to get to the head page. Bit 0 of the field indicates whether the page is a tail page, and if it is set, the remaining bits represent a pointer to the head page.
When the size of struct page is a power of 2, change the encoding of compound_info to store a mask that can be applied to the virtual address of the tail page in order to access the head page. This is possible because the struct page of the head page is naturally aligned with regard to the order of the page.
The significant impact of this modification is that all tail pages of the same order will now have identical 'compound_info', regardless of the compound page they are associated with. This paves the way for eliminating fake heads.
The HugeTLB Vmemmap Optimization (HVO) creates fake heads, and it is only applied when sizeof(struct page) is a power of 2. Having identical tail pages allows the same page to be mapped into the vmemmap of all pages, maintaining memory savings without fake heads.
If sizeof(struct page) is not a power of 2, there are no functional changes.
Limit mask usage to HVO, where it makes a difference. The mask approach would work in a wider set of conditions, but it requires validating that struct pages are naturally aligned for all orders up to MAX_FOLIO_ORDER, which can be tricky.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
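A userspace sketch may help show why the mask form makes all tails of one order identical. The exact bit layout used by the patch is not given in this message, so the layout below (mask in the upper bits, bit 0 as the tail flag) is an assumption, as are `struct demo_page`, `DEMO_STRUCT_SIZE`, and the `demo_*` helpers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_STRUCT_SIZE 64 /* assumed power-of-2 sizeof(struct page) */

/* Hypothetical stand-in for struct page, padded to a power-of-2 size. */
struct demo_page {
	uintptr_t compound_info;
	unsigned char pad[DEMO_STRUCT_SIZE - sizeof(uintptr_t)];
};

_Static_assert(sizeof(struct demo_page) == DEMO_STRUCT_SIZE,
	       "demo struct must have a power-of-2 size");

/*
 * Value stored in every tail page of a given order. It depends only on
 * the order, not on which compound page the tail belongs to.
 */
static uintptr_t demo_tail_info(unsigned int order)
{
	/* Bytes covered by the struct pages of one folio of this order. */
	uintptr_t span = (uintptr_t)DEMO_STRUCT_SIZE << order;

	return ~(span - 1) | 1;
}

/* Round a tail's own address down to the naturally aligned head. */
static struct demo_page *demo_compound_head(struct demo_page *page)
{
	uintptr_t info = page->compound_info;

	if (!(info & 1))
		return page;
	return (struct demo_page *)((uintptr_t)page & (info & ~(uintptr_t)1));
}
```

The sketch only works if the array of structs is itself aligned to the largest span in use, which is the natural-alignment requirement the message mentions.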
If page->compound_info encodes a mask, vmemmap is expected to be naturally aligned to the maximum folio size. Add a VM_WARN_ON_ONCE() to check the alignment. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
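The alignment being checked can be worked out with a little arithmetic. The sketch below is not the kernel check itself: the helper names are hypothetical, and the example values (a 64-byte struct page and an order-18 folio, i.e. 1 GiB of 4 KiB pages) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Alignment the vmemmap base needs so that the head struct page of every
 * folio up to @max_order is naturally aligned.
 */
static uint64_t demo_vmemmap_align(uint64_t struct_page_size,
				   unsigned int max_order)
{
	return struct_page_size << max_order;
}

/* Power-of-2 alignment test, in the spirit of the kernel's IS_ALIGNED(). */
static bool demo_vmemmap_is_aligned(uint64_t vmemmap_base, uint64_t align)
{
	/* The mask test is only valid for a power-of-2 alignment. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (vmemmap_base & (align - 1)) == 0;
}
```

With the assumed values, `demo_vmemmap_align(64, 18)` is 16 MiB. If the base were off by less than that, a folio aligned by PFN would have a head struct page whose address is not a multiple of the folio's struct page span, and masking a tail's address would not land on the head.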
Currently, the vmemmap for bootmem-allocated gigantic pages is populated early in hugetlb_vmemmap_init_early(). However, the zone information is only available after zones are initialized. If it is later discovered that a page spans multiple zones, the HVO mapping must be undone and replaced with a normal mapping using vmemmap_undo_hvo(). Defer the actual vmemmap population to hugetlb_vmemmap_init_late(). At this stage, zones are already initialized, so it is possible to check whether the page is valid for HVO before deciding how to populate the vmemmap. This allows us to remove vmemmap_undo_hvo() and the complex logic required to roll back HVO mappings. In hugetlb_vmemmap_init_late(), if HVO population fails or if the zones are invalid, fall back to a normal vmemmap population. Postponing population until hugetlb_vmemmap_init_late() also makes zone information available from within vmemmap_populate_hvo(). Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
To prepare for removing fake head pages, the vmemmap_walk code is being reworked. The reuse_page and reuse_addr variables are being eliminated. There will no longer be an expectation regarding the reuse address in relation to the operated range. Instead, the caller will provide head and tail vmemmap pages. Currently, vmemmap_head and vmemmap_tail are set to the same page, but this will change in the future. The only functional change is that __hugetlb_vmemmap_optimize_folio() will abandon optimization if memory allocation fails. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Muchun Song <muchun.song@linux.dev> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The 32-bit VDSO build on x86_64 uses fake_32bit_build.h to undefine various kernel configuration options that are not suitable for the VDSO context or may cause build issues when including kernel headers. Undefine CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP in fake_32bit_build.h to prepare for the upcoming change to HugeTLB Vmemmap Optimization. Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most vmemmap pages for huge pages and remapping the freed range to a single page containing the struct page metadata.
With the new mask-based compound_info encoding (for power-of-2 struct page sizes), all tail pages of the same order are now identical regardless of which compound page they belong to. This means the tail pages can be truly shared without fake heads.
Allocate a single page of initialized tail struct pages per zone per order in the vmemmap_tails[] array in struct zone. All huge pages of that order in the zone share this tail page, mapped read-only into their vmemmap. The head page remains unique per huge page.
Redefine MAX_FOLIO_ORDER using ilog2(). The define has to produce a compile-time constant, as it is used to specify the vmemmap_tails[] array size. For some reason, the compiler is not able to evaluate get_order() at compile time, but ilog2() works.
Avoid using PUD_ORDER to define MAX_FOLIO_ORDER, as it adds a dependency on <linux/pgtable.h>, which creates a hard-to-break include loop.
This eliminates fake heads while maintaining the same memory savings, and simplifies compound_head() by removing fake head detection.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
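The memory savings mentioned above can be made concrete with a short calculation. This is a sketch under stated assumptions, not code from the series: it assumes 4 KiB base pages and a 64-byte struct page, and the `DEMO_*` macros and `demo_*` helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BASE_PAGE_SIZE   4096ULL /* assumed base page size */
#define DEMO_STRUCT_PAGE_SIZE 64ULL   /* assumed sizeof(struct page) */

/* vmemmap pages needed to describe one huge page without HVO. */
static uint64_t demo_vmemmap_pages(uint64_t huge_page_size)
{
	uint64_t nr_struct_pages = huge_page_size / DEMO_BASE_PAGE_SIZE;

	return nr_struct_pages * DEMO_STRUCT_PAGE_SIZE / DEMO_BASE_PAGE_SIZE;
}

/*
 * vmemmap pages freed per huge page with HVO: one private page holding
 * the head is kept, and the rest of the range maps to a shared page.
 */
static uint64_t demo_vmemmap_pages_saved(uint64_t huge_page_size)
{
	return demo_vmemmap_pages(huge_page_size) - 1;
}
```

Under these assumptions a 2 MiB huge page needs 8 vmemmap pages and frees 7 of them, and a 1 GiB huge page needs 4096 and frees 4095. The shared tail page costs one extra page per zone per order, which is amortized over every huge page of that order in the zone.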
With fake head pages eliminated in the previous commit, remove the supporting infrastructure:
- page_fixed_fake_head(): no longer needed to detect fake heads;
- page_is_fake_head(): no longer needed;
- page_count_writable(): no longer needed for RCU protection;
- RCU read_lock in page_ref_add_unless(): no longer needed.
This substantially simplifies compound_head() and page_ref_add_unless(), removing both branches and RCU overhead from these hot paths.
RCU was required to serialize allocation of a hugetlb page against get_page_unless_zero() and to prevent writes to a read-only fake head. It is redundant without fake heads. See bd22553 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers") for more details.
synchronize_rcu() in mm/hugetlb_vmemmap.c will be removed by a separate patch.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The VMEMMAP_SYNCHRONIZE_RCU flag triggered synchronize_rcu() calls to prevent a race between HVO remapping and page_ref_add_unless(). The race could occur when a speculative PFN walker tried to modify the refcount on a struct page that was in the process of being remapped to a fake head. With fake heads eliminated, page_ref_add_unless() no longer needs RCU protection. Remove the flag and synchronize_rcu() calls. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The hugetlb_optimize_vmemmap_key static key was used to guard fake head detection in compound_head() and related functions. It allowed skipping the fake head checks entirely when HVO was not in use. With fake heads eliminated and the detection code removed, the static key serves no purpose. Remove its definition and all increment/decrement calls. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Muchun Song <muchun.song@linux.dev> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The compound_head() function is a hot path. For example, the zap path calls it for every leaf page table entry. Rewrite the helper function in a branchless manner to eliminate the risk of CPU branch misprediction. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
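One way such a branchless lookup could be written is sketched below. It is not taken from the patch: it assumes the mask-based encoding with bit 0 as the tail flag, and `struct demo_page`, `DEMO_STRUCT_SIZE`, and `demo_compound_head()` are hypothetical userspace stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_STRUCT_SIZE 64 /* assumed power-of-2 sizeof(struct page) */

/* Hypothetical stand-in for struct page, padded to a power-of-2 size. */
struct demo_page {
	uintptr_t compound_info;
	unsigned char pad[DEMO_STRUCT_SIZE - sizeof(uintptr_t)];
};

/* Head lookup with no conditional in the source. */
static struct demo_page *demo_compound_head(struct demo_page *page)
{
	uintptr_t info = page->compound_info;
	/*
	 * Tail (bit 0 set):   (info & 1) - 1 == 0, so mask == info.
	 * Otherwise:          (info & 1) - 1 wraps to all ones, so the
	 *                     mask is all ones and the AND is a no-op.
	 */
	uintptr_t mask = info | ((info & 1) - 1);

	/* The page address has bit 0 clear, so the tail flag is harmless. */
	return (struct demo_page *)((uintptr_t)page & mask);
}
```

The arithmetic relies on unsigned wraparound, which is well defined in C. Whether the generated machine code is actually free of branches depends on the compiler and target.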
Update the documentation regarding vmemmap optimization for hugetlb to reflect the changes in how the kernel maps the tail pages. Fake heads no longer exist. Remove their description. Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
page_slab() contained an open-coded implementation of compound_head(). Replace the duplicated code with a direct call to compound_head(). Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Patch 1: "[PATCHv7,01/18] mm: Move MAX_FOLIO_ORDER definition to mmzone.h"
Patch 16: "[PATCHv7,16/18] mm: Remove the branch from compound_head()"
Patch 17: "[PATCHv7,17/18] hugetlb: Update vmemmap_dedup.rst"
Patch 18: "[PATCHv7,18/18] mm/slab: Use compound_head() in page_slab()"
PR for series 1059227 applied to workflow__riscv__fixes
Name: mm: Eliminate fake head pages from vmemmap optimization
URL: https://patchwork.kernel.org/project/linux-riscv/list/?series=1059227
Version: 1