Skip to content

Enable CONFIG_SECURITY_SELINUX_AVC_STATS option#44

Open
nischay12089 wants to merge 1660 commits into
re-noroi:stablefrom
nischay12089:patch-1
Open

Enable CONFIG_SECURITY_SELINUX_AVC_STATS option#44
nischay12089 wants to merge 1660 commits into
re-noroi:stablefrom
nischay12089:patch-1

Conversation

@nischay12089

Copy link
Copy Markdown

No description provided.

Peter Zijlstra and others added 30 commits September 8, 2025 21:22
Generic mmu_gather provides everything that ARM needs:

 - range tracking
 - RCU table free
 - VM_EXEC tracking
 - VIPT cache flushing

The one notable curiosity is the 'funny' range tracking for classical
ARM in __pte_free_tlb().

No change in behavior intended.

Change-Id: Ic9790b51c91679351c24eab945c3cc31b9764b67
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Generic mmu_gather provides everything ia64 needs (range tracking).

No change in behavior intended.

Change-Id: I809c3a71624b02c949d8dda7fa456e9680b684d5
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Generic mmu_gather provides everything SH needs (range tracking and
cache coherency).

No change in behavior intended.

Change-Id: I6703255f9470a015616dacd6d13fc187cdf21d83
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Generic mmu_gather provides the simple flush_tlb_range() based range
tracking mmu_gather UM needs.

No change in behavior intended.

Change-Id: Ibace9d110d9c3aa9c5d8148547e6b8871e8044fc
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For the architectures that do not implement their own tlb_flush() but
do already use the generic mmu_gather, there are two options:

 1) the platform has an efficient flush_tlb_range() and
    asm-generic/tlb.h doesn't need any overrides at all.

 2) the platform lacks an efficient flush_tlb_range() and
    we select MMU_GATHER_NO_RANGE to minimize full invalidates.

Convert all 'simple' architectures to one of these two forms.

alpha:	    has no range invalidate -> 2
arc:	    already used flush_tlb_range() -> 1
c6x:	    has no range invalidate -> 2
hexagon:    has an efficient flush_tlb_range() -> 1
            (flush_tlb_mm() is in fact a full range invalidate,
	     so no need to shoot down everything)
m68k:	    has inefficient flush_tlb_range() -> 2
microblaze: has no flush_tlb_range() -> 2
mips:	    has efficient flush_tlb_range() -> 1
	    (even though it currently seems to use flush_tlb_mm())
nds32:	    already uses flush_tlb_range() -> 1
nios2:	    has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
openrisc:   has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
parisc:	    already uses flush_tlb_range() -> 1
sparc32:    already uses flush_tlb_range() -> 1
unicore32:  has inefficient flush_tlb_range() -> 2
	    (no limit on range iteration)
xtensa:	    has efficient flush_tlb_range() -> 1

Note this also fixes a bug in the existing code for a number
platforms. Those platforms that did:

  tlb_end_vma() -> if (!full_mm) flush_tlb_*()
  tlb_flush -> if (full_mm) flush_tlb_mm()

missed the case of shift_arg_pages(), which doesn't have @fullmm set,
nor calls into tlb_*vma(), but still frees page-tables and thus needs
an invalidate. The new code handles this by detecting a non-empty
range, and either issuing the matching range invalidate or a full
invalidate, depending on the capabilities.

No change in behavior intended.

Change-Id: Id4431b710562c1e46fb3a8cdde5d19e26c0ba2f9
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the Kconfig option HAVE_MMU_GATHER_NO_GATHER to the generic
mmu_gather code. If the option is set the mmu_gather will not
track individual pages for delayed page free anymore. A platform
that enables the option needs to provide its own implementation
of the __tlb_remove_page_size() function to free pages.

No change in behavior intended.

Change-Id: Ia3b0ec2e30fbceb6e7d133ba5fa46c3c05d5a3cc
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: linux@armlinux.org.uk
Cc: npiggin@gmail.com
Link: http://lkml.kernel.org/r/20180918125151.31744-2-schwidefsky@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No change in behavior intended.

Change-Id: I06d9934565a3ef9b4a24b7af7530054a7101feaa
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: linux@armlinux.org.uk
Cc: npiggin@gmail.com
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/20180918125151.31744-3-schwidefsky@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all architectures are converted to the generic code, remove
the arch hooks.

No change in behavior intended.

Change-Id: I6d927763b6d9d6522892483484a96ba5bd20dc43
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since all architectures are now using it, it is redundant.

Change-Id: Ic598909ce12e8187037c4ae93f0d6aa183125856
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As the comment notes; it is a potentially dangerous operation. Just
use tlb_flush_mmu(), that will skip the (double) TLB invalidate if
it really isn't needed anyway.

No change in behavior intended.

Change-Id: Id47600e5c233599de020ff095e8433d25d92e1b2
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are no external users of this API (nor should there be); remove it.

Change-Id: Idbf00e6259cf40a2d2f40de8819681d6a65b1c38
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Only ia64-sn2 uses this as an optimization, and there it is of
questionable correctness due to the mm_users==1 test.

Remove it entirely.

No change in behavior intended.

Change-Id: Ie168881dee6198ae51c03bd98c9cebb7414d415a
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Switch from per mm_struct to per pmd page table lock by enabling
ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
large system.

I'm not sure if there is contention on mm->page_table_lock. Given
the option comes at no cost (apart from initializing more spin
locks), why not enable it now.

We only do so when pmd is not folded, so we don't mistakenly call
pgtable_pmd_page_ctor() on pud or p4d in pgd_pgtable_alloc().

Change-Id: I9645aa6ffbf390b2c789e0fd86f07708a30c891b
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
x86 has an nmi_uaccess_okay(), but other architectures do not.
Arch-independent code might need to know whether access to user
addresses is ok in an NMI context or in other code whose execution
context is unknown.  Specifically, this function is needed for
bpf_probe_write_user().

Add a default implementation of nmi_uaccess_okay() for architectures
that do not have such a function.

Change-Id: Ic5a8fdc65830c29f70d7a64c8d8c9328b8f95a3e
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-23-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
…ned to stride

Since commit 3d65b6bbc01e ("arm64: tlbi: Set MAX_TLBI_OPS to
PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
perform more than PTRS_PER_PTE invalidation instructions in a single
call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
code does not ensure that the end address of the range is rounded-up
to the stride when freeing intermediate page tables in pXX_free_tlb(),
which defeats our range checking.

Align the bounds passed into __flush_tlb_range().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Change-Id: Iffe236ca072fc0273a60f1fce05079b952d67eac
Signed-off-by: Will Deacon <will.deacon@arm.com>
… failure and flush

commit 0ed1325967ab5f7a4549a2641c6ebe115f76e228 upstream.

Architectures for which we have hardware walkers of Linux page table
should flush TLB on mmu gather batch allocation failures and batch flush.
Some architectures like POWER supports multiple translation modes (hash
and radix) and in the case of POWER only radix translation mode needs the
above TLBI.  This is because for hash translation mode kernel wants to
avoid this extra flush since there are no hardware walkers of linux page
table.  With radix translation, the hardware also walks linux page table
and with that, kernel needs to make sure to TLB invalidate page walk cache
before page table pages are freed.

More details in commit d86564a ("mm/tlb, x86/mm: Support invalidating
TLB caches for RCU_TABLE_FREE")

The changes to sparc are to make sure we keep the old behavior since we
are now removing HAVE_RCU_TABLE_NO_INVALIDATE.  The default value for
tlb_needs_table_invalidate is to always force an invalidate and sparc can
avoid the table invalidate.  Hence we define tlb_needs_table_invalidate to
false for sparc architecture.

Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com
Fixes: a46cc7a ("powerpc/mm/radix: Improve TLB/PWC flushes")
Change-Id: I619a0c32a148c2fe1143e25e9b24c85a06be5b8c
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2631ed00b0498810f8d5c2163c6b5270d893687b upstream.

tlb_flush_{pte|pmd|pud|p4d}_range() adjust the tlb->start and
tlb->end, then set corresponding cleared_*.

Change-Id: I060c43f72b8896c0c5c7216d4ffeb2bb0ed8e989
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200625080314.230-5-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4a118f2eead1d6c49e00765de89878288d4b890 upstream.

When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB
flush is missing.  This TLB flush must be performed before releasing the
i_mmap_rwsem, in order to prevent an unshared PMDs page from being
released and reused before the TLB flush took place.

Arguably, a comprehensive solution would use mmu_gather interface to
batch the TLB flushes and the PMDs page release, however it is not an
easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call
huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2)
deferring the release of the page reference for the PMDs page until
after i_mmap_rwsem is dropeed can confuse huge_pmd_unshare() into
thinking PMDs are shared when they are not.

Fix __unmap_hugepage_range() by adding the missing TLB flush, and
forcing a flush when unshare is successful.

Fixes: 24669e5 ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages)" # 3.6
Change-Id: Ic54b06c7c7bfdd5e8d3af8b07bf652ffcc4594b8
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 697a1d44af8ba0477ee729e632f4ade37999249a ]

tlb_remove_huge_tlb_entry only considers PMD_SIZE and PUD_SIZE when
updating the mmu_gather structure.

Unfortunately on arm64 there are two additional huge page sizes that
need to be covered: CONT_PTE_SIZE and CONT_PMD_SIZE. Where an end-user
attempts to employ contiguous huge pages, a VM_BUG_ON can be experienced
due to the fact that the tlb structure hasn't been correctly updated by
the relevant tlb_flush_p.._range() call from tlb_remove_huge_tlb_entry.

This patch adds inequality logic to the generic implementation of
tlb_remove_huge_tlb_entry s.t. CONT_PTE_SIZE and CONT_PMD_SIZE are
effectively covered on arm64. Also, as well as ptes, pmds and puds;
p4ds are now considered too.

Reported-by: David Hildenbrand <david@redhat.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-mm/811c5c8e-b3a2-85d2-049c-717f17c3a03a@redhat.com/
Change-Id: I2afac11e22a9af9c05beef3d3fe2be3419e001bd
Signed-off-by: Steve Capper <steve.capper@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220330112543.863-1-steve.capper@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
lockdep_assert_held() is better suited to checking locking requirements,
since it only checks if the current thread holds the lock regardless of
whether someone else does. This is also a step towards possibly removing
spin_is_locked().

Change-Id: I3580f68064c78aa6c63d3114a92743e72ab2e4a1
Signed-off-by: Lance Roy <ldr709@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
…ges_fast()

The local variable `addr' in __get_user_pages_fast() is just a shadow of
`start'.  Since `start' never changes after assignment to `addr', it is
fine to replace `start' with it.

Also the meaning of [start, end] is more obvious than [addr, end] when
passed to gup_pgd_range().

Link: http://lkml.kernel.org/r/20180925021448.20265-1-richard.weiyang@gmail.com
Change-Id: I27ca118e74c4ea6daa9e5ba0a86cf86a87327bc9
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Getting pages from ZONE_DEVICE memory needs to check the backing device's
live-ness, which is tracked in the device's dev_pagemap metadata.  This
metadata is stored in a radix tree and looking it up adds measurable
software overhead.

This patch avoids repeating this relatively costly operation when
dev_pagemap is used by caching the last dev_pagemap while getting user
pages.  The gup_benchmark kernel self test reports this reduces time to
get user pages to as low as 1/3 of the previous time.

Link: http://lkml.kernel.org/r/20181012173040.15669-1-keith.busch@intel.com
Change-Id: Id123705998d401fdd266acafe2f7690e1ef18a85
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reference counters should use refcount_t rather than atomic_t, since the
refcount_t implementation can prevent overflows, reducing the
exploitability of reference leak bugs.  userfaultfd_ctx::refcount is a
reference counter with the usual semantics, so convert it to refcount_t.

Note: I replaced the BUG() on incrementing a 0 refcount with just
refcount_inc(), since part of the semantics of refcount_t is that that
incrementing a 0 refcount is not allowed; with CONFIG_REFCOUNT_FULL,
refcount_inc() already checks for it and warns.

Link: http://lkml.kernel.org/r/20181115003916.63381-1-ebiggers@kernel.org
Change-Id: I3c73bc521c7c22be1ec78f363d4990171cfbbbec
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Android needs to mremap large regions of memory during memory management
related operations.  The mremap system call can be really slow if THP is
not enabled.  The bottleneck is move_page_tables, which is copying each
pte at a time, and can be really slow across a large map.  Turning on
THP may not be a viable option, and is not for us.  This patch speeds up
the performance for non-THP system by copying at the PMD level when
possible.

The speedup is an order of magnitude on x86 (~20x).  On a 1GB mremap,
the mremap completion times drops from 3.4-3.6 milliseconds to 144-160
microseconds.

Before:
Total mremap time for 1GB data: 3521942 nanoseconds.
Total mremap time for 1GB data: 3449229 nanoseconds.
Total mremap time for 1GB data: 3488230 nanoseconds.

After:
Total mremap time for 1GB data: 150279 nanoseconds.
Total mremap time for 1GB data: 144665 nanoseconds.
Total mremap time for 1GB data: 158708 nanoseconds.

If THP is enabled the optimization is mostly skipped except in certain
situations.

[joel@joelfernandes.org: fix 'move_normal_pmd' unused function warning]
  Link: http://lkml.kernel.org/r/20181108224457.GB209347@google.com
Link: http://lkml.kernel.org/r/20181108181201.88826-3-joelaf@google.com
Change-Id: I1290a43e844e0311f697dfea41b97625c318e7ab
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Samuel Pascua <sgpascua@ngcp.ph>
Moving page-tables at the PMD-level on x86 is known to be safe.  Enable
this option so that we can do fast mremap when possible.

Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com
Change-Id: I5c58a56420d990c2cb6f8c041cbcf9258ea230ba
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When using mremap() syscall in addition to MREMAP_FIXED flag, mremap()
calls mremap_to() which does the following:

1) unmaps the destination region where we are going to move the map
2) If the new region is going to be smaller, we unmap the last part
   of the old region

Then, we will eventually call move_vma() to do the actual move.

move_vma() checks whether we are at least 4 maps below max_map_count
before going further, otherwise it bails out with -ENOMEM.  The problem
is that we might have already unmapped the vma's in steps 1) and 2), so
it is not possible for userspace to figure out the state of the vmas
after it gets -ENOMEM, and it gets tricky for userspace to clean up
properly on error path.

While it is true that we can return -ENOMEM for more reasons (e.g: see
may_expand_vm() or move_page_tables()), I think that we can avoid this
scenario if we check early in mremap_to() if the operation has high
chances to succeed map-wise.

Should that not be the case, we can bail out before we even try to unmap
anything, so we make sure the vma's are left untouched in case we are
likely to be short of maps.

The thumb-rule now is to rely on the worst-scenario case we can have.
That is when both vma's (old region and new region) are going to be
split in 3, so we get two more maps to the ones we already hold (one per
each).  If current map count + 2 maps still leads us to 4 maps below the
threshold, we are going to pass the check in move_vma().

Of course, this is not free, as it might generate false positives when
it is true that we are tight map-wise, but the unmap operation can
release several vma's leading us to a good state.

Another approach was also investigated [1], but it may be too much
hassle for what it brings.

[1] https://lore.kernel.org/lkml/20190219155320.tkfkwvqk53tfdojt@d104.suse.de/

Link: http://lkml.kernel.org/r/20190226091314.18446-1-osalvador@suse.de
Change-Id: I5fb255b6c5aad20e4ed281396271584d30fdead5
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swapin logic can be reused independently without rest of the logic in
shmem_getpage_gfp.  So lets refactor it out as an independent function.

Link: http://lkml.kernel.org/r/20190114153129.4852-1-vpillai@digitalocean.com
Change-Id: Ia566f80c5d96590a52723e5836e94528b5a2e014
Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kelley Nielsen <kelleynnn@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Userfaultfd can be misued to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available.  By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit.  While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.

We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.

Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users.  When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.

Andrea said:

: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd.  In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.

[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Change-Id: Iee5ab7cf04ec1b92b01d625f7e2d8a6308ee2fa0
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ Upstream commit 8897c1b1a1795cab23d5ac13e4e23bf0b5f4e0c6 ]

syzbot found the following crash:

  BUG: KASAN: use-after-free in perf_trace_lock_acquire+0x401/0x530 include/trace/events/lock.h:13
  Read of size 8 at addr ffff8880a5cf2c50 by task syz-executor.0/26173

  CPU: 0 PID: 26173 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #146
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
     perf_trace_lock_acquire+0x401/0x530 include/trace/events/lock.h:13
     trace_lock_acquire include/trace/events/lock.h:13 [inline]
     lock_acquire+0x2de/0x410 kernel/locking/lockdep.c:4411
     __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
     _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
     spin_lock include/linux/spinlock.h:338 [inline]
     shmem_fault+0x5ec/0x7b0 mm/shmem.c:2034
     __do_fault+0x111/0x540 mm/memory.c:3083
     do_shared_fault mm/memory.c:3535 [inline]
     do_fault mm/memory.c:3613 [inline]
     handle_pte_fault mm/memory.c:3840 [inline]
     __handle_mm_fault+0x2adf/0x3f20 mm/memory.c:3964
     handle_mm_fault+0x1b5/0x6b0 mm/memory.c:4001
     do_user_addr_fault arch/x86/mm/fault.c:1441 [inline]
     __do_page_fault+0x536/0xdd0 arch/x86/mm/fault.c:1506
     do_page_fault+0x38/0x590 arch/x86/mm/fault.c:1530
     page_fault+0x39/0x40 arch/x86/entry/entry_64.S:1202

It happens if the VMA got unmapped under us while we dropped mmap_sem
and inode got freed.

Pinning the file if we drop mmap_sem fixes the issue.

Link: http://lkml.kernel.org/r/20190927083908.rhifa4mmaxefc24r@box
Change-Id: Id8c184b726f0992ac57fa289fa152018d881bb29
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: syzbot+03ee87124ee05af991bd@syzkaller.appspotmail.com
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Patch series "Control over userfaultfd kernel-fault handling", v6.

This patch series is split from [1].  The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.

It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel.  For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3].  Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object.  It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

This patch (of 2):

userfaultfd handles page faults from both user and kernel code.  Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.

A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.

Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <calin@google.com>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 37cd0575b8510159992d279c530c05f872990b02)

Conflicts:
    include/uapi/linux/userfaultfd.h
(1. Removed uffdio_writeprotect struct part of 63b2d4174c4ad)

Bug: 160737021
Bug: 169683130
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Change-Id: Ic4cecce08956402f91f866ae08f1d4ec21d64289
Haijian Ma and others added 26 commits September 26, 2025 18:05
Rerun APSD to ensure proper charger detection if device
boots with charger connected.

Change-Id: I21c9be0c098b7e7ba388f71e9ba4b90180bfd112
Signed-off-by: Haijian Ma <mahj8@motorola.com>
Reviewed-on: https://gerrit.mot.com/1366956
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Huosheng Liao <liaohs@motorola.com>
Submit-Approved: Jira Key
Since we're either using bq2597x or ln8000, get_prop will always throw a
POWER_SUPPLY_HEALTH_UNKNOWN and then power_supply_show_property log it
as failed to report.
Instead of reporting unknown health and spamming log buffer, tell
userspace that there's no data for it.

Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
None of those functions run in interrupt context.

Signed-off-by: Kazuki Hashimoto <kazukih@tuta.io>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Added ufs_ice node configuration for the kona target

Change-Id: I77829e5223b70274f9dbac086b6d5ceaf13259ac
If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.

If there's another runnable task as this point, the deadline will be
increased by the slice at each loop. This can cause the deadline to runaway
pretty quickly, and subsequent elevated run delays later on as the task
doesn't get picked again. The reason the scheduler can pick the same task
again and again despite its deadline increasing is because it may be the
only eligible task at that point.

Fix this by making the task forfeiting its remaining vruntime and pushing
the deadline one slice ahead. This implements yield behavior more
authentically.

We limit the forfeiting to eligible tasks. This is because core scheduling
prefers running ineligible tasks rather than force idling. As such, without
the condition, we can end up on a yield loop which makes the vruntime
increase rapidly, leading to anomalous run delays later down the line.

Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling  policy")
Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
Signed-off-by: Fernand Sieber <sieberf@amazon.com>

Changes in v2:
- Implement vruntime forfeiting approach suggested by Peter Zijlstra
- Updated commit name
- Previous Reviewed-by tags removed due to algorithm change

Changes in v3:
- Only increase vruntime for eligible tasks to avoid runaway vruntime with
  core scheduling

Link: https://lore.kernel.org/r/20250916140228.452231-1-sieberf@amazon.com
Change-Id: I7fdb4e963894b399a1269b543615f7b2649f69ae
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
* Reapply new patch from upstream kernel.
This reverts commit 908a467.

Change-Id: I2030f66360dcf5b67d067339945ca2ea5ffbd3f6
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
This can avoid confusing tracepoint values.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit fe59109ae5c0b34a8c7c07f693fc501b12b57787)
Change-Id: I64d127fed283cfe93579a031b1b8762d35069f98
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
Let's explicitly use the defined values in block_age case only.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit ed2724765e58e3310d3de48f4a1761631b3dd640)
Change-Id: Ibbed9907e98c54a7fc807f5deb9300aeb2d35e70
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
Otherwise, __lookup_extent_tree() will override the given extent_info which will
be used by caller.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 22a341b43036415718f2d50f5f98b2f891fe17e9)
Change-Id: I5ddc08dd3d299f222b4b9b1334f2e675e37956b6
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
Currently we wrongly calculate the new block age to
old * LAST_AGE_WEIGHT / 100.

Fix it to new * (100 - LAST_AGE_WEIGHT) / 100
                + old * LAST_AGE_WEIGHT / 100.

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit b03a41a495df35f8e8d25220878bd6b8472d9396)
Change-Id: I989877caf2517245750e7ed809b2af71f8943f96
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit d23be468eada21c828058e0e8d60409eaec373ab)
Change-Id: I4c0a5137cbb6c70dc8ebfe6e1dc7377ae20898fa
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
nr_free may be less than len, we should update age extent cache
w/ range [fofs, len] rather than [fofs, nr_free].

Fixes: dfcfcc6 ("f2fs: add block_age-based extent cache")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 8c0ed062ce27f6b7f0a568cb241e2b4dd2d9e6a6)
Change-Id: I0a4546f84fd75db2f13857e4922520ab5885e845
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
We should update age extent in f2fs_do_zero_range() like we
did in f2fs_truncate_data_blocks_range().

Fixes: dfcfcc6 ("f2fs: add block_age-based extent cache")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit a84153f939808102dfa10904aa0f743e734a3e1d)
Change-Id: Iab59f94f792dbab8da7ab597d5a219476370a646
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
The checkpoint is the top priority thread which can stop all the filesystem
operations. Let's make it RT priority.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 8a2d9f00d502e6ef68c6d52f0863856040ddd2db)
Change-Id: I7ab430a7294fa9644512f0a32245646944446213
Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>
Move thread priority call to kernel worker thread because
component bind API may run from vendor_modeprobe process
context when all drivers probe succeed. Thread priority
update is not allowed from vendor_modeprobe process
context.

Change-Id: Iafac97ce02942d6a2134495232f3c395ba4a362f
Signed-off-by: Dhaval Patel <pdhaval@codeaurora.org>
Following is the sequence during which issue is observed:

1) After suspend, resume commit is with doze suspend mode
(LP2) but kickoff didn't begin.
2) At the same time, runtime pm suspend is triggered which
makes idle request on pm suspend/resume thread.
3) Since the kickoff has not yet started on commit thread,
pending kickoff count is not updated and idle power
collapse sequence is started from pm suspend.
4) As part of idle, irqs are turned off from pm suspend
thread which are turned on commit thread after kickoff
before pp tx irq arrives.

In such cases, during pm suspend, commit thread workqueue
is flushed before encoder idle request to prevent irq's
from getting turned off before the transfer is complete to
avoid inconsistent state.

Change-Id: I417ece0ae7021b0fc5005e262a0d87e43ac729be
Signed-off-by: Yashwanth <yvulapu@codeaurora.org>
Reinit thread priority work before queueing on multiple display
threads as the work stores the former worker thread. Also
flush work such the next init is serialized.

Change-Id: I51409d4d12d100be0cb30238f812a56ec064a339
Signed-off-by: Kalyan Thota <quic_kalyant@quicinc.com>
Currently aggresive idle-pc entry is only enabled in
case of doze-suspend mode. Extend the support to doze
mode as well.

Change-Id: I8e9e0e116bb65a1aec0180bf9bc10bed99d4a137
Signed-off-by: Govinda Rao K S <quic_gkarikur@quicinc.com>
This change updates the time required to enter idle_pc based
on frame rate instead of default time. In the current issue,
customer is facing janks where frame rate is 30fps and race
happens between sde_encoder_off_work and drm_atomic_commit
scheduled from userspace. It also sets max and min bound for
optimized performance.

Change-Id: I5e95e920a2f7b2142b5f63e8ce6b82cf1d482db1
Signed-off-by: Yojana Juadi <quic_yjuadi@quicinc.com>
…ncoder wakeup

During PM suspend in dual display usecase, the power off commit to
turn off primary and secondary crtcs is done with only one
drm_atomic_state scheduled on primary crtc_commit thread. At the
same, touch events can happen on secondary panel, which will
run input_event_work and schedule the sde_enc->delayed_off_work
to turn off its enabled resources. There can be race between primary
crtc_commit thread which unregisters input_event, cancels
all the pending works before setting sde_enc->cur_master to NULL
and input_event_work_handler which schedules the delayed_off_work
without checking the input_event_handler state.
This change adds input handler unregister check before triggering
_sde_encoder_rc_early_wakeup.

Change-Id: I553843f81078810784f18e92347f918ab6e4a9fe
Signed-off-by: Jayaprakash Madisetty <quic_jmadiset@quicinc.com>
…event thread

Move frame data stats collection/notification during frame-done and
retire fence sysfs notification to event thread. This will free up
some interrupt time.

Change-Id: I2648ac4287ce8712e9a059edd408a59753aa6d32
Signed-off-by: Veera Sundaram Sankaran <quic_veeras@quicinc.com>
Signed-off-by: V S Ganga VaraPrasad (VARA) Adabala <quic_vadabala@quicinc.com>
[YumeMichi: Port to k4.19. Check torvalds/linux@dd8088d]
Signed-off-by: Yuan Si <do4suki@gmail.com>
They are insanely annoying and bloats dmesg
@nischay12089

Copy link
Copy Markdown
Author

Avc shii to enable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.