barrier: Add smp_cond_load_{relaxed,acquire}_timeout() #10991

Closed
kernel-patches-daemon-bpf[bot] wants to merge 13 commits into bpf-next_base from series/1052077=>bpf-next

Conversation

@kernel-patches-daemon-bpf

Pull request for series with
subject: barrier: Add smp_cond_load_{relaxed,acquire}_timeout()
version: 9
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1052077

@kernel-patches-daemon-bpf

Upstream branch: db975de
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1052077
version: 9


@kernel-patches-review-bot

commit ae21cf126d64f3741786b53ba3f800d44d0ec50f
Author: Ankur Arora <ankur.a.arora@oracle.com>

sched: add need-resched timed wait interface

This commit adds tif_bitset_relaxed_wait() and tif_need_resched_relaxed_wait()
which provide a timed wait interface for thread_info bits using
smp_cond_load_relaxed_timeout().

> The wait is implemented via smp_cond_load_relaxed_timeout().
>
> smp_cond_load_acquire_timeout() essentially provides the pattern used
> in poll_idle() where we spin in a loop waiting for the flag to change
> until a timeout occurs.

This isn't a bug, but the commit message mentions smp_cond_load_acquire_timeout()
when describing the poll_idle() pattern.

However, the code uses smp_cond_load_relaxed_timeout(), and the subsequent
poll_idle() conversion (patch 12/12) also uses the relaxed variant via
tif_need_resched_relaxed_wait().

Should this say smp_cond_load_relaxed_timeout() instead of
smp_cond_load_acquire_timeout() for consistency with the implementation?



AI reviewed your patch. Please fix the bug or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: sched: add need-resched timed wait interface
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21810635323

AI-authorship-score: low
AI-authorship-explanation: The commit follows standard kernel style with iterative improvements across 9 versions and reviewer feedback, consistent with human authorship.
issues-found: 1
issue-severity-score: low
issue-severity-explanation: Minor commit message inaccuracy referencing wrong function name, no impact on code correctness or runtime behavior.

@kernel-patches-review-bot

commit 0e491f6cafdb72c3d74148070a55836f1deb64f3
Author: Ankur Arora <ankur.a.arora@oracle.com>

bpf/rqspinlock: switch check_timeout() to a clock interface

This commit renames check_timeout() to clock_deadlock() and changes its
return type from int to s64. On success it now returns the current
monotonic time instead of 0, while still returning negative errno on
failure.

> diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
> index 2fdfa828e3d3..f02b1c5f0222 100644
> --- a/kernel/bpf/rqspinlock.c
> +++ b/kernel/bpf/rqspinlock.c

[ ... ]

> @@ -340,6 +349,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
> {
> 	struct mcs_spinlock *prev, *next, *node;
> 	struct rqspinlock_timeout ts;
> +	s64 timeout_err = 0;
> 	int idx, ret = 0;
> 	u32 old, tail;

[ ... ]

In arch_mcs_spin_lock_contended(), when a waiter in the MCS queue receives
a timeout signal (RES_TIMEOUT_VAL) from the previous waiter:

> 	if (old & _Q_TAIL_MASK) {
> 		int val;
>
> 		prev = decode_tail(old, rqnodes);
>
> 		/* Link @node into the waitqueue. */
> 		WRITE_ONCE(prev->next, node);
>
> 		val = arch_mcs_spin_lock_contended(&node->locked);
> 		if (val == RES_TIMEOUT_VAL) {
> 			ret = -ETIMEDOUT;
> 			goto waitq_timeout;
> 		}

This sets ret = -ETIMEDOUT but does not set timeout_err. When execution
reaches the waitq_timeout label:

> waitq_timeout:
> -	if (ret) {
> +	if (timeout_err < 0) {

The condition evaluates to false since timeout_err is still 0. This causes
the timeout handling to be skipped entirely, and execution falls through
to the lock acquisition path. If the lock is acquired, the function
returns ret (which is -ETIMEDOUT), making the caller believe the lock was
not acquired when it actually was. This appears to be a lock leak.

Should the goto waitq_timeout path also set timeout_err = -ETIMEDOUT?
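A minimal sketch of that fix against the quoted hunk (untested, and
assuming timeout_err is in scope at this point, as the earlier hunk
suggests):

        val = arch_mcs_spin_lock_contended(&node->locked);
        if (val == RES_TIMEOUT_VAL) {
                timeout_err = -ETIMEDOUT; /* keep the waitq_timeout path live */
                ret = -ETIMEDOUT;
                goto waitq_timeout;
        }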



AI reviewed your patch. Please fix the bug or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf/rqspinlock: switch check_timeout() to a clock interface
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21810635323

AI-authorship-score: low
AI-authorship-explanation: The commit shows typical human developer patterns including a missed edge case during refactoring, which is a common human error rather than systematic AI-generated code issues.
issues-found: 1
issue-severity-score: high
issue-severity-explanation: Lock leak bug where the function may acquire a lock but return an error code, causing callers to think acquisition failed when it actually succeeded.

@kernel-patches-daemon-bpf

Upstream branch: db975de
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1052077
version: 9

Add smp_cond_load_relaxed_timeout(), which extends
smp_cond_load_relaxed() to allow waiting for a duration.

We loop around waiting for the condition variable to change while
periodically doing a time-check. The loop uses cpu_poll_relax() to
slow down the busy-waiting, which, unless overridden by the
architecture code, amounts to a cpu_relax().

Note that there are two ways for the time-check to fail: the usual
timeout case, or @time_expr_ns returning an invalid value (negative
or zero). The second failure mode allows for clocks attached to the
clock-domain of @cond_expr, which might cease to operate meaningfully
once some state internal to @cond_expr has changed.

Evaluation of @time_expr_ns: in the fastpath we want to keep the
performance close to smp_cond_load_relaxed(). To do that we defer
evaluation of the potentially costly @time_expr_ns to when we hit
the slowpath.

This also means that some hardware-dependent amount of time will
always have passed in cpu_poll_relax() iterations by the time of the
first evaluation. Additionally, cpu_poll_relax() is not guaranteed to
return at a timeout boundary. In sum, expect timeout overshoot when
we exit due to expiration of the timeout.

The number of spin iterations before a time-check,
SMP_TIMEOUT_POLL_COUNT, defaults to 200. With a cpu_poll_relax()
iteration taking ~20-30 cycles (measured on a variety of x86
platforms), we expect a time-check every ~4000-6000 cycles.

The outer limit of the overshoot is double that when working with the
parameters above. This might be higher or lower depending on the
implementation of cpu_poll_relax() across architectures.

Lastly, the config option ARCH_HAS_CPU_RELAX indicates the
availability of a cpu_poll_relax() that is cheaper than spin-polling.
This might be relevant for cases with a prolonged timeout.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
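For illustration, a minimal sketch of the loop described above (not
the exact upstream macro; the variable names follow the slowpath
snippet quoted later in this thread, and cpu_poll_relax()'s generic
fallback is assumed to be cpu_relax()):

        #define smp_cond_load_relaxed_timeout(ptr, cond_expr,             \
                                              time_expr_ns, time_end_ns)  \
        ({                                                                 \
                typeof(ptr) __PTR = (ptr);                                 \
                __unqual_scalar_typeof(*ptr) VAL;                          \
                u32 __poll = 0;                                            \
                s64 __time_now, __time_end = (time_end_ns);                \
                                                                           \
                for (;;) {                                                 \
                        VAL = READ_ONCE(*__PTR);                           \
                        if (cond_expr)                                     \
                                break;                                     \
                        cpu_poll_relax(__PTR, VAL);                        \
                        if (++__poll < SMP_TIMEOUT_POLL_COUNT)             \
                                continue;                                  \
                        __poll = 0;                                        \
                        /* slowpath: evaluate the potentially costly clock */ \
                        __time_now = (time_expr_ns);                       \
                        if (__time_now <= 0 || __time_now >= __time_end) { \
                                VAL = READ_ONCE(*__PTR);                   \
                                break;                                     \
                        }                                                  \
                }                                                          \
                (typeof(*ptr))VAL;                                         \
        })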

Support waiting in smp_cond_load_relaxed_timeout() via
__cmpwait_relaxed(). To ensure that we wake from WFE periodically
and don't block forever if there are no stores to ptr, this path is
only used when the event-stream is enabled.

Note that when using __cmpwait_relaxed() we ignore the timeout
value, allowing an overshoot of up to the event-stream period.
And, in the unlikely event that the event-stream is unavailable,
fall back to spin-waiting.

Also set SMP_TIMEOUT_POLL_COUNT to 1 so we do the time-check in
each iteration of smp_cond_load_relaxed_timeout().

And finally define ARCH_HAS_CPU_RELAX to indicate that we have
an optimized implementation of cpu_poll_relax().

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Suggested-by: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Note:
   This commit additionally defines ARCH_HAS_CPU_RELAX.

   Will: I've retained your acked-by. Please let me know if you
   don't agree with this change.
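For reference, a sketch of the shape this override takes (assumed;
the real arm64 definition may differ in detail):

        #define cpu_poll_relax(ptr, val)                                \
        do {                                                            \
                if (arch_timer_evtstrm_available())                     \
                        /* WFE; the event stream bounds the wait */     \
                        __cmpwait_relaxed(ptr, val);                    \
                else                                                    \
                        cpu_relax();    /* no event stream: spin */     \
        } while (0)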

Move some constants and functions related to xloops/cycles
computation out to a new header. Also rename some macros in
qcom/rpmh-rsc.c that were occupying the same namespace.

No functional change.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Konrad Dybcio <konradybcio@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

To handle WFET, use __cmpwait_timeout() similarly to __cmpwait().
These call out to the respective __cmpwait_case_timeout_##sz() and
__cmpwait_case_##sz() functions.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

…ait()

In preparation for defining smp_cond_load_acquire_timeout(), remove
the private copy. Lacking this, the rqspinlock code falls back to using
smp_cond_load_acquire().

Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Haris Okanovic <harisokn@amazon.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Add the acquire variant of smp_cond_load_relaxed_timeout(). This
reuses the relaxed variant, with additional LOAD->LOAD ordering.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Haris Okanovic <harisokn@amazon.com>
Tested-by: Haris Okanovic <harisokn@amazon.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
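Presumably this mirrors how smp_cond_load_acquire() builds on
smp_cond_load_relaxed(); a sketch:

        #define smp_cond_load_acquire_timeout(ptr, cond_expr,            \
                                              time_expr_ns, time_end_ns) \
        ({                                                                \
                __unqual_scalar_typeof(*ptr) _val;                       \
                _val = smp_cond_load_relaxed_timeout(ptr, cond_expr,     \
                                                     time_expr_ns,       \
                                                     time_end_ns);        \
                /* LOAD->LOAD ordering for the final load */              \
                smp_acquire__after_ctrl_dep();                            \
                (typeof(*ptr))_val;                                       \
        })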

Add atomic load wrappers, atomic_cond_read_*_timeout() and
atomic64_cond_read_*_timeout() for the cond-load timeout interfaces.

Also add a short description for the atomic_cond_read_{relaxed,acquire}(),
and the atomic_cond_read_{relaxed,acquire}_timeout() interfaces.

Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
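These presumably follow the existing atomic_cond_read_acquire()
pattern of forwarding to the smp_cond_load_*() macros; a sketch:

        #define atomic_cond_read_relaxed_timeout(v, c, time_expr_ns, time_end_ns) \
                smp_cond_load_relaxed_timeout(&(v)->counter, (c),        \
                                              time_expr_ns, time_end_ns)

        #define atomic_cond_read_acquire_timeout(v, c, time_expr_ns, time_end_ns) \
                smp_cond_load_acquire_timeout(&(v)->counter, (c),        \
                                              time_expr_ns, time_end_ns)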

Add the atomic long wrappers for the cond-load timeout interfaces.

Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

check_timeout() reads the current time and, depending on how much
time has passed, checks for deadlock or timeout, returning 0 on
success or a negative errno on deadlock or timeout.

Switch this over to a clock-style interface, where it functions as a
clock in the "lock-domain", returning the current time until a
deadlock or timeout occurs. Once a deadlock or timeout has occurred,
it stops functioning as a clock and returns an error.

Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
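A sketch of the described interface (only the clock behavior is taken
from the description above; the deadlock-detection helper and its
name, check_deadlock(), are assumed):

        static noinline s64 clock_deadlock(rqspinlock_t *lock, u32 mask,
                                           struct rqspinlock_timeout *ts)
        {
                u64 time = ktime_get_mono_fast_ns();
                int ret;

                /* Past the deadline: stop behaving as a clock. */
                if (time > ts->timeout_end)
                        return -ETIMEDOUT;

                /* Deadlock check (helper name assumed). */
                ret = check_deadlock(lock, mask, ts);
                if (ret)
                        return ret;

                return time;    /* still ticking: current monotonic time */
        }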

Switch the conditional load interfaces used by rqspinlock over to
smp_cond_load_acquire_timeout() and its wrapper,
atomic_cond_read_acquire_timeout().

Both these handle the timeout and amortize as needed, so use
clock_deadlock() directly instead of going through RES_CHECK_TIMEOUT().

For correctness, however, we need to ensure that the timeout case in
smp_cond_load_acquire_timeout() always agrees with that in
clock_deadlock(), which returns -ETIMEDOUT.

For the most part, this is fine because smp_cond_load_acquire_timeout()
does not have an independent clock and does not do double reads from
clock_deadlock() which could cause its internal state to go out of
sync from that of clock_deadlock().

There is, however, an edge case where clock_deadlock() checks for:

        if (time > ts->timeout_end)
                return -ETIMEDOUT;

while smp_cond_load_acquire_timeout() checks for:

        __time_now = (time_expr_ns);
        if (__time_now <= 0 || __time_now >= __time_end) {
                VAL = READ_ONCE(*__PTR);
                break;
        }

This runs into a problem when (__time_now == __time_end) since
clock_deadlock() does not treat it as a timeout condition but
the second clause in the conditional above does.
So, add an equality check in clock_deadlock().
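That is, the deadline comparison in clock_deadlock() becomes:

        if (time >= ts->timeout_end)
                return -ETIMEDOUT;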

Finally, redefine SMP_TIMEOUT_POLL_COUNT to 16k, similar to the
spin-count used in RES_CHECK_TIMEOUT(). We only do this for
non-arm64, since arm64 uses a waiting implementation.

Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Add tif_bitset_relaxed_wait() (and tif_need_resched_relaxed_wait()
which wraps it) which takes the thread_info bit and timeout duration
as parameters and waits until the bit is set or for the expiration
of the timeout.

The wait is implemented via smp_cond_load_relaxed_timeout().

smp_cond_load_acquire_timeout() essentially provides the pattern used
in poll_idle() where we spin in a loop waiting for the flag to change
until a timeout occurs.

tif_need_resched_relaxed_wait() allows us to abstract out the
internals of waiting, scheduler-specific details, etc.

Placed in linux/sched/idle.h instead of linux/thread_info.h to work
around recursive include hell.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
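A sketch of the wrappers' shape (the exact signatures and time
expressions are assumed here):

        /* Wait for @bit in current's thread_info flags, or for
         * @timeout_ns to elapse. */
        static inline unsigned long tif_bitset_relaxed_wait(int bit, u64 timeout_ns)
        {
                return smp_cond_load_relaxed_timeout(
                                &current_thread_info()->flags,
                                VAL & BIT(bit),
                                ktime_get_mono_fast_ns(),
                                ktime_get_mono_fast_ns() + timeout_ns);
        }

        static inline unsigned long tif_need_resched_relaxed_wait(u64 timeout_ns)
        {
                return tif_bitset_relaxed_wait(TIF_NEED_RESCHED, timeout_ns);
        }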

…d_wait()

The inner loop in poll_idle() polls over the thread_info flags,
waiting to see if the thread has TIF_NEED_RESCHED set. The loop
exits once the condition is met, or if the poll time limit has
been exceeded.

To minimize the number of instructions executed in each iteration,
the time check is rate-limited. In addition, each loop iteration
executes cpu_relax() which on certain platforms provides a hint to
the pipeline that the loop busy-waits, allowing the processor to
reduce power consumption.

Switch over to tif_need_resched_relaxed_wait() instead, since it
provides exactly this behavior.

However, since we want to minimize power consumption in idle,
building cpuidle/poll_state.c continues to depend on
CONFIG_ARCH_HAS_CPU_RELAX, as that serves as an indicator that the
platform supports an optimized version of
tif_need_resched_relaxed_wait() (via smp_cond_load_acquire_timeout()).

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org
Suggested-by: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
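Roughly, the conversion replaces the open-coded inner loop (sketched
below from the description; identifiers like POLL_IDLE_RELAX_COUNT
and local_clock_noinstr() follow the existing poll_state.c code) with
a single call:

        /* Before (simplified): rate-limited time-check around cpu_relax() */
        while (!need_resched()) {
                cpu_relax();
                if (loop_count++ < POLL_IDLE_RELAX_COUNT)
                        continue;
                loop_count = 0;
                if (local_clock_noinstr() - time_start > limit)
                        break;
        }

        /* After: the waiting is delegated wholesale */
        tif_need_resched_relaxed_wait(limit);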

@kernel-patches-daemon-bpf

Upstream branch: a9aabb3
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1052077
version: 9

@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1052077 is irrelevant now. Closing PR.

kernel-patches-daemon-bpf (bot) deleted the series/1052077=>bpf-next branch on February 11, 2026 at 02:30.