Harden trace_pid authorization and reduce per-event overhead#133
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens trace_pid authorization in the userspace daemon and reduces per-event overhead across the eBPF → daemon → client pipeline, aligning with Pinchy’s goals of safe syscall tracing with low overhead.
Changes:
- Tighten
trace_pidauthorization: validate syscall filter indices, detect pidfd exit correctly, add per-UID registration caps, and detach clients on post-exec UID transitions. - Reduce event overhead: switch ringbuf submissions to adaptive notifications, avoid flush starvation in the writer task, and read syscall numbers directly from the sys_exit tracepoint record.
- Reduce PID-reuse exposure by removing exited PIDs from the eBPF PID filter immediately while keeping clients for grace-period dispatch.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
pinchy/src/tracing.rs |
Writer task flushing behavior, immediate PID filter removal on exit, and exec-based reauthorization revalidation logic. |
pinchy/src/server.rs |
Harden trace_pid input validation and replace pidfd TOCTOU check with exit detection via polling. |
pinchy/src/client.rs |
Lower-latency output by increasing read buffer capacity and flushing when caught up. |
pinchy-ebpf/src/util.rs |
Reduce eBPF-side per-event overhead by avoiding force-wakeup and removing ENTER_MAP lookup for syscall number. |
Comment on lines
+47
to
+52
| let ret = unsafe { libc::poll(&mut poll_fd, 1, 0) }; | ||
| if ret < 0 { | ||
| return Err(io::Error::last_os_error()); | ||
| } | ||
|
|
||
| let stat = unsafe { stat.assume_init() }; | ||
|
|
||
| Ok(stat.st_uid) | ||
| Ok(poll_fd.revents & libc::POLLIN != 0) |
Owner
Author
There was a problem hiding this comment.
Fixed and squashed into the hardening commit.
Comment on lines
+937
to
+944
| // If the process is already gone the pidfd monitoring handles | ||
| // cleanup; nothing to decide here. | ||
| let Ok(meta) = std::fs::metadata(format!("/proc/{pid}")) else { | ||
| return Ok(()); | ||
| }; | ||
|
|
||
| let current_uid = std::os::unix::fs::MetadataExt::uid(&meta); | ||
|
|
Owner
Author
There was a problem hiding this comment.
Fixed and squashed into the hardening commit.
Comment on lines
+705
to
+714
| for pid in revalidate_pids.drain(..) { | ||
| if let Err(e) = event_dispatch | ||
| .write() | ||
| .await | ||
| .revalidate_pid_authorization(pid) | ||
| .await | ||
| { | ||
| eprintln!("Error revalidating PID {pid}: {e}"); | ||
| } | ||
| } |
Owner
Author
There was a problem hiding this comment.
Fixed and squashed into the hardening commit.
- Validate caller-supplied syscall numbers against the 512-bit SYSCALL_FILTER range; out-of-range values used to cause an out-of-bounds bitmap index (panic) in resubscribe_syscalls - Fix the pidfd TOCTOU check: fcntl(F_GETFD) on our own descriptor always succeeds and detected nothing. Poll the pidfd after reading the UID instead - if the process is still alive the PID cannot have been recycled, so the UID read from /proc belongs to the process the pidfd refers to. Report ESRCH as a distinct "no such PID" error instead of conflating it with authorization failure - Remove the PID from the eBPF filter as soon as its pidfd reports exit, instead of leaving it through the grace period; a recycled PID could be traced and its events streamed to the old client. Clients stay registered for the grace period so buffered events still drain - Detach tracing on privilege transitions: always capture execve/execveat while tracing, and when one succeeds re-check /proc/<pid> ownership (which follows effective credentials and becomes root for non-dumpable setuid processes) against the authorizing UID, dropping clients that no longer match. Previously a traced process exec'ing a setuid binary kept streaming privileged data - including read/write buffer contents - to the unprivileged tracer - Cap concurrent registrations per UID (64): each one pins a pipe, writer task and queue in the root daemon Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Submit ring buffer entries with flag 0 instead of BPF_RB_FORCE_WAKEUP: the kernel's adaptive notification only wakes the consumer when it has caught up, removing an irq_work + epoll wakeup per event under load - Read the syscall number from the sys_exit tracepoint record instead of an ENTER_MAP hashmap lookup, dropping one map operation per traced event in every consolidated handler - Flush the client writer whenever the event queue is drained: the flush previously lived only in a select! sleep arm that was recreated (and thus reset) by every incoming event, so a steady trickle of syscalls kept output buffered indefinitely (up to 256 KiB). Same fix on the client side, flushing stdout when the pipe read buffer is empty, and use a 64 KiB read buffer to cut pipe read syscalls - Writer task write errors now break out of the loop instead of returning, so the cleanup notification fires and the client is removed instead of lingering until process exit Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
3a4f793 to
8b0c435
Compare
This was referenced Jun 10, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Second PR from the review pass (after #132), covering the security and performance findings.
Security —
trace_pidauthorization (daemon):fcntl F_GETFDsucceeds for any valid fd of ours, it says nothing about the process). It now polls the pidfd forPOLLINto detect that the target exited between the/procownership check and registration. The unuseduid_from_pidfd()was removed —fstaton a pidfd returns the anon-inode's owner (the daemon), not the target process, so it could not serve this purpose.execve/execveatthe daemon re-checks/proc/<pid>ownership and detaches clients whose authorizing UID no longer matches (setuid transition); execve exits are always captured while anything is traced so the dispatch loop can observe them. Root-authorized clients stay attached.ESRCHduring validation now maps toUnknownObject("no process with PID...") instead ofAuthFailed.Performance:
BPF_RB_FORCE_WAKEUP, avoiding an irq_work + epoll wakeup per event.get_syscall_nr()reads the syscall number straight from the sys_exit tracepoint record (offset 8) instead of an ENTER_MAP lookup.🤖 Generated with Claude Code