feat(profiling): introduce an off-cpu-time approximation profiler #18623
vlad-scherbich wants to merge 18 commits into
@codex please review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17c5fa8b9b
Pull request overview
This PR adds an “off-CPU time” metric to profiling stack samples by approximating it as wall_time - cpu_time, and introduces tests to validate that sleeping workloads accumulate more off-CPU time than CPU-bound workloads.
Changes:
- Compute and emit an off-CPU sample value at stack flush time (`off_cpu_ns = max(wall_ns - cpu_ns, 0)`).
- Plumb a new OffCPU sample type/value through the dd-wrapper + Cython ddup bindings.
- Add pytest coverage validating the new off-CPU sample type appears in generated pprof output and behaves as expected.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/profiling/collector/test_stack.py | Adds new off-CPU pprof assertions and comparison test for sleeping vs CPU-bound threads. |
| ddtrace/internal/datadog/profiling/stack/src/stack_renderer.cpp | Computes off-CPU approximation and pushes it into each stack sample. |
| ddtrace/internal/datadog/profiling/ddup/_ddup.pyx | Exposes SampleHandle.push_offcputime() in the Cython bindings. |
| ddtrace/internal/datadog/profiling/dd_wrapper/src/sample.cpp | Implements Datadog::Sample::push_offcputime(). |
| ddtrace/internal/datadog/profiling/dd_wrapper/src/profile.cpp | Registers off-CPU sample types in the pprof profile when enabled. |
| ddtrace/internal/datadog/profiling/dd_wrapper/src/ddup_interface.cpp | Exposes a C API wrapper ddup_push_offcputime(). |
| ddtrace/internal/datadog/profiling/dd_wrapper/include/types.hpp | Adds SampleType::OffCPU and value indices for off-CPU time/count. |
| ddtrace/internal/datadog/profiling/dd_wrapper/include/sample.hpp | Declares push_offcputime() on Datadog::Sample. |
| ddtrace/internal/datadog/profiling/dd_wrapper/include/ddup_interface.hpp | Declares ddup_push_offcputime() in the C API surface. |
| ddtrace/internal/datadog/profiling/cmake/FindLibNative.cmake | Adjusts imported target properties (adds IMPORTED_NO_SONAME). |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…e flag (disabled by default)
… fix linker error
6a9212d to a5adb8c
Benchmarks
Benchmark execution time: 2026-06-18 17:24:08
Comparing candidate commit 3a3c2ef in PR branch
Found 0 performance improvements and 6 performance regressions! Performance is the same for 611 metrics, 10 unstable metrics.
scenario:iast_aspects-re_expand_aspect
scenario:iastaspects-index_aspect
scenario:iastaspects-ljust_aspect
scenario:iastaspects-title_aspect
scenario:iastaspectsospath-ospathbasename_aspect
scenario:span-start
… ordering ProfilerState::start() uses std::call_once, so the profile is initialized with whatever type_mask is current on the first ddup.start() call in the process. Earlier tests in the same pytest session call ddup.start() without offcpu_time_enabled, preventing the OffCPU sample type from being registered when the off-cpu tests run later.
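The ordering problem in the commit message above can be illustrated with a small sketch. It is written in Python rather than C++, and `OnceProfile`, its `start()` signature, and the sample-type names are hypothetical stand-ins, not the actual `ProfilerState` API. It only shows why a once-only initializer keeps the configuration of the first caller.

```python
import threading


class OnceProfile:
    """Hypothetical stand-in for a process-wide profile that is initialized once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._started = False
        self.sample_types = ()

    def start(self, offcpu_time_enabled=False):
        """Initialize on the first call only; later calls are ignored.

        Returns True if this call performed the initialization.
        """
        with self._lock:
            if self._started:
                # Mirrors std::call_once semantics: the first caller's flags win.
                return False
            types = ["wall-time", "cpu-time"]  # illustrative names only
            if offcpu_time_enabled:
                types.append("off-cpu-time")
            self.sample_types = tuple(types)
            self._started = True
            return True


def demo():
    profile = OnceProfile()
    profile.start()                          # an earlier test, flag not set
    profile.start(offcpu_time_enabled=True)  # the off-CPU test, too late
    return profile.sample_types
```

In this sketch the second `start()` call has no effect, so a test that needs the off-CPU sample type has to run in a process where its call is the first one, or the flag has to be set before any other test starts the profiler.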
…s unavailable Suspended tasks (asyncio/greenlet not scheduled) emit off_cpu = wall_time, which is exact and requires no CPU measurement. The previous has_cpu_time gate was dropping these samples on platforms where CPU time is unavailable, even though no cpu_time value is needed for the suspended-task case.
note: blocked on DataDog/libdatadog#2135 landing first
Next PR >>
Summary
Adds off-CPU time as a new sample type to the stack profiler (stack v2). Each stack sample now includes an `off-cpu-time` value approximating the time a thread spent waiting (on I/O, locks, sleeps, or OS scheduling) during the sampling interval.
Motivation
We currently do not have a clear signal on the time threads spend waiting on off-cpu events: I/O, locks, sleep, OS scheduling. For I/O-bound or lock-contended services this can constitute the majority of wall time.
Why approximate rather than trace
True off-CPU profiling requires attaching to the `sched_switch` perf event with eBPF, which requires kernel privileges. These are unavailable by default in a typical client setup, and obtaining the right permissions can be a challenge. Computing `max(0, wall_time - cpu_time)` is an approximation that requires no additional kernel access.
How it works
When a thread's wall time significantly exceeds its CPU time, the difference surfaces as off-CPU, making blocking behavior visible.
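A minimal Python sketch of the approximation, for illustration only. The actual computation lives in the C++ stack renderer; `approx_off_cpu_ns` and `measure` are hypothetical helper names, and returning `None` for a running thread without CPU time is an assumption of this sketch.

```python
import time
from typing import Callable, Optional, Tuple


def approx_off_cpu_ns(wall_ns: int, cpu_ns: Optional[int], on_cpu: bool = True) -> Optional[int]:
    """Approximate off-CPU time for one sampling interval, in nanoseconds."""
    if not on_cpu:
        # A suspended task did not run at all, so the whole interval is
        # off-CPU and no CPU-time measurement is needed.
        return wall_ns
    if cpu_ns is None:
        # Assumption of this sketch: without CPU time there is no estimate
        # for a running thread.
        return None
    # Clamp at zero: the two clocks are read separately, so cpu_ns can
    # slightly exceed wall_ns.
    return max(wall_ns - cpu_ns, 0)


def measure(fn: Callable[[], None]) -> Tuple[int, int]:
    """Return (wall_ns, cpu_ns) spent by the current thread while running fn."""
    wall_start = time.perf_counter_ns()
    cpu_start = time.thread_time_ns()
    fn()
    return time.perf_counter_ns() - wall_start, time.thread_time_ns() - cpu_start
```

Measuring `lambda: time.sleep(0.05)` with `measure` and passing the result to `approx_off_cpu_ns` should yield a value close to the full 50 ms, while a busy loop of similar duration should yield a much smaller one. This is the same contrast the sleeping vs CPU-bound test relies on.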
For each sample:
- `off_cpu = max(0, wall_time - cpu_time)`
- Suspended tasks (`on_cpu=false`): `off_cpu = wall_time` (no CPU measurement needed; the task was fully off-CPU)
The sample type is emitted as `off-cpu-time` using `DDOG_PROF_SAMPLE_TYPE_OFF_CPU_TIME` from libdatadog. Enabled via `DD_PROFILING_STACK_V2_OFFCPU_TIME_ENABLED=true` (disabled by default).
Expected Overhead
Negligible
At the default 10ms sampling interval, 100 samples/sec per thread:
Testing