feat(profiling): introduce an off-cpu-time approximation profiler#18623

Draft
vlad-scherbich wants to merge 18 commits into main from vlad/profiling-offcpu-approximation

@vlad-scherbich vlad-scherbich commented Jun 15, 2026

note: blocked on DataDog/libdatadog#2135 landing first

Next PR >>

Summary

Adds off-CPU time as a new sample type to the stack profiler (stack v2). Each stack sample now includes an off-cpu-time value approximating the time a thread spent waiting — on I/O, locks, sleeps, or OS scheduling — during the sampling interval.

Motivation

We currently have no clear signal for how much time threads spend waiting on off-CPU events: I/O, locks, sleeps, and OS scheduling. For I/O-bound or lock-contended services, this waiting can constitute the majority of wall time.

Why approximate rather than trace

True off-CPU profiling requires attaching to the sched_switch perf event with eBPF, which requires elevated kernel privileges. These are unavailable by default in a typical client setup, and obtaining the right permissions can be a challenge. Computing max(0, wall_time − cpu_time) is an approximation that requires no additional kernel access.

How it works

When a thread's wall time significantly exceeds its CPU time, the difference surfaces as off-CPU, making blocking behavior visible.

For each sample:

  • Threads on-CPU (or running async tasks): off_cpu = max(0, wall_time - cpu_time)
  • Suspended async tasks / greenlets (on_cpu=false): off_cpu = wall_time (no CPU measurement needed — the task was fully off-CPU)
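
The per-sample rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code in stack_renderer.cpp; the helper name `approx_off_cpu_ns` and its parameters are assumptions made for this example.

```python
def approx_off_cpu_ns(wall_ns: int, cpu_ns: int, on_cpu: bool) -> int:
    """Approximate off-CPU time for one sample (hypothetical helper)."""
    if not on_cpu:
        # Suspended async task / greenlet: the whole interval was off-CPU,
        # so the CPU reading is not consulted at all.
        return wall_ns
    # Running thread or task: clamp at zero in case the CPU clock reads
    # slightly ahead of the wall clock for this interval.
    return max(0, wall_ns - cpu_ns)


# A 10 ms interval in which the thread was on-CPU for 2 ms.
running = approx_off_cpu_ns(10_000_000, 2_000_000, on_cpu=True)
# A suspended task over the same 10 ms interval.
suspended = approx_off_cpu_ns(10_000_000, 0, on_cpu=False)
print(running, suspended)
```

The clamp matters because wall time and CPU time come from different clocks, so a small negative difference is possible for a fully CPU-bound interval.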

The sample type is emitted as off-cpu-time using DDOG_PROF_SAMPLE_TYPE_OFF_CPU_TIME from libdatadog. Enabled via DD_PROFILING_STACK_V2_OFFCPU_TIME_ENABLED=true (disabled by default).

Expected Overhead

Negligible

At the default 10ms sampling interval, 100 samples/sec per thread:

  • Disabled: 500–800 ns/sec ≈ 0.00005–0.00008% overhead
  • Enabled: ~1–1.5 µs/sec ≈ 0.0001–0.00015% overhead
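
As a sanity check on these figures, the percentages follow from dividing the per-second cost by one second of thread time. The sketch below redoes that arithmetic; the per-sample numbers it prints are derived here from the quoted 100 samples/sec and are not separate measurements.

```python
NS_PER_SEC = 1_000_000_000


def overhead_percent(cost_ns_per_sec: float) -> float:
    """Share of one second of thread time spent on bookkeeping, in percent."""
    return cost_ns_per_sec / NS_PER_SEC * 100


# 10 ms sampling interval -> 100 samples per second per thread.
samples_per_sec = 1_000 // 10

for label, low_ns, high_ns in [("disabled", 500, 800), ("enabled", 1_000, 1_500)]:
    print(
        f"{label}: {overhead_percent(low_ns):.5f}% to {overhead_percent(high_ns):.5f}% "
        f"({low_ns / samples_per_sec:.0f} to {high_ns / samples_per_sec:.0f} ns per sample)"
    )
```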

Testing

@vlad-scherbich vlad-scherbich requested review from a team, KowalskiThomas and Copilot June 15, 2026 20:55
@vlad-scherbich

@codex please review

@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Jun 15, 2026

⚠️ Warnings

🚦 9 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741491-d2b8243-musllinux_1_2_x86_64, 1]

DataDog/apm-reliability/dd-trace-py | build linux serverless: [arm64, cp315-cp315, v113741357-d2b8243-manylinux2014_aarch64, 1]

View all 9 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🔗 Commit SHA: 3a3c2ef

@vlad-scherbich vlad-scherbich changed the title from "Vlad/profiling offcpu approximation" to "feature(profiling): introduce an offcpu-approximation profiler" Jun 15, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17c5fa8b9b

Comment thread ddtrace/internal/datadog/profiling/stack/src/stack_renderer.cpp Outdated

Copilot AI left a comment

Pull request overview

This PR adds an “off-CPU time” metric to profiling stack samples by approximating it as wall_time - cpu_time, and introduces tests to validate that sleeping workloads accumulate more off-CPU time than CPU-bound workloads.

Changes:

  • Compute and emit an off-CPU sample value at stack flush time (off_cpu_ns = max(wall_ns - cpu_ns, 0)).
  • Plumb a new OffCPU sample type/value through the dd-wrapper + Cython ddup bindings.
  • Add pytest coverage validating the new off-CPU sample type appears in generated pprof output and behaves as expected.
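
The property those tests rely on (sleeping work accumulates more off-CPU time than CPU-bound work) can be reproduced outside the profiler with the standard library clocks. This is a rough sketch and not the code in test_stack.py; the function names are made up for the example, and it assumes a platform where `time.thread_time_ns()` is available.

```python
import time


def measure_off_cpu_ns(work) -> int:
    """Run work() on the current thread and return max(0, wall - cpu) in ns."""
    wall_start = time.monotonic_ns()
    cpu_start = time.thread_time_ns()  # per-thread CPU clock
    work()
    wall_ns = time.monotonic_ns() - wall_start
    cpu_ns = time.thread_time_ns() - cpu_start
    return max(0, wall_ns - cpu_ns)


def sleeper() -> None:
    # Blocks without using the CPU for about 50 ms.
    time.sleep(0.05)


def spinner() -> None:
    # Burns CPU for about 50 ms of wall time.
    deadline = time.monotonic_ns() + 50_000_000
    while time.monotonic_ns() < deadline:
        pass


sleep_off_cpu = measure_off_cpu_ns(sleeper)
spin_off_cpu = measure_off_cpu_ns(spinner)
print(f"sleeping: {sleep_off_cpu} ns off-CPU, spinning: {spin_off_cpu} ns off-CPU")
```

On a lightly loaded machine the sleeping case should report close to the full 50 ms, while the spinning case should report only the time the thread was preempted.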

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/profiling/collector/test_stack.py | Adds new off-CPU pprof assertions and comparison test for sleeping vs CPU-bound threads. |
| ddtrace/internal/datadog/profiling/stack/src/stack_renderer.cpp | Computes off-CPU approximation and pushes it into each stack sample. |
| ddtrace/internal/datadog/profiling/ddup/_ddup.pyx | Exposes SampleHandle.push_offcputime() in the Cython bindings. |
| ddtrace/internal/datadog/profiling/dd_wrapper/src/sample.cpp | Implements Datadog::Sample::push_offcputime(). |
| ddtrace/internal/datadog/profiling/dd_wrapper/src/profile.cpp | Registers off-CPU sample types in the pprof profile when enabled. |
| ddtrace/internal/datadog/profiling/dd_wrapper/src/ddup_interface.cpp | Exposes a C API wrapper ddup_push_offcputime(). |
| ddtrace/internal/datadog/profiling/dd_wrapper/include/types.hpp | Adds SampleType::OffCPU and value indices for off-CPU time/count. |
| ddtrace/internal/datadog/profiling/dd_wrapper/include/sample.hpp | Declares push_offcputime() on Datadog::Sample. |
| ddtrace/internal/datadog/profiling/dd_wrapper/include/ddup_interface.hpp | Declares ddup_push_offcputime() in the C API surface. |
| ddtrace/internal/datadog/profiling/cmake/FindLibNative.cmake | Adjusts imported target properties (adds IMPORTED_NO_SONAME). |

Comment thread ddtrace/internal/datadog/profiling/stack/src/stack_renderer.cpp Outdated
Comment thread tests/profiling/collector/test_stack.py
Comment thread ddtrace/internal/datadog/profiling/ddup/_ddup.pyx
@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Jun 15, 2026

Codeowners resolved as

ddtrace/internal/datadog/profiling/stack/src/stack_renderer.cpp         @DataDog/profiling-python

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Comment thread ddtrace/internal/datadog/profiling/stack/src/stack_renderer.cpp
Comment thread releasenotes/notes/profiling-off-cpu-time-approximation-776e49c0465b9ed5.yaml Outdated
Comment thread tests/profiling/collector/test_stack.py Outdated
@vlad-scherbich vlad-scherbich requested a review from Copilot June 16, 2026 22:03
@vlad-scherbich vlad-scherbich changed the title from "feature(profiling): introduce an offcpu-approximation profiler" to "feat(profiling): introduce an off-cpu-time approximation sample type" Jun 16, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comment thread ddtrace/internal/datadog/profiling/cmake/FindLibNative.cmake Outdated
Comment thread releasenotes/notes/profiling-off-cpu-time-approximation-776e49c0465b9ed5.yaml Outdated
@vlad-scherbich vlad-scherbich force-pushed the vlad/profiling-offcpu-approximation branch from 6a9212d to a5adb8c Compare June 17, 2026 23:08

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Comment thread ddtrace/internal/datadog/profiling/dd_wrapper/include/profiler_state.hpp Outdated
Comment thread ddtrace/internal/settings/profiling.py Outdated
Comment thread tests/profiling/collector/test_stack.py Outdated
Comment thread tests/profiling/collector/test_stack.py Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Comment thread tests/profiling/collector/test_stack.py
Comment thread ddtrace/internal/datadog/profiling/dd_wrapper/include/profiler_state.hpp Outdated
@pr-commenter

pr-commenter Bot commented Jun 18, 2026

Benchmarks

Benchmark execution time: 2026-06-18 17:24:08

Comparing candidate commit 3a3c2ef in PR branch vlad/profiling-offcpu-approximation with baseline commit cd7e17b in branch main.

Found 0 performance improvements and 6 performance regressions! Performance is the same for 611 metrics, 10 unstable metrics.

scenario:iast_aspects-re_expand_aspect

  • 🟥 execution_time [+264.724µs; +302.988µs] or [+7.693%; +8.805%]

scenario:iastaspects-index_aspect

  • 🟥 execution_time [+16.561µs; +18.460µs] or [+13.450%; +14.992%]

scenario:iastaspects-ljust_aspect

  • 🟥 execution_time [+96.695µs; +111.379µs] or [+16.028%; +18.462%]

scenario:iastaspects-title_aspect

  • 🟥 execution_time [+34.369µs; +42.413µs] or [+10.205%; +12.594%]

scenario:iastaspectsospath-ospathbasename_aspect

  • 🟥 execution_time [+99.979µs; +109.050µs] or [+23.632%; +25.776%]

scenario:span-start

  • 🟥 execution_time [+1.099ms; +1.251ms] or [+7.035%; +8.013%]

@vlad-scherbich vlad-scherbich changed the title from "feat(profiling): introduce an off-cpu-time approximation sample type" to "feat(profiling): introduce an off-cpu-time approximation profiler" Jun 18, 2026
… ordering

ProfilerState::start() uses std::call_once, so the profile is initialized
with whatever type_mask is current on the first ddup.start() call in the
process. Earlier tests in the same pytest session call ddup.start() without
offcpu_time_enabled, preventing the OffCPU sample type from being registered
when the off-cpu tests run later.
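
The ordering problem described in this commit message can be illustrated with a small Python stand-in for a once-only initializer. The class below is hypothetical and is not the ProfilerState implementation; the sample-type names other than off-cpu-time are placeholders chosen for the example.

```python
import threading


class OnceProfilerState:
    """Hypothetical stand-in for a state object that initializes exactly once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._started = False
        self.sample_types: tuple = ()

    def start(self, offcpu_time_enabled: bool = False) -> None:
        with self._lock:
            if self._started:
                # Later calls are ignored, mirroring std::call_once semantics.
                return
            self._started = True
            types = ["cpu-time", "wall-time"]  # placeholder names
            if offcpu_time_enabled:
                types.append("off-cpu-time")
            self.sample_types = tuple(types)


state = OnceProfilerState()
state.start()                          # an earlier test starts without the flag
state.start(offcpu_time_enabled=True)  # the off-CPU test arrives too late
print(state.sample_types)
```

Because the first call wins, the second call cannot add the off-CPU type, which matches the failure mode the commit message describes for tests sharing one process.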

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.

Comment thread ddtrace/internal/datadog/profiling/stack/src/stack_renderer.cpp Outdated
Comment thread ddtrace/internal/datadog/profiling/dd_wrapper/src/profile.cpp
…s unavailable

Suspended tasks (asyncio/greenlet not scheduled) emit off_cpu = wall_time,
which is exact and requires no CPU measurement. The previous has_cpu_time
gate was dropping these samples on platforms where CPU time is unavailable,
even though no cpu_time value is needed for the suspended-task case.
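
The change in gating can be sketched as follows. This is an illustration only: the helper name is invented, a `None` CPU reading stands in for a platform without CPU time, and it assumes that running threads are still skipped when CPU time is unavailable, which the commit message implies but does not state.

```python
from typing import Optional


def off_cpu_sample_ns(wall_ns: int, cpu_ns: Optional[int], on_cpu: bool) -> Optional[int]:
    """Return the off-CPU value to emit, or None to skip it (hypothetical helper).

    cpu_ns is None when the platform cannot report per-thread CPU time.
    """
    if not on_cpu:
        # Suspended task: wall time alone is enough, so the CPU-time gate
        # must not apply here.
        return wall_ns
    if cpu_ns is None:
        # Running thread without a CPU clock: nothing to subtract (assumed skip).
        return None
    return max(0, wall_ns - cpu_ns)


# Suspended task on a platform without CPU time: still emitted.
print(off_cpu_sample_ns(10_000_000, None, on_cpu=False))
# Running thread on the same platform: skipped.
print(off_cpu_sample_ns(10_000_000, None, on_cpu=True))
```

Checking `on_cpu` before the CPU-time check is the point of the fix: the earlier ordering would have returned nothing for both calls.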

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.

Comment thread ddtrace/internal/settings/profiling.py