fix(profiling): fix race condition on fork in child#17183

Draft
KowalskiThomas wants to merge 2 commits into main from
dd/kowalski/fix/profiler-postfork-sigsegv

@KowalskiThomas
Contributor

Description

Fixes a segmentation fault (SIGSEGV) that occurs when a profiled application calls os.fork().

Root cause: The Sampler and ProfilerState each register pthread_atfork child handlers. The Sampler's handler runs first (POSIX runs child handlers in registration order; only prepare handlers run in reverse order): it cleans up stale state and then restarts the sampling thread via restart_after_fork(). On a multi-core system, the new thread can begin executing immediately, acquiring profile_mtx and calling ddog_prof_Profile_add2, before ProfilerState's handler runs. When ProfilerState's handler subsequently reinitializes profile_mtx (placement-new) and drops and recreates the profile, it races with the already-running sampling thread. The result is concurrent mutation of the profile and a SIGSEGV in free() during ddog_prof_Profile_drop.
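The race depends on which child handler fires first, so it can help to check the ordering directly. The following is a minimal sketch, not code from this PR: the handler names and `observe_child_handler_order()` are hypothetical. It registers two child handlers, forks, and reports the order in which they ran in the child. POSIX specifies that child handlers run in the order they were registered, while prepare handlers run in reverse.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstddef>
#include <string>

// Hypothetical demo state: each child handler appends one letter.
static char g_order[8];
static int g_count = 0;

static void first_child_handler()
{
    if (g_count < 8) g_order[g_count++] = 'A';
}

static void second_child_handler()
{
    if (g_count < 8) g_order[g_count++] = 'B';
}

// Forks once and returns the order in which the child handlers ran.
// Returns an empty string on failure.
std::string observe_child_handler_order()
{
    static bool registered = false;
    if (!registered) {
        // Registered in this order: first, then second.
        pthread_atfork(nullptr, nullptr, first_child_handler);
        pthread_atfork(nullptr, nullptr, second_child_handler);
        registered = true;
    }
    g_count = 0;

    int fds[2];
    if (pipe(fds) != 0) return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        // Child: the handlers have already run; report what they recorded.
        close(fds[0]);
        ssize_t ignored = write(fds[1], g_order, static_cast<size_t>(g_count));
        (void)ignored;
        _exit(0);
    }

    // Parent: read the child's report and reap it.
    close(fds[1]);
    char buf[8] = { 0 };
    ssize_t n = read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n > 0 ? std::string(buf, static_cast<size_t>(n)) : std::string();
}
```

On a POSIX-conforming libc this should return "AB". Whichever handler is registered first therefore runs first in the child, and any thread it starts can overlap with the handlers that follow.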

Fix: Move the sampling thread restart from the Sampler's atfork child handler to the end of ProfilerState::postfork_child(), after profile and dictionary are fully re-initialized. The Sampler's atfork handler now only does cleanup (safe to run first). A new free function stack_restart_sampler_after_fork() in sampler.cpp is called via extern from profiler_state.cpp.
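The reordering described above can be sketched as follows. This is an illustrative model only: `SketchSampler`, `SketchProfilerState`, and the logged event strings are hypothetical stand-ins rather than the real ddtrace classes, and the "restart" merely records an event instead of spawning a thread. The point it shows is that the sampler restart is the final step, after the mutex and profile have been rebuilt.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the Sampler: its atfork child handler now only cleans up.
struct SketchSampler
{
    std::vector<std::string>* log;

    void cleanup_after_fork() { log->push_back("sampler:cleanup"); }

    // In the real profiler this would start the sampling thread.
    void restart_after_fork() { log->push_back("sampler:restart"); }
};

// Hypothetical stand-in for ProfilerState.
struct SketchProfilerState
{
    std::vector<std::string>* log;
    SketchSampler* sampler;

    void postfork_child()
    {
        log->push_back("state:reinit_mutex");     // e.g. placement-new on the mutex
        log->push_back("state:recreate_profile"); // drop and recreate the profile
        // Only now is it safe for a sampling thread to touch the profile.
        sampler->restart_after_fork();
    }
};

// Runs the two child handlers in the order described in this PR and
// returns the sequence of events.
std::vector<std::string> run_child_handlers_sketch()
{
    std::vector<std::string> log;
    SketchSampler sampler{ &log };
    SketchProfilerState state{ &log, &sampler };

    sampler.cleanup_after_fork(); // Sampler's child handler: cleanup only
    state.postfork_child();       // ProfilerState's child handler: reinit, then restart
    return log;
}
```

With this arrangement the restart no longer depends on the relative order of the two handlers, because it is sequenced inside the handler that owns the state being reinitialized.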

Changes

  • Remove restart_after_fork() call from stack_atfork_child() in sampler.cpp
  • Add stack_restart_sampler_after_fork() free function in sampler.cpp
  • Call it at the end of ProfilerState::postfork_child() via extern declaration in profiler_state.cpp

Testing

  • Ran cppcheck and clang-format on all modified C++ files (passed)
  • Full native build could not be run due to sandboxed environment network restrictions

Risks

Low — the sampling thread restart still happens during the same __libc_fork call, just later in the atfork handler sequence. No change to steady-state profiler behavior.

Additional Notes

Crash stack trace: free → drop_in_place<IndexSet<StackTrace>> → drop_in_place<Profile> → ddog_prof_Profile_drop → Profile::postfork_child → __libc_fork → os_fork


PR by Bits - View session in Datadog

Co-authored-by: KowalskiThomas <14239160+KowalskiThomas@users.noreply.github.com>
@datadog-prod-us1-3
Bits Dev status: ✅ Done

@datadog-prod-us1-5
I can only run on private repositories.

@KowalskiThomas KowalskiThomas added the Profiling Continous Profling label Mar 30, 2026
@cit-pr-commenter-54b7da
cit-pr-commenter-54b7da bot commented Mar 30, 2026

Codeowners resolved as

ddtrace/internal/datadog/profiling/dd_wrapper/include/ddup_interface.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/include/profiler_state.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/ddup_interface.cpp    @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/profiler_state.cpp    @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack/src/sampler.cpp                @DataDog/profiling-python

…ler_after_fork

Replace the direct extern reference from dd_wrapper to stack_restart_sampler_after_fork()
(defined in _stack) with a callback registration pattern. _stack registers the callback in
one_time_setup() via ddup_config_stack_restart_callback(), so libdd_wrapper.so no longer has
an undefined symbol at load time.

Fixes: undefined symbol _Z32stack_restart_sampler_after_forkv when loading libdd_wrapper.so
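
A callback registration of this kind might look like the sketch below. The commit message names ddup_config_stack_restart_callback() and one_time_setup(), but their signatures are not shown here, so every function in this block is a hypothetical `sketch_` stand-in with an assumed `void (*)()` callback type. The library holding the pointer never references a symbol from the other library, so nothing is left unresolved at load time.

```cpp
#include <atomic>

// Assumed callback signature; the real one is not shown in this PR text.
using restart_callback_t = void (*)();

// --- "dd_wrapper" side (hypothetical): stores and invokes the callback ---

static std::atomic<restart_callback_t> g_restart_callback{ nullptr };

void sketch_config_stack_restart_callback(restart_callback_t cb)
{
    g_restart_callback.store(cb);
}

void sketch_postfork_child()
{
    // ... reinitialize the mutex, profile, and dictionary here ...

    // Restart the sampler last, and only if someone registered a callback.
    restart_callback_t cb = g_restart_callback.load();
    if (cb != nullptr) {
        cb();
    }
}

// --- "_stack" side (hypothetical): owns the sampler and registers itself ---

static int g_restart_count = 0;

static void sketch_restart_sampler_after_fork()
{
    ++g_restart_count; // the real function would restart the sampling thread
}

void sketch_one_time_setup()
{
    sketch_config_stack_restart_callback(&sketch_restart_sampler_after_fork);
}

int sketch_restart_count()
{
    return g_restart_count;
}
```

If the stack module is never loaded, no callback is registered and the post-fork path simply skips the restart, which is the behavior one would want when only the wrapper library is present.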
@datadog-datadog-prod-us1-2
datadog-datadog-prod-us1-2 bot commented Mar 30, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: 1655736

Labels

Bits AI Profiling Continous Profling
