fix(profiling): fix race condition on fork in child #17183
Draft
KowalskiThomas wants to merge 2 commits into main from
Co-authored-by: KowalskiThomas <14239160+KowalskiThomas@users.noreply.github.com>
…ler_after_fork

Replace the direct extern C reference from dd_wrapper to stack_restart_sampler_after_fork() (defined in _stack) with a callback registration pattern. _stack registers the callback at one_time_setup() via ddup_config_stack_restart_callback(), so dd_wrapper.so no longer has an undefined symbol at load time.

Fixes: undefined symbol _Z32stack_restart_sampler_after_forkv when loading libdd_wrapper.so
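The callback registration pattern named in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual dd_wrapper or _stack code: every name in it (config_restart_callback, run_restart_callback, restart_sampler_after_fork, g_restart_count) is a hypothetical stand-in, and the real signature of ddup_config_stack_restart_callback() is not shown in this PR. The point is that the library owning the fork path stores a function pointer instead of referencing a symbol defined in another shared object, so it has nothing left to resolve at load time.

```cpp
#include <atomic>

// "dd_wrapper" side (hypothetical stand-in): it only knows a function pointer
// type, so it carries no link-time reference to the sampler's symbol.
using restart_callback_t = void (*)();

static std::atomic<restart_callback_t> g_restart_callback{nullptr};

// Hypothetical analogue of a registration entry point.
void config_restart_callback(restart_callback_t cb) {
    g_restart_callback.store(cb);
}

// Would be called at the end of the post-fork child path.
// Returns true if a callback was registered and invoked.
bool run_restart_callback() {
    restart_callback_t cb = g_restart_callback.load();
    if (cb == nullptr) {
        return false;  // sampler module never loaded: nothing to restart
    }
    cb();
    return true;
}

// "_stack" side (hypothetical stand-in): owns the restart logic and
// registers it once during its own setup.
static int g_restart_count = 0;

void restart_sampler_after_fork() {
    ++g_restart_count;  // placeholder for restarting the sampling thread
}

void one_time_setup() {
    config_restart_callback(&restart_sampler_after_fork);
}
```

Under these assumptions, calling run_restart_callback() before one_time_setup() is a harmless no-op, which is the behavior a wrapper library needs when the sampler module has not been loaded.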
✅ Tests 🎉 All green! ❄️ No new flaky tests detected 🔗 Commit SHA: 1655736
Description
Fixes a segmentation fault (SIGSEGV) that occurs when a profiled application calls os.fork().

Root cause: The Sampler and ProfilerState each register pthread_atfork child handlers. Given the order in which the child handlers run, the Sampler's handler goes first: it cleans up stale state and then restarts the sampling thread via restart_after_fork(). On a multi-core system, the new thread can begin executing immediately, acquiring profile_mtx and calling ddog_prof_Profile_add2, before ProfilerState's handler runs. When ProfilerState's handler subsequently reinitializes profile_mtx (placement-new) and drops and recreates the profile, it races with the already-running sampling thread, causing concurrent mutation and a SIGSEGV in free() during ddog_prof_Profile_drop.

Fix: Move the sampling thread restart from the Sampler's atfork child handler to the end of ProfilerState::postfork_child(), after the profile and dictionary are fully re-initialized. The Sampler's atfork handler now only does cleanup, which is safe to run first. A new free function stack_restart_sampler_after_fork() in sampler.cpp is called via extern from profiler_state.cpp.

Changes
- Remove the restart_after_fork() call from stack_atfork_child() in sampler.cpp
- Add the stack_restart_sampler_after_fork() free function in sampler.cpp
- Call it from ProfilerState::postfork_child() via an extern declaration in profiler_state.cpp

Testing

Risks
Low: the sampling thread restart still happens during the same __libc_fork call, just later in the atfork handler sequence. No change to steady-state profiler behavior.

Additional Notes
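The handler sequencing described in the fix can be sketched with a small standalone program. This is an illustration under stated assumptions, not the profiler's code: sampler_atfork_child, state_atfork_child, restart_sampler, and child_event_order are hypothetical names, and the "restart" step only records an event instead of starting a thread. POSIX specifies that pthread_atfork child handlers run in the order they were registered, so the sketch registers the cleanup-only handler first and performs the restart as the last step of the handler that rebuilds shared state.

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// All names below are hypothetical stand-ins, not the profiler's real symbols.
static std::string g_events;  // order in which post-fork steps ran in the child

// Placeholder for restarting the sampling thread.
static void restart_sampler() { g_events += "restart"; }

// Registered first: cleanup only, it no longer restarts the sampling thread.
static void sampler_atfork_child() { g_events += "cleanup,"; }

// Registered second: rebuild shared state, then restart as the very last step.
static void state_atfork_child() {
    g_events += "reinit,";  // placeholder for re-creating the lock and profile
    restart_sampler();
}

// Forks once and returns the event order recorded in the child,
// or an empty string if pipe() or fork() fails.
std::string child_event_order() {
    static bool registered = false;
    if (!registered) {
        pthread_atfork(nullptr, nullptr, sampler_atfork_child);
        pthread_atfork(nullptr, nullptr, state_atfork_child);
        registered = true;
    }
    int fds[2];
    if (pipe(fds) != 0) {
        return "";
    }
    g_events.clear();
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        // Child: report what the handlers recorded, then exit immediately.
        close(fds[0]);
        ssize_t written = write(fds[1], g_events.data(), g_events.size());
        (void)written;
        _exit(0);
    }
    // Parent: read the child's report and reap it.
    close(fds[1]);
    char buf[64] = {0};
    ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n > 0 ? std::string(buf, static_cast<size_t>(n)) : std::string();
}
```

In this sketch the child reports the order cleanup, reinit, restart. If the restart were instead issued from the first handler, a real sampling thread could already be running while the second handler is still rebuilding the state it uses.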
Crash stack trace:
free → drop_in_place<IndexSet<StackTrace>> → drop_in_place<Profile> → ddog_prof_Profile_drop → Profile::postfork_child → __libc_fork → os_fork

PR by Bits