[PROF-11311] Remove datadog_profiling_loader#4356
Merged
Conversation
**What does this PR do?** This PR removes the `datadog_profiling_loader` native extension, as it's no longer needed. The profiling native loader was added in #2003 . Quoting from that PR's description: > When Ruby loads a native extension (using `require`), it uses > `dlopen(..., RTLD_LAZY | RTLD_GLOBAL)` > (https://github.com/ruby/ruby/blob/67950a4c0a884bdb78d9beb4405ebf7459229b21/dln.c#L362). > > This means that every symbol exposed directly or indirectly by that > native extension becomes visible to every other extension in the > Ruby process. This can cause issues, see > rubyjs/mini_racer#179 . > > Ruby's extension loading mechanism is not configurable -- there's > no way to tell it to use different flags when calling `dlopen`. > To get around this, this commit introduces introduces another > extension (profiling loader) which has only a single responsibility: > mimic Ruby's extension loading mechanism, but when calling > `dlopen` use a different set of flags. > > This idea was shamelessly stolen from @lloeki's work in > rubyjs/mini_racer#179, big thanks! ...and importantly ends with: > Note that, that we know of, the profiling native extension only > exposes one potentially problematic symbol: > `rust_eh_personality` (coming from libddprof/libdatadog). > > Future versions of Rust have been patched not to expose this > (see rust-lang/rust#95604 (comment)) > so we may want to revisit the need for this loader in the future, > and perhaps delete it if we no longer require its services :) And we have reached the situation predicted in that description: 1. Nowadays libdatadog no longer exposes `rust_eh_personality` 2. We have a test that validates that only expected symbols are exported by the libdatadog library (see DataDog/libdatadog#573 ). Any new symbols that show up would break shipping new libdatadog versions to rubygems.org until we review them. 3. The `libdatadog_api` extension, which we've been shipping for customers since release 2.3.0 back in July 2024 has always been loaded directly without a loader without issues. Thus, I think it's the right time to get rid of the loader. **Motivation:** Having the loader around is not zero cost; we've run into/caused a few issues ( #2250 and #3582 come to mind). It also adds overhead to development: every time we need to rebuild the extensions, it's an extra extension that needs to be prepared, rebuilt, copied, etc. I've been meaning to get rid of the loader for some time now, and this came up again this week so I decided it was time to do it. **Additional Notes:** In the future, we can always ressurrect this approach if we figure out we need it again. We've also discussed internally about proposing upstream to the Ruby VM to maybe add an API to do what the loader was doing, so that we wouldn't need the weird workaround. We haven't yet decided if we're going to do that (help welcome!). **How to test the change?** The loader was responsible for loading the rest of profiling. Thus, any test that uses profiling was also validating the loader and now will validate that we're doing fine without it. TL;DR green CI is good.
Contributor
Datadog ReportBranch report: ✅ 0 Failed, 22075 Passed, 1477 Skipped, 6m 23.79s Total Time |
BenchmarksBenchmark execution time: 2025-02-07 11:14:50 Comparing candidate commit db67572 in PR branch Found 1 performance improvements and 0 performance regressions! Performance is the same for 30 metrics, 2 unstable metrics. scenario:profiler - sample timeline=false
|
This was an interesting one... Once we removed the profiling loader extension, our tests using Ractors started failing with: > Ractor::UnsafeError: ractor unsafe method called from not main ractor When I started investigating with gdb, I discovered that because we were initializing our extension without going through Ruby, we were skipping this part: https://github.com/ruby/ruby/blob/7178593558080ca529abb61ef27038236ab2687d/load.c#L1301-L1302 : ```c ext_config_push(th, &prev_ext_config); handle = rb_vm_call_cfunc(rb_vm_top_self(), load_ext, path, VM_BLOCK_HANDLER_NONE, path); ``` that is, the `ext_config_push` is what's used by Ruby to validate if a gem is safe to use from Ractors or not. (If you're curious, that value then affects function definition, which controls wether Ruby will check or not for being called from a Ractor). Because we were skipping this entire mechanism, we implicitly were getting the same result as `rb_ext_ractor_safe(true)`. Once we removed the loader, this started failing. And I missed it before opening my PR since for reasons documented in the profiler ractor tests in detail, we don't run ractor-related tests by default. So this issue is one more reason why having the loader may create its own set of issues and why I'm happy to get rid of it.
Can you tell it's Friday?
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4356 +/- ##
==========================================
- Coverage 97.73% 97.72% -0.01%
==========================================
Files 1368 1367 -1
Lines 83014 82963 -51
Branches 4221 4213 -8
==========================================
- Hits 81134 81077 -57
- Misses 1880 1886 +6 ☔ View full report in Codecov by Sentry. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
This PR removes the
datadog_profiling_loadernative extension, as it's no longer needed.The profiling native loader was added in
#2003 .
Quoting from that PR's description:
...and importantly ends with:
And we have reached the situation predicted in that description:
Nowadays libdatadog no longer exposes
rust_eh_personalityWe have a test that validates that only expected symbols are exported by the libdatadog library (see [NO-TICKET] Validate that all public symbols are prefixed in Ruby releases libdatadog#573 ).
Any new symbols that show up would break shipping new libdatadog versions to rubygems.org until we review them.
The
libdatadog_apiextension, which we've been shipping for customers since release 2.3.0 back in July 2024 has always been loaded directly without a loader without issues.Thus, I think it's the right time to get rid of the loader.
Motivation:
Having the loader around is not zero cost; we've run into/caused a few issues ( #2250 and #3582 come to mind).
It also adds overhead to development: every time we need to rebuild the extensions, it's an extra extension that needs to be prepared, rebuilt, copied, etc.
I've been meaning to get rid of the loader for some time now, and this came up again this week so I decided it was time to do it.
Change log entry
None.
Additional Notes:
Note that none if this is visible to customers -- the loader was just an additional layer we used when loading the profiler.
In the future, we can always resurrect this approach if we figure out we need it again.
We've also discussed internally about proposing upstream to the Ruby VM to maybe add an API to do what the loader was doing, so that we wouldn't need the weird workaround.
We haven't yet decided if we're going to do that (help welcome!).
How to test the change?
The loader was responsible for loading the rest of profiling.
Thus, any test that uses profiling was also validating the loader and now will validate that we're doing fine without it.
TL;DR green CI is good.