feat(tracing): support http client error status configuration#17095
feat(tracing): support http client error status configuration#17095
Conversation
Codeowners resolved as |
|
✨ Fix all issues with BitsAI or with Cursor
|
Performance SLOsComparing candidate quinna/http-client-error-codes (d98ea98) with baseline main (876c5f1) ❌ Test Failures (1 suite)❌ sethttpmeta - 16/32❌ all-disabledTime: ❌ 32.881µs (SLO: <20.000µs 📈 +64.4%) vs baseline: 📈 +232.5% Memory: ✅ 37.434MB (SLO: <38.750MB -3.4%) vs baseline: +5.7% ❌ all-enabledTime: ❌ 69.520µs (SLO: <50.000µs 📈 +39.0%) vs baseline: 📈 +76.7% Memory: ✅ 37.415MB (SLO: <38.750MB -3.4%) vs baseline: +5.6% ❌ collectipvariant_existsTime: ❌ 68.741µs (SLO: <50.000µs 📈 +37.5%) vs baseline: 📈 +70.3% Memory: ✅ 37.238MB (SLO: <38.750MB -3.9%) vs baseline: +4.9% ❌ no-collectipvariantTime: ❌ 67.938µs (SLO: <50.000µs 📈 +35.9%) vs baseline: 📈 +72.5% Memory: ✅ 37.218MB (SLO: <38.750MB -4.0%) vs baseline: +5.0% ❌ no-useragentvariantTime: ❌ 66.728µs (SLO: <50.000µs 📈 +33.5%) vs baseline: 📈 +73.4% Memory: ✅ 37.336MB (SLO: <38.750MB -3.6%) vs baseline: +5.4% ❌ obfuscation-no-queryTime: ❌ 68.101µs (SLO: <50.000µs 📈 +36.2%) vs baseline: 📈 +71.6% Memory: ✅ 37.159MB (SLO: <38.750MB -4.1%) vs baseline: +4.9% ❌ obfuscation-regular-case-explicit-queryTime: ❌ 107.301µs (SLO: <90.000µs 📈 +19.2%) vs baseline: 📈 +41.2% Memory: ✅ 37.631MB (SLO: <38.750MB -2.9%) vs baseline: +5.2% ❌ obfuscation-regular-case-implicit-queryTime: ❌ 107.606µs (SLO: <90.000µs 📈 +19.6%) vs baseline: 📈 +40.4% Memory: ✅ 37.611MB (SLO: <38.750MB -2.9%) vs baseline: +4.7% ❌ obfuscation-send-querystring-disabledTime: ❌ 184.971µs (SLO: <170.000µs +8.8%) vs baseline: 📈 +19.4% Memory: ✅ 37.611MB (SLO: <38.750MB -2.9%) vs baseline: +5.0% ❌ obfuscation-worst-case-explicit-queryTime: ❌ 179.200µs (SLO: <160.000µs 📈 +12.0%) vs baseline: 📈 +20.0% Memory: ✅ 37.434MB (SLO: <38.750MB -3.4%) vs baseline: +4.5% ❌ obfuscation-worst-case-implicit-queryTime: ❌ 185.515µs (SLO: <170.000µs +9.1%) vs baseline: 📈 +19.5% Memory: ✅ 37.631MB (SLO: <38.750MB -2.9%) vs baseline: +5.1% ❌ useragentvariant_exists_1Time: ❌ 67.478µs (SLO: <50.000µs 📈 +35.0%) vs baseline: 📈 +72.4% Memory: ✅ 37.356MB (SLO: <38.750MB -3.6%) vs baseline: +5.5% ❌ useragentvariant_exists_2Time: ❌ 68.372µs (SLO: <50.000µs 📈 +36.7%) vs baseline: 📈 +71.3% Memory: ✅ 37.238MB (SLO: <38.750MB -3.9%) vs baseline: +5.2% ❌ useragentvariant_exists_3Time: ❌ 68.021µs (SLO: <50.000µs 📈 +36.0%) vs baseline: 📈 +72.3% Memory: ✅ 37.179MB (SLO: <38.750MB -4.1%) vs baseline: +4.9% ❌ useragentvariant_not_exists_1Time: ❌ 67.519µs (SLO: <50.000µs 📈 +35.0%) vs baseline: 📈 +72.5% Memory: ✅ 37.415MB (SLO: <38.750MB -3.4%) vs baseline: +5.7% ❌ useragentvariant_not_exists_2Time: ❌ 67.742µs (SLO: <50.000µs 📈 +35.5%) vs baseline: 📈 +72.9% Memory: ✅ 37.415MB (SLO: <38.750MB -3.4%) vs baseline: +5.5% 📈 Performance Regressions (1 suite)📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 509.362µs (SLO: <700.000µs 📉 -27.2%) vs baseline: 📈 +17.8% Memory: ✅ 43.804MB (SLO: <46.000MB -4.8%) vs baseline: +5.4% ✅ ospathbasename_noaspectTime: ✅ 427.778µs (SLO: <700.000µs 📉 -38.9%) vs baseline: -2.1% Memory: ✅ 43.804MB (SLO: <46.000MB -4.8%) vs baseline: +5.4% ✅ ospathjoin_aspectTime: ✅ 626.967µs (SLO: <700.000µs 📉 -10.4%) vs baseline: +0.4% Memory: ✅ 43.723MB (SLO: <46.000MB -5.0%) vs baseline: +5.2% ✅ ospathjoin_noaspectTime: ✅ 630.714µs (SLO: <700.000µs -9.9%) vs baseline: +0.1% Memory: ✅ 43.772MB (SLO: <46.000MB -4.8%) vs baseline: +5.1% ✅ ospathnormcase_aspectTime: ✅ 346.263µs (SLO: <700.000µs 📉 -50.5%) vs baseline: -5.1% Memory: ✅ 43.804MB (SLO: <46.000MB -4.8%) vs baseline: +5.3% ✅ ospathnormcase_noaspectTime: ✅ 356.955µs (SLO: <700.000µs 📉 -49.0%) vs baseline: -3.7% Memory: ✅ 43.765MB (SLO: <46.000MB -4.9%) vs baseline: +5.0% ✅ ospathsplit_aspectTime: ✅ 487.679µs (SLO: <700.000µs 📉 -30.3%) vs baseline: -1.0% Memory: ✅ 43.745MB (SLO: <46.000MB -4.9%) vs baseline: +5.2% ✅ ospathsplit_noaspectTime: ✅ 496.619µs (SLO: <700.000µs 📉 -29.1%) vs baseline: -1.2% Memory: ✅ 43.872MB (SLO: <46.000MB -4.6%) vs baseline: +5.4% ✅ ospathsplitdrive_aspectTime: ✅ 374.705µs (SLO: <700.000µs 📉 -46.5%) vs baseline: -1.7% Memory: ✅ 43.745MB (SLO: <46.000MB -4.9%) vs baseline: +5.1% ✅ ospathsplitdrive_noaspectTime: ✅ 72.200µs (SLO: <700.000µs 📉 -89.7%) vs baseline: -1.9% Memory: ✅ 43.844MB (SLO: <46.000MB -4.7%) vs baseline: +5.4% ✅ ospathsplitext_aspectTime: ✅ 459.430µs (SLO: <700.000µs 📉 -34.4%) vs baseline: -2.2% Memory: ✅ 43.726MB (SLO: <46.000MB -4.9%) vs baseline: +5.1% ✅ ospathsplitext_noaspectTime: ✅ 467.209µs (SLO: <700.000µs 📉 -33.3%) vs baseline: -0.8% Memory: ✅ 43.726MB (SLO: <46.000MB -4.9%) vs baseline: +5.1% 🟡 Near SLO Breach (2 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 19.743ms (SLO: <22.300ms 📉 -11.5%) vs baseline: +0.4% Memory: ✅ 68.814MB (SLO: <73.500MB -6.4%) vs baseline: +5.1% ✅ exception-replay-enabledTime: ✅ 1.399ms (SLO: <1.450ms -3.5%) vs baseline: +0.1% Memory: ✅ 66.770MB (SLO: <71.500MB -6.6%) vs baseline: +4.7% ✅ iastTime: ✅ 19.802ms (SLO: <22.250ms 📉 -11.0%) vs baseline: ~same Memory: ✅ 68.746MB (SLO: <75.000MB -8.3%) vs baseline: +4.9% ✅ profilerTime: ✅ 15.224ms (SLO: <16.550ms -8.0%) vs baseline: +0.2% Memory: ✅ 60.449MB (SLO: <61.000MB 🟡 -0.9%) vs baseline: +5.0% ✅ resource-renamingTime: ✅ 19.694ms (SLO: <21.750ms -9.5%) vs baseline: ~same Memory: ✅ 68.802MB (SLO: <73.500MB -6.4%) vs baseline: +5.0% ✅ span-code-originTime: ✅ 20.209ms (SLO: <28.200ms 📉 -28.3%) vs baseline: +1.4% Memory: ✅ 68.437MB (SLO: <75.000MB -8.8%) vs baseline: +4.4% ✅ tracerTime: ✅ 19.801ms (SLO: <21.750ms -9.0%) vs baseline: +0.4% Memory: ✅ 68.825MB (SLO: <75.000MB -8.2%) vs baseline: +5.1% ✅ tracer-and-profilerTime: ✅ 21.161ms (SLO: <23.500ms -10.0%) vs baseline: ~same Memory: ✅ 70.665MB (SLO: <75.000MB -5.8%) vs baseline: +4.7% ✅ tracer-dont-create-db-spansTime: ✅ 19.782ms (SLO: <21.500ms -8.0%) vs baseline: -0.4% Memory: ✅ 68.803MB (SLO: <75.000MB -8.3%) vs baseline: +5.0% ✅ tracer-minimalTime: ✅ 16.850ms (SLO: <17.500ms -3.7%) vs baseline: ~same Memory: ✅ 68.631MB (SLO: <75.000MB -8.5%) vs baseline: +4.8% ✅ tracer-nativeTime: ✅ 19.613ms (SLO: <21.750ms -9.8%) vs baseline: -0.2% Memory: ✅ 68.809MB (SLO: <72.500MB -5.1%) vs baseline: +5.1% ✅ tracer-no-cachesTime: ✅ 17.656ms (SLO: <19.650ms 📉 -10.1%) vs baseline: ~same Memory: ✅ 68.729MB (SLO: <75.000MB -8.4%) vs baseline: +4.8% ✅ tracer-no-databasesTime: ✅ 19.360ms (SLO: <20.100ms -3.7%) vs baseline: +0.2% Memory: ✅ 68.776MB (SLO: <75.000MB -8.3%) vs baseline: +5.0% ✅ tracer-no-middlewareTime: ✅ 19.451ms (SLO: <21.500ms -9.5%) vs baseline: ~same Memory: ✅ 68.814MB (SLO: <75.000MB -8.2%) vs baseline: +5.0% ✅ tracer-no-templatesTime: ✅ 19.622ms (SLO: <22.000ms 📉 -10.8%) vs baseline: +0.6% Memory: ✅ 68.848MB (SLO: <73.500MB -6.3%) vs baseline: +5.0% 🟡 recursivecomputation - 8/8✅ deepTime: ✅ 311.286ms (SLO: <320.950ms -3.0%) vs baseline: ~same Memory: ✅ 37.493MB (SLO: <38.750MB -3.2%) vs baseline: +6.4% ✅ deep-profiledTime: ✅ 333.191ms (SLO: <359.150ms -7.2%) vs baseline: ~same Memory: ✅ 43.922MB (SLO: <46.000MB -4.5%) vs baseline: +4.9% ✅ mediumTime: ✅ 7.299ms (SLO: <7.400ms 🟡 -1.4%) vs baseline: -0.1% Memory: ✅ 36.392MB (SLO: <38.000MB -4.2%) vs baseline: +5.2% ✅ shallowTime: ✅ 1.017ms (SLO: <1.050ms -3.2%) vs baseline: +1.5% Memory: ✅ 36.313MB (SLO: <38.000MB -4.4%) vs baseline: +5.0%
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6a75cfef8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| """DD_TRACE_HTTP_SERVER_ERROR_STATUSES must still classify client spans when | ||
| DD_TRACE_HTTP_CLIENT_ERROR_STATUSES is not explicitly configured (backward compat). |
There was a problem hiding this comment.
Do you know when we might be able to change this behavior?
There was a problem hiding this comment.
We could:
- Add a deprecation warning when DD_TRACE_HTTP_SERVER_ERROR_STATUSES is read as the fallback
for client spans (so users know to migrate) - Remove the fallback in a future major version bump (once customers have had some time to use it)
Do you think we should add a deprecation warning for the first case?
| if span.span_type == SpanTypes.HTTP: | ||
| is_error = config._http_client.is_error_code(int_status_code) | ||
| else: | ||
| is_error = config._http_server.is_error_code(int_status_code) |
There was a problem hiding this comment.
Fun way to deduplicate these, strictly personal preference:
component = "client" if span.span_type == SpanTypes.HTTP
is_error = getattr(config, f"_http_{component}").is_error_code(int_status_code)There was a problem hiding this comment.
Thanks! Did a similar refactor / deduplication in: cacb078
There was a problem hiding this comment.
Note that changes to this file will only have an effect if this release note has not yet been included in a stable release.
mabdinur
left a comment
There was a problem hiding this comment.
Can we add a test case to a few of the integrations this change to impact? Ex: http.client and maybe httpx? Testing the utils is helpful but there's no guarantee it's actually being used in an integration
| else: | ||
| span._set_attribute(http.STATUS_CODE, str(status_code)) | ||
| if config._http_server.is_error_code(int_status_code): | ||
| http_config = config._http_client if span.span_type == SpanTypes.HTTP else config._http_server |
There was a problem hiding this comment.
Should we check the span kind here and use that to determine whether http_client or http_server should be used?
Also to make this change we need to make sure span kind is set, if not we should generate a warning.
ddtrace/internal/settings/_config.py
Outdated
| return True | ||
| return False | ||
|
|
||
| class _HTTPClientConfig(object): |
There was a problem hiding this comment.
any reason to have this object here and not in a separate file at the same level ? Especially as http.py is already existing in settings folder.
There was a problem hiding this comment.
Fair point, it makes more sense to have it in http.py since that's where the http-specific configs live. Addressed in: d98ea98
ddtrace/internal/settings/_config.py
Outdated
| if _client_configured is not _unset: | ||
| _error_statuses: str = _get_config("DD_TRACE_HTTP_CLIENT_ERROR_STATUSES", "500-599") | ||
| else: | ||
| _error_statuses = _get_config("DD_TRACE_HTTP_SERVER_ERROR_STATUSES", "500-599") |
There was a problem hiding this comment.
Question: why are we falling back on http_server ? What if a user explicitly specifies http_server but not client on purpose ?
There was a problem hiding this comment.
It might make sure if we could fall back to DD_TRACE_HTTP_SERVER_ERROR_STATUSES only if neither var is set, and default to "500-599" otherwise. However, this would change the existing behavior. We could keep the fallback but add a deprecation warning that in a future version when DD_TRACE_HTTP_CLIENT_ERROR_STATUSES is unset but DD_TRACE_HTTP_SERVER_ERROR_STATUSES is set, client spans will default to 500-599 independently.
There was a problem hiding this comment.
Added the deprecation warning in: d98ea98
| SpanKind.SERVER, | ||
| ) | ||
| # Backward compat: fall back to span_type to determine the config | ||
| http_config = config._http_client if span.span_type == SpanTypes.HTTP else config._http_server |
There was a problem hiding this comment.
Is it possible that span_type is not set as well ?
There was a problem hiding this comment.
Yes, span_type can be None. When span_type is None, it silently falls back to config._http_server.
Alternatively, we could also:
- Default to
_http_client- sinceSpanTypes.HTTPsemantically means "outgoing HTTP request" and a missing span_type on an HTTP-instrumented span is more likely a client that forgot to set it - Skip the error classification and log a warning that
span.kindis not set
I think it makes more sense to log the warning for now
ddtrace/internal/settings/_config.py
Outdated
| return True | ||
| return False | ||
|
|
||
| class _HTTPClientConfig(object): |
There was a problem hiding this comment.
Except the attributes, the code is the same as _HTTPServerConfig(object). Could we create a base class ?
…on warnings for fallbacks
Description
Following: #10723, previously
DD_TRACE_HTTP_SERVER_ERROR_STATUSESapplied to all HTTP spans regardless of whether the span was a client or server span. This meant HTTP client spans couldn't be configured independently.This PR introduces
DD_TRACE_HTTP_CLIENT_ERROR_STATUSES(default: 500-599 to match server behavior), which allows users to separately configure which HTTP status codes are treated as errors on HTTP client spans.Testing
Unit tests added in tests/tracer/test_trace_utils.py which cover:
- Client spans using
DD_TRACE_HTTP_CLIENT_ERROR_STATUSES- Custom ranges (e.g. 400-499,500-599)
Risks
Low. The default value (500-599) matches the prior behavior, so users see no change unless they explicitly set the new env var.
Additional Notes