Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ If your change does not need a CHANGELOG entry, add the "skip changelog" label t

## Unreleased

- fix(otlp-aws-exporter): avoid `RecursionError` when `pip_system_certs` replaces `ssl.SSLContext` on Python 3.12 by rebinding stale `botocore`/`urllib3` SSL context references and caching credentials in `AwsAuthSession`
- feat: support environment-configured endpoint visibility for HTTP operation names
([#718](https://github.com/aws-observability/aws-otel-python-instrumentation/pull/718))
- fix(lambda-layer): Standardize CompactConsoleLogRecordExporter output with CloudWatch OTLP backend schema.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,15 @@
# SPDX-License-Identifier: Apache-2.0

import logging
from threading import Lock

import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.session import Session

from amazon.opentelemetry.distro.patches._pip_system_certs_patches import apply_pip_system_certs_compatibility_patch

_logger = logging.getLogger(__name__)


Expand Down Expand Up @@ -39,13 +42,70 @@ def __init__(self, aws_region: str, service: str, session: Session):
self._service: str = service
self._session: Session = session

# Cached credentials are resolved on the first ``request()`` call. The returned
# ``Credentials`` / ``RefreshableCredentials`` object handles its own expiry and
# rotation when its attributes are accessed, so caching the reference does not
# cache the underlying credential values.
self._credentials = None
self._credentials_resolved = False
self._credentials_lock = Lock()

super().__init__()

def _ensure_initialized(self) -> None:
"""Apply one-time, deferred initialization on the first ``request()`` call.

This runs after sitecustomize has fully completed (i.e., after any ``.pth``
based ``ssl.SSLContext`` injection from packages such as ``pip_system_certs``),
which is the only point at which we can safely re-align stale ``SSLContext``
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (future-proofing): double-checked locking and free-threaded Python.

This is a textbook double-checked locking pattern and works correctly in CPython due to the GIL. However, Python 3.13+ has experimental free-threaded builds (PEP 703) where reading _credentials_resolved outside the lock is a data race. Not blocking for now, but worth noting if this project plans to support --disable-gil in the future.

references captured by ``botocore`` / ``urllib3`` during ADOT startup.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Double-checked locking and free-threaded Python

The double-checked locking pattern is correct for CPython due to the GIL, but self._credentials_resolved is read without the lock on the fast path (two lines above). In CPython, reading/writing a bool is atomic due to the GIL, so this is safe in practice. However, if this code ever runs on a Python implementation without a GIL (e.g., the experimental free-threaded Python 3.13+), this would need a memory barrier or atomic.

Fine for now, but worth a brief comment noting the GIL dependency for future maintainers.


Credentials are also resolved once here. ``RefreshableCredentials`` handles
rotation internally on attribute access, so caching the reference is safe.

On a transient credential resolution failure (e.g., IMDS timeout), the
``_credentials_resolved`` flag is left ``False`` so the next ``request()`` call
will retry. Only a successful resolution latches the flag, matching the
original "retry every request" behavior for the failure path while keeping
the SSL-context-construction cost amortized to once on the success path.

Note: the read of ``_credentials_resolved`` outside the lock is safe because
Python's GIL makes attribute reads/writes atomic. On free-threaded Python
builds (3.13t+) this would need a memory barrier; revisit if/when we
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness concern: double-checked locking on free-threaded Python 3.13t+. The docstring acknowledges this, which is good. However Python 3.13 is already in pyproject.toml classifiers. If a user runs on a free-threaded 3.13t build, the unsynchronized read of self._credentials_resolved (here, outside the lock) and self._credentials in request() could produce torn reads or stale values. Consider always acquiring the lock (it is uncontended after the first successful call, so performance cost is negligible on CPython) or file a tracking issue for free-threaded build support.

support those.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug (thread-safety of _credentials read): In the request() method, after _ensure_initialized() returns, the code reads self._credentials without holding the lock. Consider this race:

  1. Thread A calls _ensure_initialized(), enters the lock, credential resolution fails, sets self._credentials = None, returns without latching _credentials_resolved.
  2. Thread B calls _ensure_initialized(), enters the lock, credential resolution succeeds, sets self._credentials = <valid>, latches _credentials_resolved = True.
  3. Thread A (which already returned from _ensure_initialized before Thread B entered) now reads self._credentials in request(). Under GIL the attribute read is atomic, but if Thread B hasn't run its assignment yet (B is still waiting to acquire the lock while A already returned), A sees None and skips signing for that request.

This is a narrow window but possible when multiple threads share one AwsAuthSession instance. Consider returning the credentials from _ensure_initialized() directly, or having request() re-check self._credentials after confirming _credentials_resolved is True.

"""
if self._credentials_resolved:
return

with self._credentials_lock:
if self._credentials_resolved:
return

# Realign stale ssl.SSLContext references in botocore / urllib3 before
# the first credential resolution constructs an SSL context. This is a
# no-op when pip_system_certs is not installed.
try:
apply_pip_system_certs_compatibility_patch()
except Exception as patch_error: # pylint: disable=broad-except
_logger.warning("Failed to apply pip_system_certs compatibility patch: %s", patch_error)

try:
self._credentials = self._session.get_credentials()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note (double-checked locking ordering dependency): For correctness of the double-checked locking pattern, the ordering matters: self._credentials must be assigned before self._credentials_resolved = True (which it is). In CPython with GIL, this is guaranteed because both assignments are in the same thread within the lock, and the GIL prevents another thread from seeing a partial state.

However, if this code is ever refactored (e.g., the assignment order is swapped, or a future maintainer moves code around), the outer reader could see _credentials_resolved = True but _credentials still as None. A brief inline comment like # Order matters: set credentials before latching the flag above line 99 (self._credentials_resolved = True) would help prevent this subtle regression.

except Exception as cred_error: # pylint: disable=broad-except
# Don't latch _credentials_resolved on failure - leave it False so
# the next request retries credential resolution. This preserves
# self-healing behavior on transient errors (e.g., IMDS timeouts).
_logger.error("Failed to load AWS Credentials: %s", cred_error)
self._credentials = None
return

self._credentials_resolved = True

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Permanent credential failure on transient errors

If get_credentials() raises (e.g., IMDS timeout, transient network blip), self._credentials stays None and _credentials_resolved is set to True unconditionally here. All subsequent request() calls will skip signing forever for the lifetime of this session/exporter -- there is no retry path.

This is a regression vs. the old behavior where get_credentials() was called on every request, meaning a transient failure on one export cycle would self-heal on the next.

Consider either:

  1. Not setting _credentials_resolved = True when credential resolution fails (so it retries next call), or
  2. Moving _credentials_resolved = True inside the try block (after the successful get_credentials() call), or
  3. Setting a TTL on the failure so it can be retried after some delay.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viw-test1 can you address this?

def request(self, method, url, *args, data=None, headers=None, **kwargs):
credentials = self._session.get_credentials()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Double-logging on credential failure. When get_credentials() raises inside _ensure_initialized(), the error is already logged with the exception detail via _logger.error. Then here in request(), this else branch fires again and logs 'Failed to load AWS Credentials' without the exception. This produces duplicate error logs per failed request. Consider removing this else log or downgrading to _logger.debug.

self._ensure_initialized()

if credentials:
signer = SigV4Auth(credentials, self._service, self._aws_region)
if self._credentials:
signer = SigV4Auth(self._credentials, self._service, self._aws_region)
request = AWSRequest(
method="POST",
url=url,
Expand All @@ -64,6 +124,6 @@ def request(self, method, url, *args, data=None, headers=None, **kwargs):
except Exception as signing_error: # pylint: disable=broad-except
_logger.error("Failed to sign request: %s", signing_error)
else:
_logger.error("Failed to load AWS Credentials: %s")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Double-logging on credential failure. When get_credentials() raises inside _ensure_initialized(), the error is already logged with the exception detail via _logger.error. Then here in request(), this else branch fires again and logs 'Failed to load AWS Credentials' without the exception. This produces duplicate error logs per failed request. Consider removing this else log or downgrading to _logger.debug.

_logger.error("Failed to load AWS Credentials")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. The original _logger.error with a %s format specifier but no corresponding argument would produce a malformed log message (or raise in strict logging configurations). Removing the orphaned %s is correct.


return super().request(method=method, url=url, *args, data=data, headers=headers, **kwargs)
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

from importlib.metadata import PackageNotFoundError, version
from logging import Logger, getLogger

_logger: Logger = getLogger(__name__)

# Module-level guard so the patch is applied at most once per process.
# The plain bool is intentional: the patch body itself is idempotent
# (re-running it produces the same final state), so a benign race between two
# threads where both observe ``_patch_applied is False`` and both run the rebind
# costs an extra dict assignment and nothing more. We don't pay for a lock here.
_patch_applied = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread-safety: module-level _patch_applied guard is not atomic

The _patch_applied flag is a plain boolean checked and set without a lock. While AwsAuthSession._ensure_initialized() has its own lock, apply_pip_system_certs_compatibility_patch() is a public function that could be called from multiple threads concurrently (e.g., if multiple exporters are initialized in parallel). Two threads could both read _patch_applied = False before either sets it to True, causing the patch to run twice.

This is low-severity since the patch itself is idempotent, but it contradicts the stated guarantee (applied at most once per process). Consider using a threading.Lock here as well, or document that idempotency makes the race benign.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug (race condition consideration for free-threaded Python): The comment in _ensure_initialized correctly notes that the plain-bool guard is only safe under GIL, but this module-level _patch_applied flag has the same issue. On CPython with the GIL this is fine, but the comment in aws_auth_session.py explicitly calls out free-threaded Python 3.13t+ as a future concern.

Even under the GIL, if two AwsAuthSession instances (from different exporters) race here, both may observe _patch_applied is False and both will run the patch body. The comment acknowledges this (benign race, costs an extra dict assignment), but the _is_pip_system_certs_installed() call does I/O (package metadata lookup via importlib.metadata.version()) which is more expensive than a dict assignment. Consider using a threading.Lock here too if you expect multiple exporters to initialize concurrently, or at minimum acknowledge the I/O cost in the comment.


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread safety: module-level _patch_applied guard is racy.

The _patch_applied boolean is read/written without synchronization. Two threads calling apply_pip_system_certs_compatibility_patch() concurrently could both see _patch_applied = False and both execute the patch body. The patch itself is idempotent (rebinding an attribute twice is harmless), so this is not a correctness bug, but it is inconsistent with the careful double-checked locking used in _ensure_initialized().

Since the caller (_ensure_initialized) already holds _credentials_lock before calling this, the function is effectively single-threaded today. Worth a brief comment documenting that caller-side serialization assumption so future callers outside the lock do not introduce a real race.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread-safety note on the module-level guard: The comment correctly states a benign race is acceptable because the patch body is idempotent. Since the caller (_ensure_initialized) already serializes via its own per-instance lock, this race can only happen if apply_pip_system_certs_compatibility_patch is called from a different call site in the future. Worth documenting the assumption that the caller currently provides serialization.


def _is_pip_system_certs_installed() -> bool:
"""Is the pip_system_certs package installed?"""
try:
dist_version = version("pip_system_certs")
_logger.debug("pip_system_certs is installed: %s", dist_version)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code quality (minor): version("pip_system_certs") reads package metadata from disk (site-packages *.dist-info/METADATA). It runs on the first request() call, which is the OTLP export hot path. Since the module-level _patch_applied guard ensures this only runs once per process, the one-time cost is acceptable. Just flagging for awareness that this adds a small latency spike to the very first export (~1-5ms for metadata lookup depending on filesystem).

return True
except PackageNotFoundError as exc:
_logger.debug("pip_system_certs is not installed. %s", exc)
return False


def apply_pip_system_certs_compatibility_patch() -> None:
"""Re-bind stale ``ssl.SSLContext`` references in botocore/urllib3.

When ``pip_system_certs`` is installed, it injects ``truststore.SSLContext`` as the
process-wide ``ssl.SSLContext`` via a ``.pth`` file. The injection runs in the
``finally`` block of a ``site.execsitecustomize`` wrapper, i.e. *after*
``sitecustomize.py`` returns.

OpenTelemetry's auto-instrumentation entry point (``opentelemetry-instrument``)
runs from ``sitecustomize.py``, which loads the ADOT distro and transitively imports
``requests`` (via the upstream OTLP HTTP exporters) and ``botocore``. Both of those
modules capture a reference to ``ssl.SSLContext`` at import time. Because the import
happens before ``pip_system_certs``'s injection runs, the captured reference is the
original C-level ``ssl.SSLContext``, not the truststore-wrapped class.

On Python 3.12, ``ssl.SSLContext.options.__set__`` is implemented as
``super(SSLContext, SSLContext).options.__set__(self, value)`` where ``SSLContext``
is resolved from ``ssl``'s module globals at call time. After ``pip_system_certs``
runs, that name resolves to ``truststore.SSLContext``, and the ``super()`` chain
bounces between the original and truststore classes until the recursion limit
(~978 frames) is exceeded.

This patch re-binds ``botocore.httpsession.SSLContext`` and
``urllib3.util.ssl_.SSLContext`` to the *current* ``ssl.SSLContext``
(i.e., truststore's wrapper). truststore's own ``SSLContext.options`` setter does
not use the recursive ``super()`` pattern, so subsequent SSL context creations
succeed.

The patch is idempotent: a module-level guard ensures it only runs once per
process. It is a no-op when ``pip_system_certs`` is not installed or when the
references already match ``ssl.SSLContext``. ``ImportError`` is the only
expected failure (e.g., ``botocore`` or ``urllib3`` not installed in some
minimal environment) and is silently skipped per library.
"""
global _patch_applied # pylint: disable=global-statement
if _patch_applied:
return

# Only apply the patch when pip_system_certs is installed in user application space.
if not _is_pip_system_certs_installed():
_patch_applied = True
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defensive suggestion: verify ssl.SSLContext is actually the truststore wrapper

The patch reads ssl.SSLContext here and binds it to botocore/urllib3. This relies on the assumption that pip_system_certs has already completed its injection by the time this code runs (guaranteed when called from AwsAuthSession.request()).

However, if apply_pip_system_certs_compatibility_patch() is ever called from a different call site (e.g., during module import or distro initialization), ssl.SSLContext would still be the original, and this code would be a no-op (since botocore.httpsession.SSLContext is ssl.SSLContext would be True). This is safe but silently ineffective.

Consider adding a debug log that verifies ssl.SSLContext is actually the truststore wrapper (e.g., by checking ssl.SSLContext.module or ssl.SSLContext.name) to aid debugging if the call site assumptions change in the future.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consideration: coverage of other modules that capture ssl.SSLContext at import time.

This patch rebinds botocore.httpsession.SSLContext and urllib3.util.ssl_.SSLContext. If other dependencies in ADOT's transitive closure also capture ssl.SSLContext at import time during sitecustomize (e.g., aiohttp, httpx), they could hit the same recursion. Not necessarily something to fix now (YAGNI), but might be worth a brief note in the docstring that the patch covers the known affected modules and may need extension if other libraries are added to the auto-instrumentation.


# pylint: disable=import-outside-toplevel
import ssl
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security consideration: This rebinds botocore.httpsession.SSLContext and urllib3.util.ssl_.SSLContext to whatever ssl.SSLContext currently points to (truststore's wrapper). This is the correct fix for the recursion bug.

Worth noting: truststore delegates certificate verification to the OS trust store rather than certifi's bundled CA certificates. This is the intended behavior of pip_system_certs, and the patch is only active when that package is installed, so this is what the customer opted into. However, it means ADOT's TLS connections will now use the system trust store in this scenario. If a customer has a misconfigured system trust store (e.g., missing intermediate CAs), connections that previously worked with certifi could fail after this patch runs. Consider mentioning this behavior in the CHANGELOG entry or docs for clarity.


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit/future-proofing: Consider whether other libraries in the dependency tree might also hold stale ssl.SSLContext references (e.g., requests.adapters in some versions). Since requests delegates to urllib3, patching urllib3.util.ssl_.SSLContext should be sufficient for new SSL context creations in the current stack. But if this recursion is reported from a different call site in the future, this module is where to add additional rebindings.

try:
# pylint: disable=import-outside-toplevel
import botocore.httpsession

if botocore.httpsession.SSLContext is not ssl.SSLContext:
_logger.debug(
"Rebinding botocore.httpsession.SSLContext to current ssl.SSLContext (pip_system_certs detected)."
)
botocore.httpsession.SSLContext = ssl.SSLContext
except ImportError:
# botocore not installed; nothing to rebind on the botocore side.
pass

try:
# pylint: disable=import-outside-toplevel
import urllib3.util.ssl_

if urllib3.util.ssl_.SSLContext is not ssl.SSLContext:
_logger.debug(
"Rebinding urllib3.util.ssl_.SSLContext to current ssl.SSLContext (pip_system_certs detected)."
)
urllib3.util.ssl_.SSLContext = ssl.SSLContext
except ImportError:
# urllib3 not installed; nothing to rebind.
pass

_patch_applied = True
Original file line number Diff line number Diff line change
Expand Up @@ -47,3 +47,130 @@ def test_aws_auth_session(self, _, __):
self.assertIn(AUTHORIZATION_HEADER, actual_headers)
self.assertIn(X_AMZ_DATE_HEADER, actual_headers)
self.assertIn(X_AMZ_SECURITY_TOKEN_HEADER, actual_headers)

@patch("requests.Session.request", return_value=requests.Response())
@patch("botocore.session.Session.get_credentials", return_value=mock_credentials)
def test_credentials_are_resolved_once(self, mock_get_credentials, _):
"""Credentials must be resolved only once across multiple ``request()`` calls.

This is the hot-path mitigation for the pip_system_certs RecursionError: each
``get_credentials()`` call walks the credential resolver chain, which constructs
a urllib3 SSL context. Caching the returned object (``RefreshableCredentials``
rotates internally on attribute access) ensures the SSL context is created at
most once per exporter, not once per export.
"""
session = AwsAuthSession("us-east-1", "xray", get_aws_session())

for _ in range(5):
session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers={})

self.assertEqual(mock_get_credentials.call_count, 1)

@patch("requests.Session.request", return_value=requests.Response())
def test_credentials_retry_after_transient_failure(self, _):
"""A transient ``get_credentials()`` failure must NOT latch the resolved
flag. The next ``request()`` call must retry resolution. This preserves
self-healing behavior on transient errors (e.g., IMDS timeouts) and matches
the pre-fix behavior on the failure path.
"""
# First call raises, subsequent calls succeed.
get_credentials_mock = patch(
"botocore.session.Session.get_credentials",
side_effect=[RuntimeError("transient"), mock_credentials, mock_credentials],
)
with get_credentials_mock as mock_get_credentials:
session = AwsAuthSession("us-east-1", "xray", get_aws_session())

# 1st request: get_credentials raises, no auth headers added.
headers_first = {}
session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers=headers_first)
self.assertNotIn(AUTHORIZATION_HEADER, headers_first)

# 2nd request: get_credentials succeeds, auth headers must appear.
headers_second = {}
session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers=headers_second)
self.assertIn(AUTHORIZATION_HEADER, headers_second)

# 3rd request: cached credentials reused, no further get_credentials calls.
headers_third = {}
session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers=headers_third)
self.assertIn(AUTHORIZATION_HEADER, headers_third)

# Two resolution attempts: one failed, one succeeded; third request reuses cache.
self.assertEqual(mock_get_credentials.call_count, 2)

@patch("requests.Session.request", return_value=requests.Response())
@patch("botocore.session.Session.get_credentials", return_value=mock_credentials)
@patch(
"amazon.opentelemetry.distro.exporter.otlp.aws.common.aws_auth_session"
".apply_pip_system_certs_compatibility_patch"
)
def test_pip_system_certs_patch_invoked_on_first_request(self, mock_apply_patch, _, __):
"""The ssl.SSLContext rebind helper is invoked on the first ``request()`` call
and not re-invoked on subsequent calls.

The patch itself is a no-op when pip_system_certs is not installed, so this
test only asserts the call site, not the patch behavior."""
session = AwsAuthSession("us-east-1", "xray", get_aws_session())

session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers={})
session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers={})
session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers={})

self.assertEqual(mock_apply_patch.call_count, 1)

@patch("requests.Session.request", return_value=requests.Response())
@patch(
"amazon.opentelemetry.distro.exporter.otlp.aws.common.aws_auth_session"
".apply_pip_system_certs_compatibility_patch",
side_effect=RuntimeError("simulated patch failure"),
)
@patch("botocore.session.Session.get_credentials", return_value=mock_credentials)
def test_patch_failure_does_not_break_request(self, _, __, ___):
"""If the SSL-context-rebind helper itself raises, the failure is logged
but ``request()`` still proceeds and signs successfully. The patch is
defensive infrastructure, not a hard precondition."""
session = AwsAuthSession("us-east-1", "xray", get_aws_session())
actual_headers: dict = {}

session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers=actual_headers)

self.assertIn(AUTHORIZATION_HEADER, actual_headers)

@patch("requests.Session.request", return_value=requests.Response())
@patch("botocore.session.Session.get_credentials", return_value=mock_credentials)
def test_signing_failure_does_not_break_request(self, _, __):
"""If SigV4 signing itself raises, ``request()`` still issues the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code quality: Looking at the implementation, apply_pip_system_certs_compatibility_patch() is called on every entry into the lock (i.e., on each retry when credentials fail transiently), not just the first time. Since the patch has its own module-level _patch_applied guard this is safe but slightly wasteful -- each retry still pays for the function call overhead and the boolean check.

Consider either:

  1. Adding a separate instance-level _patch_attempted flag so the SSL patch is only called once per AwsAuthSession instance regardless of credential retry count, or
  2. Moving apply_pip_system_certs_compatibility_patch() outside/before the double-checked locking block entirely (it has its own module-level guard and is process-wide, so it does not need the instance lock).

unauthenticated request rather than crashing the caller."""
session = AwsAuthSession("us-east-1", "xray", get_aws_session())

with patch("amazon.opentelemetry.distro.exporter.otlp.aws.common.aws_auth_session.SigV4Auth") as mock_sigv4:
mock_sigv4.return_value.add_auth.side_effect = RuntimeError("signing boom")
actual_headers: dict = {}
# Should not raise
session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers=actual_headers)

# No auth header because signing raised before headers could be merged.
self.assertNotIn(AUTHORIZATION_HEADER, actual_headers)

@patch("requests.Session.request", return_value=requests.Response())
@patch("botocore.session.Session.get_credentials", return_value=mock_credentials)
def test_concurrent_requests_resolve_credentials_once(self, mock_get_credentials, _):
"""Two threads racing on the first request must both observe a single
credential resolution. The double-checked locking in ``_ensure_initialized``
is what provides this guarantee."""
# pylint: disable=import-outside-toplevel
from threading import Thread

session = AwsAuthSession("us-east-1", "xray", get_aws_session())

def call():
session.request("POST", AWS_OTLP_TRACES_ENDPOINT, data="", headers={})

threads = [Thread(target=call) for _ in range(8)]
for t in threads:
t.start()
for t in threads:
t.join()

self.assertEqual(mock_get_credentials.call_count, 1)
Loading
Loading