Skip to content

Revert "feat: auto-detect and mutually exclude native vs third-party agentic instrumentors (#729)"#750

Open
ezhang6811 wants to merge 3 commits into
aws-observability:mainfrom
ezhang6811:revert-autodetect-agent-instrumentation
Open

Revert "feat: auto-detect and mutually exclude native vs third-party agentic instrumentors (#729)"#750
ezhang6811 wants to merge 3 commits into
aws-observability:mainfrom
ezhang6811:revert-autodetect-agent-instrumentation

Conversation

@ezhang6811
Copy link
Copy Markdown
Contributor

This reverts commit 02ada61 (#729)

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ezhang6811 ezhang6811 requested a review from a team as a code owner May 21, 2026 19:41
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR reverts the auto-detection/mutual-exclusion logic from #729 and replaces it with a simpler two-env-var approach (AGENT_OBSERVABILITY_ENABLED for legacy v1, AWS_AGENTIC_OBSERVABILITY_OPT_IN for v2). The overall design is cleaner and easier to reason about.

Key findings (see inline comments):

  1. Behavior change - missing disabled instrumentations: http, urllib3, and requests are no longer in the disabled list, which will re-enable noisy HTTP spans for existing legacy users.

  2. Behavior change - CREWAI_DISABLE_TELEMETRY: Only set in the v2 path, not legacy. Existing users may get duplicate CrewAI traces.

  3. Typo: "brokens" in a comment.

  4. .gitignore: .coverage.* removal seems unrelated to the feature revert.

The refactoring of _configure_common_agent_observability() to share common setup is a good pattern. Tests are comprehensive and cover the new env var combinations well.

"openai_agents": "aws_openai_agents",
}
AGENT_OBSERVABILITY_DISABLED_INSTRUMENTATIONS = (
"sqlalchemy,psycopg2,pymysql,sqlite3,aiopg,asyncpg,mysql_connector,"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug/Behavior change: This list no longer includes http, urllib3, and requests, which were present in the old disabled list for the legacy path. Existing users on AGENT_OBSERVABILITY_ENABLED will suddenly get HTTP-level instrumentation re-enabled, producing potentially noisy low-level spans that were previously suppressed. If intentional, this should be documented as a breaking change. If not, consider adding http,urllib3,requests back. The same applies to AWS_AGENTIC_OBSERVABILITY_DISABLED_INSTRUMENTATIONS below.

self._configure_common_agent_observability(AWS_AGENTIC_OBSERVABILITY_DISABLED_INSTRUMENTATIONS)
os.environ.setdefault(OTEL_METRICS_EXPORTER, "otlp")
os.environ.setdefault("CREWAI_DISABLE_TELEMETRY", "true")
elif is_agent_observability_enabled():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior change: CREWAI_DISABLE_TELEMETRY is only set in the is_aws_agentic_observability_opt_in() branch but NOT in the legacy is_agent_observability_enabled() branch below. The old code set this unconditionally when agent observability was enabled. Existing legacy-path users will now get CrewAI's built-in telemetry re-enabled, which could cause duplicate/conflicting traces. Consider adding it to _configure_common_agent_observability() or to the legacy branch explicitly.

# Some third-party SDKs register the same entry point name as the upstream
# OTel packages that we depend on. For Agentic Observability legacy mode, skip our bundled
# OTel instrumentation so that existing third-party setups are not brokens.
if (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "are not brokens" should be "are not broken".

Comment thread .gitignore
coverage.xml
.coverage
.coverage.*
.nox
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing .coverage.* from .gitignore means coverage data files from parallel test runs (e.g., .coverage.hostname.12345) will now appear as untracked files. This was likely added in the same commit being reverted but appears unrelated to the agentic instrumentation feature itself. Consider keeping this entry.

liustve
liustve previously approved these changes May 21, 2026
@ezhang6811 ezhang6811 force-pushed the revert-autodetect-agent-instrumentation branch from 18b741f to fcd39d1 Compare May 21, 2026 22:10
"llama-index": "aws_llama-index",
"llama_index": "aws_llama-index",
"mcp": "aws_mcp",
"openai_agents": "aws_openai_agents",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "brokens" should be "broken".

from amazon.opentelemetry.distro.aws_opentelemetry_distro import (
AGENT_OBSERVABILITY_DISABLED_INSTRUMENTATIONS,
AwsOpenTelemetryDistro,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug (test isolation): CREWAI_DISABLE_TELEMETRY was removed from the env_vars_to_check list, but the production code still sets it in the is_aws_agentic_observability_opt_in() branch. If test_configure_ai_observability_opt_in runs, this env var will be set but never cleaned up in tearDown, potentially polluting subsequent tests. Suggested fix: add CREWAI_DISABLE_TELEMETRY back to the cleanup list.

_logger.info("AWS Agentic Observability enabled.")
self._configure_common_agent_observability(AWS_AGENTIC_OBSERVABILITY_DISABLED_INSTRUMENTATIONS)
os.environ.setdefault(OTEL_METRICS_EXPORTER, "otlp")
os.environ.setdefault("CREWAI_DISABLE_TELEMETRY", "true")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavioral subtlety: When both AWS_AGENTIC_OBSERVABILITY_OPT_IN=true AND AGENT_OBSERVABILITY_ENABLED=true are set simultaneously, this elif means only the new path executes -- no region-based endpoint auto-configuration, no awsemf metrics exporter. Meanwhile, is_agentic_observability_enabled() (used in the configurator) returns True for both, enabling LLO handling, unsampled span export, etc. This may be intentional precedence, but the behavior is surprising for a user migrating incrementally (setting the new var while keeping the old). Consider documenting this precedence explicitly, or logging a warning when both are set.

mock_is_installed.return_value = False

AwsOpenTelemetryDistro()._configure()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test assertion: This test verifies the new AWS_AGENTIC_OBSERVABILITY_OPT_IN path but does not assert that CREWAI_DISABLE_TELEMETRY is set to true, even though the production code explicitly sets it (line 193 in the source). Adding this assertion would catch regressions if the line is accidentally removed.

"""
if is_agent_observability_enabled() and self._should_skip_instrumentor(entry_point):
if self._should_skip_instrumentor(entry_point):
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (inconsistency): The method is called as self._configure_common_agent_observability(...) but is a @staticmethod. While valid in Python, consider making this a regular method (drop @staticmethod) for consistency with how it is invoked, or call it as AwsOpenTelemetryDistro._configure_common_agent_observability(...).

@@ -300,11 +305,58 @@ def test_configure_agent_observability_v1(
self.assertEqual(os.environ.get(OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED), "true")
self.assertEqual(os.environ.get(APPLICATION_SIGNALS_ENABLED_CONFIG), "false")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test case: The _should_skip_instrumentor logic explicitly checks is_agent_observability_enabled() and not is_aws_agentic_observability_opt_in(). There is no test case for the scenario where BOTH env vars are true simultaneously. In that case, the skip is bypassed (because of the not is_aws_agentic_observability_opt_in() guard), and the opentelemetry-instrumentation-openai-agents-v2 package will be loaded. Adding a case like {"env": {"AGENT_OBSERVABILITY_ENABLED": "true", "AWS_AGENTIC_OBSERVABILITY_OPT_IN": "true"}, "entry_name": "openai_agents", "dist_name": "opentelemetry-instrumentation-openai-agents-v2", "should_load": True} would verify this intended behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants