Native instrumentation with OpenTelemetry#329
Conversation
…into users/robrandao/otel-sample
There was a problem hiding this comment.
Pull request overview
This pull request introduces native OpenTelemetry instrumentation to the Microsoft Agents SDK, enabling comprehensive observability through traces, metrics, and logs. The changes add telemetry hooks at key points in the agent lifecycle including turn processing, adapter operations, and storage interactions, while also providing a test sample demonstrating the integration.
Changes:
- Added OpenTelemetry integration to core SDK with AgentTelemetry class for instrumentation
- Integrated telemetry tracking into agent turn processing, adapter operations, and storage methods
- Created test sample demonstrating OpenTelemetry configuration and usage with aiohttp
- Enhanced testing framework to support Activity template aliasing (from/from_property compatibility)
Reviewed changes
Copilot reviewed 21 out of 24 changed files in this pull request and generated 27 comments.
Show a summary per file
| File | Description |
|---|---|
| test_samples/otel/src/telemetry.py | OpenTelemetry configuration helper for test sample |
| test_samples/otel/src/start_server.py | Sample aiohttp server with telemetry integration |
| test_samples/otel/src/requirements.txt | Dependencies for OpenTelemetry sample |
| test_samples/otel/src/main.py | Entry point configuring telemetry before agent initialization |
| test_samples/otel/src/env.TEMPLATE | Environment configuration template with OTLP settings |
| test_samples/otel/src/agent_metric.py | Sample-specific metrics wrapper (has import name mismatch) |
| test_samples/otel/src/agent.py | Sample agent with manual telemetry instrumentation |
| libraries/microsoft-agents-hosting-core/setup.py | Added OpenTelemetry dependencies |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/observability/_agent_telemetry.py | Core telemetry infrastructure with context managers for instrumentation |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/observability/init.py | Exports AgentTelemetry and agent_telemetry singleton |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/app/agent_application.py | Wraps turn processing with telemetry |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/storage/storage.py | Adds telemetry to base storage operations |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/storage/memory_storage.py | Adds telemetry to in-memory storage implementation |
| libraries/microsoft-agents-storage-cosmos/microsoft_agents/storage/cosmos/cosmos_db_storage.py | Minor whitespace change |
| libraries/microsoft-agents-hosting-fastapi/microsoft_agents/hosting/fastapi/cloud_adapter.py | Wraps adapter processing with telemetry |
| libraries/microsoft-agents-hosting-aiohttp/microsoft_agents/hosting/aiohttp/cloud_adapter.py | Wraps adapter processing with telemetry |
| dev/tests/sdk/observability/test_observability.py | Tests for telemetry functionality |
| dev/tests/scenarios/quickstart.py | Renamed init_app to init_agent |
| dev/tests/scenarios/init.py | Updated to use renamed init_agent function |
| dev/microsoft-agents-testing/tests/core/fluent/test_model_template.py | Added tests for from/from_property aliasing |
| dev/microsoft-agents-testing/microsoft_agents/testing/core/fluent/utils.py | Added from/from_property aliasing logic |
| dev/microsoft-agents-testing/microsoft_agents/testing/core/fluent/model_template.py | Integrated aliasing into template classes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from microsoft_agents.hosting.core import TurnContext | ||
|
|
There was a problem hiding this comment.
Unused import: TurnContext is imported but never used in this file. Consider removing it to keep imports clean.
| from microsoft_agents.hosting.core import TurnContext |
| from opentelemetry.sdk.trace import TracerProvider | ||
| from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter | ||
|
|
||
|
|
There was a problem hiding this comment.
Unused imports: TracerProvider, SimpleSpanProcessor, and ConsoleSpanExporter are imported but never used in this file. Remove them to keep imports clean.
| from opentelemetry.sdk.trace import TracerProvider | |
| from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter |
| def _ts() -> float: | ||
| """Helper function to get current timestamp in milliseconds""" | ||
| return datetime.now(timezone.utc).timestamp() * 1000 | ||
|
|
There was a problem hiding this comment.
Unused function: The _ts() helper function is defined but never used in this file. Consider removing it.
| def _ts() -> float: | |
| """Helper function to get current timestamp in milliseconds""" | |
| return datetime.now(timezone.utc).timestamp() * 1000 |
|
|
||
| def __init__(self, tracer: Tracer | None = None, meter: Meter | None = None): | ||
| if tracer is None: | ||
| tracer = trace.get_tracer("M365.agents", "1.0.0") |
There was a problem hiding this comment.
Inconsistent indentation: Line 33 has an extra space before 'tracer'. Should be a single space like line 35.
| tracer = trace.get_tracer("M365.agents", "1.0.0") | |
| tracer = trace.get_tracer("M365.agents", "1.0.0") |
| with self.tracer.start_as_current_span(span_name) as span: | ||
| attributes = self._extract_attributes_from_context(context) | ||
| span.set_attributes(attributes) | ||
| # span.add_event(f"{span_name} started", attributes) |
There was a problem hiding this comment.
Commented-out code: Line 97 contains a commented-out span.add_event() call. Either remove it if it's not needed or uncomment it if it should be active.
| # span.add_event(f"{span_name} started", attributes) |
| async def entry_point(req: Request) -> Response: | ||
|
|
||
| logger.info("Request received at /api/messages endpoint.") | ||
| text = await req.text() |
There was a problem hiding this comment.
Unused variable: The 'text' variable on line 23 is assigned but never used. Remove it if it's not needed.
| text = await req.text() |
| APP["adapter"] = agent_application.adapter | ||
|
|
||
| try: | ||
| run_app(APP, host="localhost", port=environ.get("PORT", 3978)) |
There was a problem hiding this comment.
Type mismatch: environ.get("PORT", 3978) returns a string or int, but the port parameter expects an int. The default value 3978 is an int, but if PORT is set in the environment it will be a string. Convert to int: port=int(environ.get("PORT", 3978)).
| run_app(APP, host="localhost", port=environ.get("PORT", 3978)) | |
| run_app(APP, host="localhost", port=int(environ.get("PORT", 3978))) |
| self.tracer = trace.get_tracer("A365.AgentFramework") | ||
| self.meter = metrics.get_meter("A365.AgentFramework", "1.0.0") |
There was a problem hiding this comment.
Inconsistent naming: The test sample uses "A365.AgentFramework" for the tracer/meter name (line 28-29), while the core library uses "M365.agents" (see libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/observability/_agent_telemetry.py line 33). Consider using consistent naming across the codebase for better observability and to avoid confusion.
| self.tracer = trace.get_tracer("A365.AgentFramework") | |
| self.meter = metrics.get_meter("A365.AgentFramework", "1.0.0") | |
| self.tracer = trace.get_tracer("M365.agents") | |
| self.meter = metrics.get_meter("M365.agents", "1.0.0") |
| success_callback(span, duration) | ||
| else: | ||
|
|
||
| if failure_callback: | ||
| failure_callback(span, exception) |
There was a problem hiding this comment.
Missing error handling for callbacks: If success_callback or failure_callback raise an exception in lines 137-142, it will propagate and potentially mask the original exception. Consider wrapping callback invocations in try-except blocks to ensure telemetry errors don't break application logic.
| success_callback(span, duration) | |
| else: | |
| if failure_callback: | |
| failure_callback(span, exception) | |
| try: | |
| success_callback(span, duration) | |
| except Exception as callback_exc: | |
| # Ensure telemetry callback failures do not affect application logic | |
| span.record_exception(callback_exc) | |
| else: | |
| if failure_callback: | |
| try: | |
| failure_callback(span, exception) | |
| except Exception as callback_exc: | |
| # Ensure telemetry callback failures do not mask the original exception | |
| span.record_exception(callback_exc) |
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST=".*" | ||
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE=".*" | ||
|
|
||
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_REQUEST=".*" | ||
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_RESPONSE=".*" No newline at end of file |
There was a problem hiding this comment.
Using OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_* with value ".*" causes OpenTelemetry to capture and export all HTTP request and response headers, including Authorization, Cookie, and other secret-bearing headers. Anyone with access to the telemetry backend or pipeline could then retrieve credentials or other sensitive data from these exported headers. Limit these variables to a specific allowlist of non-sensitive headers and explicitly exclude authentication and other secret-related headers from capture.
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST=".*" | |
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE=".*" | |
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_REQUEST=".*" | |
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_RESPONSE=".*" | |
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST="user-agent,x-request-id,x-correlation-id" | |
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE="content-type,content-length,x-request-id,x-correlation-id" | |
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_REQUEST="user-agent,x-request-id,x-correlation-id" | |
| OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_CLIENT_RESPONSE="content-type,content-length,x-request-id,x-correlation-id" |
No description provided.