

Fix parent-child span linking in async LangChain #123

Open
NikitaVoitov wants to merge 3 commits into signalfx:main from NikitaVoitov:fix/parent-child-span-linking

@NikitaVoitov

Summary

Fixes parent-child span linking in LangChain instrumentation for async applications by resolving parent_run_id to actual OpenTelemetry Span objects.

Problem: In async LangChain/LangGraph applications using await graph.ainvoke(), all spans appear as siblings (flat hierarchy) instead of proper parent-child nesting. Spans share the same Trace ID but all point to the root span as parent.

Solution: Explicitly resolve the parent_run_id UUID to the actual parent Span object and pass it to the span emitter, which then starts the child span in that parent's context instead of relying on implicit context propagation.

Fixes #122

The Bug (Before)

```
POST /agents/sf-query (trace_id: 90a9d9...)
├── invoke_agent healthcare_agent (parent: HTTP)
├── step model (parent: HTTP)           ← All siblings
├── chat unknown_model (parent: HTTP)   ← Flat hierarchy
├── step tools (parent: HTTP)
└── tool query_member_data (parent: HTTP)
```

The Fix (After)

```
POST /agents/sf-query (trace_id: 90a9d9...)
└── invoke_agent healthcare_agent
    ├── step model
    │   └── chat claude-3-5-sonnet      ← Proper nesting
    └── step tools
        └── tool query_member_data
```
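The nested tree is already encoded in the parent_run_id values that LangChain passes to each callback. The fix only makes the spans follow those links. The stdlib-only sketch below illustrates this with a hypothetical RunRecord type standing in for the callback arguments. It is not code from this PR, and it omits the HTTP root span.

```python
from __future__ import annotations

from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class RunRecord:
    """Hypothetical callback event: the run's id, its parent's id, a span name."""
    run_id: UUID
    parent_run_id: UUID | None
    name: str


def render_tree(records: list[RunRecord]) -> list[str]:
    """Indent each run under its parent by following parent_run_id links."""
    children: dict[UUID | None, list[RunRecord]] = {}
    for rec in records:
        children.setdefault(rec.parent_run_id, []).append(rec)

    lines: list[str] = []

    def walk(parent: UUID | None, depth: int) -> None:
        # Runs whose parent_run_id is None are roots; recurse into each child.
        for rec in children.get(parent, []):
            lines.append("    " * depth + rec.name)
            walk(rec.run_id, depth + 1)

    walk(None, 0)
    return lines


agent, step_model, step_tools = uuid4(), uuid4(), uuid4()
records = [
    RunRecord(agent, None, "invoke_agent healthcare_agent"),
    RunRecord(step_model, agent, "step model"),
    RunRecord(uuid4(), step_model, "chat claude-3-5-sonnet"),
    RunRecord(step_tools, agent, "step tools"),
    RunRecord(uuid4(), step_tools, "tool query_member_data"),
]
print("\n".join(render_tree(records)))
```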

Changes

types.py

Added a parent_span: Span | None = None field to the GenAI base class:

```python
@dataclass(kw_only=True)
class GenAI:
    """Base type for all GenAI telemetry entities."""
    context_token: ContextToken | None = None
    span: Span | None = None
    span_context: SpanContext | None = None
    # Parent span for proper parent-child trace linking in async contexts
    parent_span: Span | None = None  # NEW
    # ...
```

callback_handler.py

Added a _resolve_parent_span(parent_run_id) method that looks up the parent span among the tracked entities:

```python
def _resolve_parent_span(self, parent_run_id: UUID | None) -> Span | None:
    """Resolve parent_run_id to actual span for trace hierarchy.

    This is required for async applications where Python's context
    propagation doesn't automatically carry the parent span across
    await boundaries.
    """
    if parent_run_id is None:
        return None
    return self._handler.get_span_by_run_id(parent_run_id)
```
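The method depends on the handler keeping a run_id to span mapping. The sketch below assumes get_span_by_run_id() behaves like a dictionary lookup. SpanRegistry, FakeSpan, and forget() are hypothetical names for illustration, not the util-genai API.

```python
from __future__ import annotations

from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class FakeSpan:
    """Stand-in for an OpenTelemetry Span (hypothetical)."""
    name: str


class SpanRegistry:
    """Hypothetical run_id -> span map assumed to back get_span_by_run_id()."""

    def __init__(self) -> None:
        self._spans: dict[UUID, FakeSpan] = {}

    def register(self, run_id: UUID, span: FakeSpan) -> None:
        self._spans[run_id] = span

    def forget(self, run_id: UUID) -> None:
        # Meant to be called when a run ends so the map does not grow unbounded.
        self._spans.pop(run_id, None)

    def get_span_by_run_id(self, run_id: UUID) -> FakeSpan | None:
        return self._spans.get(run_id)


def resolve_parent_span(registry: SpanRegistry, parent_run_id: UUID | None) -> FakeSpan | None:
    """Same shape as _resolve_parent_span(): None for root runs, else a lookup."""
    if parent_run_id is None:
        return None
    return registry.get_span_by_run_id(parent_run_id)


registry = SpanRegistry()
agent_run = uuid4()
registry.register(agent_run, FakeSpan("invoke_agent healthcare_agent"))

print(resolve_parent_span(registry, agent_run))  # FakeSpan(name='invoke_agent healthcare_agent')
print(resolve_parent_span(registry, None))       # None
print(resolve_parent_span(registry, uuid4()))    # None (unknown parent)
```

An unknown parent resolves to None. In that case the emitter receives no parent_span and keeps its previous behavior of using the current context.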

Updated all entity creation methods to set parent_span:

  • _start_agent_invocation() - AgentInvocation
  • on_chain_start() - ToolCall, Step
  • on_chat_model_start() - LLMInvocation
  • on_tool_start() - ToolCall

The span emitter already has logic to use parent_span if provided:

```python
# In span_emitter.py
parent_span = getattr(invocation, "parent_span", None)
parent_ctx = trace.set_span_in_context(parent_span) if parent_span else None
cm = self._tracer.start_as_current_span(..., context=parent_ctx)
```

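Previously, parent_span was always None, so parent_ctx was None and every span inherited whatever context was active when it started. In sync code this works because the context propagates automatically. In async code, however, the active context is not reliably carried across await boundaries to where the callbacks run, so new spans attach to the root (HTTP) span instead of their real parent.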

By explicitly setting parent_span, we ensure correct parent-child relationships regardless of sync/async execution.
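A toy, stdlib-only illustration of the difference follows. For illustration only, it assumes each callback runs in its own asyncio task. This is not a model of LangChain's actual dispatch. Under standard contextvars behavior, each task works on a copy of its creator's context, so a "current span" set inside one callback never reaches the next one. An explicit run_id lookup is unaffected. All names are hypothetical.

```python
from __future__ import annotations

import asyncio
import contextvars
from uuid import UUID, uuid4

# Toy "active span" slot standing in for OpenTelemetry's current context.
current_span: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_span", default="HTTP"
)
# Explicit run_id -> span map standing in for get_span_by_run_id().
spans_by_run: dict[UUID, str] = {}
implicit_parent: dict[str, str] = {}
explicit_parent: dict[str, str] = {}


async def on_start(name: str, run_id: UUID, parent_run_id: UUID | None) -> None:
    """Toy start callback: record the parent both ways, then become current."""
    implicit_parent[name] = current_span.get()
    resolved = spans_by_run.get(parent_run_id) if parent_run_id is not None else None
    # Like the emitter: prefer the explicit parent, else fall back to context.
    explicit_parent[name] = resolved or current_span.get()
    spans_by_run[run_id] = name
    current_span.set(name)  # lands only in this task's copy of the context


async def main() -> None:
    agent, step = uuid4(), uuid4()
    # Each create_task() copies main()'s context, so the set() above is lost.
    await asyncio.create_task(on_start("invoke_agent", agent, None))
    await asyncio.create_task(on_start("step model", step, agent))
    await asyncio.create_task(on_start("chat", uuid4(), step))


asyncio.run(main())
print(implicit_parent)  # every parent is 'HTTP': the flat hierarchy
print(explicit_parent)  # nested: invoke_agent -> step model -> chat
```

Within a single task, contextvars do survive plain await. The loss shown here comes specifically from crossing task boundaries.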

Testing

  • Added test_parent_span_linking - verifies parent_span is correctly resolved from parent_run_id
  • Added test_nested_span_hierarchy_three_levels - verifies 3-level hierarchy (agent → step → llm)
  • All existing tests pass
  • Live-tested with an async healthcare agent on Snowflake SPCS
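The sketch below shows the kind of check a three-level test can make against a stub that exposes get_span_by_run_id(). It uses unittest and hypothetical names. It is not the PR's actual test code, which lives in test_callback_handler_agent.py.

```python
import unittest
from uuid import uuid4


class StubHandler:
    """Hypothetical stub exposing get_span_by_run_id(), like the test stub."""

    def __init__(self):
        self.spans = {}

    def get_span_by_run_id(self, run_id):
        return self.spans.get(run_id)


def resolve_parent_span(handler, parent_run_id):
    """Same shape as _resolve_parent_span(): None for root runs, else a lookup."""
    if parent_run_id is None:
        return None
    return handler.get_span_by_run_id(parent_run_id)


class NestedHierarchySketch(unittest.TestCase):
    def test_three_levels(self):
        handler = StubHandler()
        agent_run, step_run = uuid4(), uuid4()
        handler.spans[agent_run] = "agent-span"
        handler.spans[step_run] = "step-span"

        # agent is the root run, step hangs off agent, llm hangs off step.
        self.assertIsNone(resolve_parent_span(handler, None))
        self.assertEqual(resolve_parent_span(handler, agent_run), "agent-span")
        self.assertEqual(resolve_parent_span(handler, step_run), "step-span")

    def test_unknown_parent_falls_back(self):
        # An unregistered parent resolves to None, so the emitter would use
        # the current context instead.
        self.assertIsNone(resolve_parent_span(StubHandler(), uuid4()))


if __name__ == "__main__":
    unittest.main(argv=["sketch"], exit=False)
```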

Files Changed

| File | Changes |
| --- | --- |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/types.py | Added parent_span field |
| instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/callback_handler.py | Added _resolve_parent_span() method; set parent_span on entities |
| instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_callback_handler_agent.py | Added 2 tests; added get_span_by_run_id() to stub |

NikitaVoitov and others added 3 commits January 12, 2026 20:48
- Add parent_span field to GenAI base class in types.py
- Add _resolve_parent_span() method to callback_handler.py
- Set parent_span on all entity types for proper span linking
- Add tests for parent-child span linking
Trace from async healthcare agent on Snowflake SPCS showing:
- All spans have same parentId (HTTP root) instead of proper nesting
- Demonstrates Issue #1 (flat hierarchy) and Issue #2 (unknown_model)
Raw trace export from Splunk O11y showing flat hierarchy bug
NikitaVoitov requested review from a team as code owners January 13, 2026 12:39
@github-actions

github-actions bot commented Jan 13, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@NikitaVoitov
Author

I have read the CLA Document and I hereby sign the CLA



Development

Successfully merging this pull request may close these issues.

[Bug] LangChain spans have flat hierarchy instead of proper parent-child nesting in async applications

1 participant