fix: preserve multi-turn history in Strands ConversationTurn by tejaskash · Pull Request #454 · aws/bedrock-agentcore-sdk-python

tejaskash · 2026-05-05T15:26:23Z

Summary

StrandsEventParser.extract_conversation_turn silently drops conversation history when a span carries multiple gen_ai.*.message events — common on every multi-turn agent invocation. This PR preserves the full history, separates prior assistant turns from the current model output, and keeps the existing API working via a deprecation shim.

The bug

Two interacting defects in span_to_adot_serializer/strands_converter.py:

1. User-message collapse — extract_conversation_turn stored gen_ai.user.message content in a scalar:

case cls.EVENT_USER_MESSAGE:
    user_message = event_attrs.get("content", "")  # overwrites on every event

A span carrying N gen_ai.user.message events (the conversation history a Strands agent replays to the model) would retain only the last one. For a 9-turn conversation, 8 user turns were silently dropped before the span was ever serialized.

2. Assistant-message miscategorization — gen_ai.assistant.message events and gen_ai.choice events were both .append()ed to the same assistant_messages list:

case cls.EVENT_CHOICE:           # current-turn model output
    ...
    assistant_messages.append(...)
case cls.EVENT_ASSISTANT_MESSAGE:  # prior-turn history input
    ...
    assistant_messages.append(...)

Per OTel GenAI semconv, gen_ai.{system,user,assistant,tool}.message events describe input context (including prior assistant turns the model reads as history), while gen_ai.choice describes the current-turn output. Mixing them means downstream consumers — ADOTDocumentBuilder.build_conversation_log_record, Online Evaluations, Observability dashboards — cannot tell history apart from new output and receive a ConversationTurn that misrepresents what the model actually did.

Reproduction

A span from a 3-turn conversation (3 user msgs, 2 prior assistant msgs, 1 new choice) produced a ConversationTurn with:

Field	Expected	Actual (before fix)
user content retained	3 turns	1 (newest only)
`assistant_messages`	1 entry (choice)	3 (2 history + choice)
`history_messages`	2 entries	field did not exist

Fix

ConversationTurn (adot_models.py) adds two fields:

user_messages: List[str] — every user turn observed on the span, in order.
history_messages: List[Dict] — prior assistant turns from gen_ai.assistant.message events, kept separate from the current-turn output.

user_message is retained as a @property that returns the most recent entry and emits a DeprecationWarning when it truncates more than one turn, so existing callers keep working while being nudged toward user_messages. Constructing ConversationTurn(user_message=...) still works — the initializer accepts either the legacy scalar or the new list.

StrandsEventParser.extract_conversation_turn (strands_converter.py) appends every gen_ai.user.message to user_messages, routes gen_ai.assistant.message to history_messages, and keeps assistant_messages scoped to gen_ai.choice output only.

ADOTDocumentBuilder.build_conversation_log_record (adot_models.py) emits all user turns followed by all history assistant turns as input.messages, so the serialized ADOT record carries the full context the model actually received.

Tests

Updated test_extract_conversation_turn_assistant_message — gen_ai.assistant.message now lands in history_messages, not assistant_messages.
New test_extract_conversation_turn_preserves_all_user_messages — multiple gen_ai.user.message events all retained.
New test_extract_conversation_turn_separates_history_from_choice — the end-to-end multi-turn case from the bug report.
New ConversationTurn dataclass tests: legacy scalar init, list init, deprecation-warning behavior, rejection of both-at-once, history preservation.
New test_build_conversation_log_record_multi_turn_history — builder emits [user, user, user, assistant-history, assistant-history] on input and single-entry output.

Full suite: 1726 passed, 0 failed. Serializer module: 41/41. ruff, ruff format, and pre-commit clean.

Backwards compatibility

ConversationTurn(user_message=...) constructor form still works.
turn.user_message still returns a string (the most recent user turn).
ADOT wire format body.input.messages / body.output.messages shape is unchanged; input.messages just contains more entries on multi-turn spans.
Consumers that read assistant_messages now get only the current-turn output. If a caller previously relied on the buggy behavior where history messages showed up in assistant_messages, they should migrate to history_messages. This is the intended correctness change.

There is a companion bug with the same failure mode in aws-opentelemetry-distro's LLOHandler.process_spans (llo_handler.py:329 dedupes events by name; llo_handler.py:148-152 tags gen_ai.assistant.message with source: "output"). That needs a separate fix in the aws-observability/aws-otel-python-instrumentation repo; this PR only addresses the AgentCore SDK side.

Test plan

Existing serializer tests pass unchanged (where correct) or updated (where they asserted the buggy behavior)
New regression tests cover multi-turn user, multi-turn history, and mixed-role preservation
Full tests/ suite passes (1726 passed)
ruff check / ruff format / pre-commit hooks clean
End-to-end repro shows user_messages holds all turns and assistant_messages holds only the current gen_ai.choice

StrandsEventParser.extract_conversation_turn previously stored the gen_ai.user.message content in a scalar, so spans carrying multiple user-message events (common in multi-turn agent conversations) dropped all but the newest entry. It also appended gen_ai.assistant.message events — which per OTel GenAI semconv are prior-turn input context — into the same list as gen_ai.choice outputs, mixing history with the current model response. ConversationTurn now exposes user_messages (List[str]) and history_messages (List[Dict]) alongside the existing assistant_messages and tool_results. user_message is retained as a backwards-compatible property that returns the most recent user turn and emits a DeprecationWarning when it truncates. ADOTDocumentBuilder now emits every user turn plus all prior assistant turns as input.messages, so downstream consumers (Online Evaluations, Observability dashboards) see the full context the model received. Refs: V2197498381. Companion fix needed upstream in aws-opentelemetry-distro LLOHandler, which has the same class of bug.

github-actions · 2026-05-05T15:26:47Z

⚠️ Breaking Change Warning

Found 3 potential breaking change(s) in this PR:

�[1msrc/bedrock_agentcore/evaluation/span_to_adot_serializer/adot_models.py�[0m:88: ConversationTurn.assistant_messages: �[33mAttribute value was changed�[39m: None -> list(assistant_messages or [])
�[1msrc/bedrock_agentcore/evaluation/span_to_adot_serializer/adot_models.py�[0m:89: ConversationTurn.tool_results: �[33mAttribute value was changed�[39m: None -> list(tool_results or [])
�[1msrc/bedrock_agentcore/evaluation/span_to_adot_serializer/adot_models.py�[0m:65: ConversationTurn.init: �[33mReturn types are incompatible�[39m: None -> None

Note: This is an automated static analysis check. Some flagged changes may be intentional.
Please confirm each item is expected and, if so, add a migration note to CHANGELOG.md.

tejaskash · 2026-05-05T15:28:40Z

Thanks, bot. Walking through the three flags:

ConversationTurn.assistant_messages: None -> list(assistant_messages) if assistant_messages is not None else [] — false positive. This is a signature diff, not a value change. The class moved from an auto-generated @dataclass __init__ (where the field annotation was List[Dict[str, Any]] with no default, so callers had to pass a value) to a custom __init__ that defaults missing args to []. Existing callers that passed assistant_messages=... get identical behavior; callers that omitted it previously raised TypeError and now get an empty list. Not a breaking change.
ConversationTurn.tool_results: None -> list(tool_results) if tool_results is not None else [] — same as above. Signature change from required-positional to defaulted, non-breaking.
ConversationTurn.__init__: Return types are incompatible: None -> None — false positive, the bot is diffing an auto-synthesized dataclass __init__ against the explicit one. Both return None.

The checker did miss the one intentional behavior change worth calling out: assistant_messages no longer contains prior gen_ai.assistant.message turns — those move to the new history_messages field. This is the bug fix (per OTel GenAI semconv, those events are input context, not current-turn output). Callers that iterated assistant_messages expecting both should either read both fields, or read body.input.messages from build_conversation_log_record, which now carries the full history. The legacy user_message scalar accessor and constructor form continue to work via a deprecation shim.

aidandaly24

Clean fix for a real data-loss bug. Backwards compatibility is well handled, tests are thorough. Two minor nits inline — neither is blocking.

aidandaly24 · 2026-05-05T19:48:45Z

                        tool_results.append(content)

-        if user_message and assistant_messages:
+        if user_messages and (assistant_messages or history_messages):


nit: The condition changed from user_message and assistant_messages to user_messages and (assistant_messages or history_messages). This means a span with user_messages + tool_results but no assistant_messages or history_messages now returns None — which could drop valid tool-use-only turns that previously returned a ConversationTurn (when history messages were incorrectly bucketed into assistant_messages).

Might be worth adding a test that confirms the intended behavior for a user-message + tool-result-only span (no choice, no history), and deciding whether tool_results should also satisfy this guard.

Fixed — gate is back to requiring assistant_messages. See the reply in the parent thread for details.

Added test_extract_conversation_turn_tool_results_only_returns_none for your suggested case. Quick verification of the "previously returned ConversationTurn" claim: the old guard was user_message and assistant_messages, and tool_results were never in the guard — so a user+tool_result-only span with no assistant events returned None in the old code too. No regression. The gate was broadened to user_messages and (assistant_messages or history_messages) in the first pass of this PR, but per jariys comment it is now tightened back to require assistant_messages. Whether tool-only turns should produce a ConversationTurn is a separate feature decision for the Evaluations team; out of scope for this bugfix.

aidandaly24 · 2026-05-05T19:48:45Z

+        input_messages: List[Dict[str, Any]] = [
+            {"content": {"content": user_msg}, "role": "user"} for user_msg in conversation.user_messages
+        ]
+        input_messages.extend(conversation.history_messages)


nit: This serializes input as [user, user, user, assistant-history, assistant-history] rather than preserving the original interleaving ([user, assistant, user, assistant, user]). The test asserts this flat order so it seems intentional — but worth confirming with the Observability/Evaluations team whether downstream consumers expect role-grouped input or chronological interleaving. If they ever need to reconstruct the actual conversation flow, this ordering makes it impossible.

Went with chronological. ConversationTurn.input_messages now interleaves user and assistant events in arrival order — for [u1, a1, u2, a2, u3] downstream sees [user, assistant, user, assistant, user], matching the real conversation the model received. Test asserts that order.

jariy17 · 2026-05-05T19:55:39Z

+    returning the most recent user turn; new code should read ``user_messages``.
+    """
+
+    user_messages: List[str] = field(default_factory=list)


These four field(default_factory=list) declarations are dead weight — the custom __init__ below sets the instance vars directly, so these are never used. Delete them.

Gone. The new class is a plain class with explicit attribute assignment in __init__.

jariy17 · 2026-05-05T19:55:40Z

                case cls.EVENT_USER_MESSAGE:
-                    user_message = event_attrs.get("content", "")
+                    content = event_attrs.get("content", "")
+                    if content:


if content: silently drops empty events. The old scalar path had different behavior — user_message = "" would fail the outer if user_message gate and reject the whole turn. Now empty events are skipped individually. If the intent is to drop empty content, add a log line so it's visible.

Added logger.debug(...) on all three empty-content skip paths (user, assistant, tool). The outer-gate semantics are also restored — a parse that yields only an empty user event now fails the gate (has_user_input requires a non-empty input_messages entry with role=user), matching the old if user_message behavior.

jariy17 · 2026-05-05T19:55:40Z

+
+    def test_user_message_alias_returns_latest(self):
+        """user_message returns the latest entry; accessing it with history emits a DeprecationWarning."""
+        import warnings


import warnings inside the test body — move it to the module-level imports. Same issue at line ~199 with import pytest, which is already imported at line 5.

Hoisted import warnings to the module-level imports. import pytest was already imported at line 5 — the duplicate inside the test body is gone.

jariy17 · 2026-05-05T19:55:40Z

@@ -129,7 +129,43 @@ def test_extract_conversation_turn_assistant_message(self, mock_event):
        turn = StrandsEventParser.extract_conversation_turn(events)



Let's add a test for the no-choice path here — [user.message, assistant.message] with no gen_ai.choice event. The broadened gate enables it now but there's no coverage for what output.messages looks like in that case.

Added test_extract_conversation_turn_assistant_message_only_returns_none. Since the gate now requires assistant_messages, the no-choice path returns None and never reaches build_conversation_log_record.

jariy17 · 2026-05-05T20:00:25Z

⚠️ Breaking Change Warning

Found 3 potential breaking change(s) in this PR:

�[1msrc/bedrock_agentcore/evaluation/span_to_adot_serializer/adot_models.py�[0m:91: ConversationTurn.assistant_messages: �[33mAttribute value was changed�[39m: None -> list(assistant_messages) if assistant_messages is not None else [] �[1msrc/bedrock_agentcore/evaluation/span_to_adot_serializer/adot_models.py�[0m:92: ConversationTurn.tool_results: �[33mAttribute value was changed�[39m: None -> list(tool_results) if tool_results is not None else [] �[1msrc/bedrock_agentcore/evaluation/span_to_adot_serializer/adot_models.py�[0m:70: ConversationTurn.init: �[33mReturn types are incompatible�[39m: None -> None

Note: This is an automated static analysis check. Some flagged changes may be intentional.
Please confirm each item is expected and, if so, add a migration note to CHANGELOG.md.

look into these too

…model Addresses review comments from jariy17, aidandaly24, notgitika on #454. Data model: replace `user_messages: List[str]` + `history_messages: List[Dict]` split with a single `input_messages: List[Dict]` field that preserves the chronological arrival order of `gen_ai.user.message` and `gen_ai.assistant.message` events. For a span `[u1, a1, u2, a2, u3, choice]`, downstream now receives `[user, assistant, user, assistant, user]` so the actual conversation flow is reconstructable, per OTel GenAI semconv. ConversationTurn: drop @DataClass decorator (it was conflicting with the custom __init__ and leaving the class-level field defaults dead), write explicit __init__, __repr__, __eq__ so equality compares actual instance state. `user_message: str` property retained for backwards compatibility, now returns the most recent user entry from `input_messages` and warns when history is present. Parser: gate tightened back to require `assistant_messages` so a span with no `gen_ai.choice` does not emit an empty-output ConversationTurn. Empty event content now logs at debug level instead of silently skipping. Tests: move duplicated fixtures into a shared conftest.py; hoist `import warnings` / `import pytest` to module-level; update the multi-turn regression test to assert chronological ordering rather than role-grouped; add regression tests for no-choice-path and user+tool_result-only-path returning None.

tejaskash requested a review from a team May 5, 2026 15:26

tejaskash temporarily deployed to auto-approve May 5, 2026 15:26 — with GitHub Actions Inactive

aidandaly24 previously approved these changes May 5, 2026

View reviewed changes

jariy17 requested changes May 5, 2026

View reviewed changes

tejaskash dismissed aidandaly24’s stale review via 66ffbec May 5, 2026 20:12

jariy17 approved these changes May 5, 2026

View reviewed changes

tejaskash merged commit 10da345 into main May 5, 2026
41 of 42 checks passed

tejaskash deleted the fix/conversation-turn-multi-message-dropping branch May 5, 2026 20:37

tejaskash mentioned this pull request May 5, 2026

test: add multi-turn conversation serialization integration test #455

Open

3 tasks

		@@ -129,7 +129,43 @@ def test_extract_conversation_turn_assistant_message(self, mock_event):
		turn = StrandsEventParser.extract_conversation_turn(events)

Conversation

tejaskash commented May 5, 2026

Summary

The bug

Reproduction

Fix

Tests

Backwards compatibility

Related

Test plan

Uh oh!

github-actions Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Breaking Change Warning

Uh oh!

tejaskash commented May 5, 2026

Uh oh!

aidandaly24 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jariy17 commented May 5, 2026

⚠️ Breaking Change Warning

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions Bot commented May 5, 2026 •

edited

Loading