fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809)#1639
fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809)#1639aidandaly24 wants to merge 3 commits into
Conversation
…oss all HTTP starter base templates (aws#808, aws#809) All five HTTP starter base templates (strands, openaiagents, googleadk, langchain_langgraph, autogen) plus the regenerated assets snapshot. Only base/main.py template content changed; the strands hasMemory branch and all other framework files are untouched. Refs aws#808 Refs aws#809
Package TarballHow to installgh release download pr-1639-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.0.tgz |
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Thanks for the fix — the per-session memory bug is real and the overall approach (key in-process state by context.session_id) is the right design. Most of the diff looks good; flagging a few correctness/scale issues that I think need to be addressed before merge.
The core concern: the PR description explicitly justifies the 128-entry cap for the openaiagents/autogen/strands templates ("caps process memory and evicted sessions restart fresh"), but the googleadk and langchain_langgraph templates use module-level in-memory stores (InMemorySessionService, InMemorySaver) with no bound at all. That just trades the original cross-session bleed (#809-style) for an unbounded memory leak in any long-running process serving many sessions (e.g. agentcore dev over a long session, or non-Runtime deployments which the PR description specifically calls out as the failure mode of #809). The fix should be consistent across all five templates.
The FIFO-vs-LRU comments inline are smaller but worth correcting since the docstrings claim LRU.
| session_service = InMemorySessionService() | ||
| session = await session_service.create_session( | ||
| # Module-level session service and runner (preserves history across invocations) | ||
| _session_service = InMemorySessionService() |
There was a problem hiding this comment.
Unbounded session growth. _session_service = InMemorySessionService() at module scope retains every (app_name, user_id, session_id) triple forever — there is no eviction path. The PR description argues that the 128-entry cap on the other templates "caps process memory and evicted sessions restart fresh", but this template has no cap, so a long-running process (notably agentcore dev over time, and any non-Runtime deployment) will grow without bound as new session ids arrive.
The ADK InMemorySessionService doesn't expose a public eviction API as far as I can see, so options are roughly:
- Wrap session creation in a bounded FIFO/LRU and call
_session_service.delete_session(...)on eviction (ADK does exposedelete_session). - Keep a
collections.OrderedDictof recently-touched(user_id, session_id)keys alongside the service and prune viadelete_sessiononce it exceeds N (e.g. 128, to match the other templates). - Subclass / wrap to enforce the cap at create-time.
Whichever you pick, please match the 128-entry behavior already documented for openaiagents/autogen/strands so the fix is consistent across templates.
| tools = [add_numbers] | ||
|
|
||
| # Module-level checkpointer preserves conversation history across invocations | ||
| _checkpointer = InMemorySaver() |
There was a problem hiding this comment.
Unbounded checkpoint growth. _checkpointer = InMemorySaver() is module-level with no eviction, so every distinct thread_id (= session_id) keeps its checkpoint forever. Same correctness concern as the googleadk template: the 128-cap rationale in the PR description doesn't apply here, so this template will leak memory in any long-running process.
A couple of options:
- Switch to a bounded saver (e.g. wrap
InMemorySaverin something that tracks insertion order and calls_checkpointer.delete_thread(thread_id)past 128 entries —InMemorySaverexposes astoragedict you can prune). - Use
SqliteSaver/AsyncSqliteSaverwith a temp file path for the base template (durable, doesn't grow process memory). - At minimum, document the unboundedness in a comment and explicitly tell the reader to swap in a durable checkpointer for production — but a 128-cap to match the other templates is the cleanest fix.
|
|
||
| # Reuses one AssistantAgent per session_id so each session keeps its own | ||
| # in-process conversation history (best-effort; resets on cold start). Caches up | ||
| # to 128 active sessions; the oldest is evicted and its history reset. |
There was a problem hiding this comment.
The eviction policy is FIFO, not LRU, despite the docstring. Plain dict preserves insertion order, and next(iter(_agents)) returns the oldest-inserted key. Reading an existing entry via _agents[session_id] does not move it to the end, so an actively-used session can be evicted while an idle one sticks around.
Two ways to fix:
- Either drop the "LRU" claim from the comment (the policy is FIFO/insertion-order, which is fine if intended), or
- Use
collections.OrderedDictand call_agents.move_to_end(session_id)on every hit so it's actually LRU.
Same issue applies to the new strands base-template cache (src/assets/python/http/strands/base/main.py ~line 433-438), since the PR description there also describes it as "LRU-style eviction".
(Separate, smaller concern on this template: get_or_create_agent isn't concurrency-safe — two concurrent first-invocations for the same new session_id will both fall through the if session_id not in _agents check, both await get_streamable_http_mcp_tools(), and both build an AssistantAgent; the loser's agent is discarded mid-flight. Not catastrophic, but worth an asyncio.Lock around the create path if you want it tight.)
| def get_or_create_agent(session_id{{#if hasSkillsFetcher}}, skill_plugins=None{{/if}}): | ||
| if session_id not in cache: | ||
| if len(cache) >= 128: | ||
| cache.pop(next(iter(cache))) |
There was a problem hiding this comment.
FIFO, not LRU — same as the autogen template. next(iter(cache)) returns the oldest-inserted key, and reading cache[session_id] doesn't update its position. Either change the comment to say "FIFO eviction (oldest by insertion order)" or switch to collections.OrderedDict + cache.move_to_end(session_id) on hit for real LRU behavior.
This affects the no-memory branch only — the hasMemory branch (lines ~385-423) is unchanged here and was already unbounded; you may want to consider applying a cap there too for consistency, but that's out of scope of #808/#809 so feel free to defer.
| # Caches up to 128 active sessions; LRU eviction silently resets history for | ||
| # the oldest session. For production use, replace with a durable session store | ||
| # (e.g. SQLiteSession with a file path). | ||
| @lru_cache(maxsize=128) |
There was a problem hiding this comment.
Not blocking, just confirming: SQLiteSession(session_id) with no db_path uses an in-memory SQLite DB per-instance, so the lru_cache retention semantics here are: "keep up to 128 in-memory SQLite DBs alive; evicted DBs are GC'd and their history is lost on next access." That matches the docstring and is the correct knob for a starter template. ✅
Unlike the autogen/strands dict-based caches, functools.lru_cache is true LRU — accessing an existing entry moves it to most-recently-used — so no FIFO concern here.
… caches true LRU (aws#808, aws#809) The googleadk (InMemorySessionService) and langchain_langgraph (InMemorySaver) templates kept per-session state at module scope with no eviction, trading the cross-session leak for an unbounded memory leak in long-running processes. Bound both to 128 active sessions with LRU eviction (delete_session / delete_thread on the least-recently-used key), matching the other three templates. The autogen and strands no-memory caches were plain dicts evicting by insertion order (FIFO) despite docstrings claiming LRU. Switch them to OrderedDict with move_to_end on hit so the eviction policy is genuinely LRU and the docstrings are accurate. openaiagents already uses functools.lru_cache (true LRU) and is unchanged. Regenerated the assets snapshot. Refs aws#808 Refs aws#809
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Looks good to merge — all the issues raised in the prior review have been addressed in commit 08bfb25:
- googleadk:
_session_serviceis now bounded to 128 active sessions via_session_keysOrderedDict with LRU eviction callingdelete_sessionon the evicted entry. ✅ - langchain_langgraph:
_checkpointeris now bounded to 128 active threads via_thread_idsOrderedDict with LRU eviction callingdelete_thread. ✅ - autogen: Switched from plain
dicttoOrderedDictwithmove_to_endon hit — now true LRU rather than FIFO. ✅ - strands: Same OrderedDict +
move_to_endtreatment for the no-memory cache. ✅ - openaiagents: Unchanged from the prior review —
functools.lru_cachewas already correct.
All five HTTP templates are now consistent: a 128-entry per-process cap, true LRU eviction, and context.session_id threaded through. The snapshot is regenerated to match.
A couple of minor things I noticed but won't block on (carried over from the prior review or out of scope):
- The previously-noted concurrency window in autogen's
get_or_create_agent(two concurrent first-invocations for the same newsession_idboth falling through the miss check) still exists, and the same pattern is now present in strands and googleadk. For a starter template this seems acceptable. - The strands
hasMemorybranch (lines ~387-425) is still an unbounded plain-dictcache, but as the prior reviewer noted that's outside the #808/#809 scope.
No new serious issues found.
aws#808) OrderedDict.popitem() returns (key, value) where key is the (user_id, session_id) tuple and value is True. The previous unpack assigned the key tuple to old_user_id and True to old_session_id, so delete_session() was called with garbage and the real session was never deleted -- the InMemorySessionService grew unbounded past the 128-session cap, defeating the aws#808 bound for GoogleADK (LangChain unpacked correctly). Destructure the key tuple and discard the value so the real ids reach delete_session. Add an asset test asserting the store caps at 128 with correct eviction ids, and regenerate the asset snapshot.
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Re-reviewed the new commit f3dabe9 (added after the prior approval). Looks good to merge.
f3dabe9 is a real fix. Confirmed by reading the rendered template at src/assets/python/http/googleadk/base/main.py:
- The key stored in
_session_keysis a 2-tuple(user_id, session_id)(line 150), with the value alwaysTrue. OrderedDict.popitem(last=False)returns(key, value), so the outer 2-tuple is(tuple_key, True).- The previous form
old_user_id, old_session_id = popitem(...)unpacked the outer pair, assigning the whole inner key tuple toold_user_idandTruetoold_session_id—delete_session(user_id=<tuple>, session_id=True)would have errored or no-op'd, leaving the underlyingInMemorySessionServiceto grow past 128 unbounded. The new form(old_user_id, old_session_id), _ = popitem(...)destructures correctly. - LangChain (line 75:
evicted, _ = _thread_ids.popitem(last=False)) uses a single-string key, so its unpack was already correct — the commit message's parenthetical is accurate. ✅ - The other LRU caches (autogen line 107, strands line 441) just call
cache.popitem(last=False)and discard the result, which is fine since they don't need an async cleanup callback likedelete_session. ✅
Tests: The new googleadk-session-eviction.test.ts has two effective regression guards — the substring assertion for the fixed form and the negative-match guard against the buggy form (lines 59 and 61) — which is exactly what's needed to prevent this from regressing. The simulateEviction portion (lines 27-51, 68-85) is a JS re-implementation of the Python eviction algorithm, so the caps the store at 128 assertion isn't actually exercising the template; it's documenting the intended algorithm. Not blocking, but if you wanted that assertion to mean something, you'd need to actually run the rendered template under Python — not worth the infra for a starter-template asset test, the two string guards are sufficient.
No new serious issues. The unaddressed-but-out-of-scope items from the prior review (concurrency window in get_or_create_agent, unbounded hasMemory strands branch) still stand and are fine to defer.
Fixes #808
Fixes #809
Problem
The HTTP starter templates rebuild agent/session state on every invocation, so within one session there is no multi-turn recall (turn 2 can't see turn 1) — directly contradicting
agentcore invoke --session-id. The Strands no-memory template is worse: it shares one global_agentacross all invocations in a process, so conversation history bleeds between sessions/users inagentcore devand non-Runtime deployments (Lambda/ECS/FastAPI). Affects strands, openaiagents, googleadk, langchain_langgraph, and autogen.Fix
Each template now keys in-process agent/session state by
context.session_id, bounded to 128 sessions with LRU eviction so a long-running process keeps per-session isolation without growing unbounded:OrderedDictcache (move_to_endon hit, evict least-recently-used at the cap).@lru_cache(maxsize=128)over per-sessionSQLiteSession.InMemorySessionServicebounded via an LRU key set, evicting viadelete_session.InMemorySaverbounded via an LRU thread set, evicting viadelete_thread.In-memory is best-effort (resets on cold start); for durable history, attach a session manager / AgentCore Memory.
Validation
Bug-bashed with real deploy + invoke per framework on a test account: per-session recall works and there is no cross-session leak (the #809 check) on all frameworks. Generated templates
py_compilecleanly across flag combinations; asset snapshot regenerated; unit suite green.