[GH-909] Improve async summary performance. #949
Conversation
Pull request overview
This PR improves async summary performance in short-term memory by extracting summarization logic into a dedicated consolidator class that runs asynchronously, preventing blocking during episode addition.
Changes:
- Introduced ShortTermMemoryConsolidator class to handle summarization in the background
- Modified ShortTermMemory to use read-write locks and delegate summarization to the consolidator
- Extracted common episode formatting logic into episodes_to_string utility function
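The core pattern is worth sketching. The snippet below is not the PR's actual code; PendingConsolidator and the summarize callable are hypothetical stand-ins showing how adds can stay non-blocking while a single background task drains a pending list:

```python
import asyncio

class PendingConsolidator:
    """Minimal sketch: accept episodes without blocking and
    summarize them in a single background task."""

    def __init__(self, summarize):
        self._summarize = summarize  # async callable: list[str] -> str
        self._pending: list[str] = []
        self._task: asyncio.Task | None = None
        self.summary: str = ""

    def add(self, episode: str) -> None:
        # Adding never blocks; it only queues the episode and
        # starts a worker if one is not already running.
        self._pending.append(episode)
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._worker())

    async def _worker(self) -> None:
        while self._pending:
            # Take everything queued so far; episodes added while we
            # summarize are picked up on the next loop iteration.
            batch, self._pending = self._pending, []
            self.summary = await self._summarize([self.summary, *batch])

    async def wait(self) -> None:
        # Queries can await this so they always see a settled summary.
        if self._task is not None:
            await self._task
```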
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/memmachine/episodic_memory/short_term_memory/short_term_memory.py | Core refactoring: added consolidator class, converted to async properties, and implemented non-blocking summarization with read-write locks |
| src/memmachine/common/episode_store/episode_model.py | Added episodes_to_string utility function to centralize episode formatting logic |
| src/memmachine/episodic_memory/episodic_memory.py | Updated to use the new episodes_to_string utility instead of duplicated formatting code |
| src/memmachine/common/errors.py | Added ShortTermMemoryClosedError exception for better error handling |
| tests/memmachine/episodic_memory/short_term_memory/test_short_term_memory.py | Added comprehensive tests for new async summarization behavior and updated existing tests to match new summary format |
| tests/memmachine/common/episode_store/test_episode_model.py | Added tests for the new episodes_to_string utility function |
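For illustration, a helper like episodes_to_string typically just joins formatted episode lines into one block. Here is a minimal sketch, with the Episode field names assumed for the example:

```python
from dataclasses import dataclass

@dataclass
class Episode:
    # Field names here are assumptions for illustration only.
    producer_id: str
    content: str

def episodes_to_string(episodes: list[Episode]) -> str:
    """Render episodes as one block of 'speaker: content' lines."""
    return "\n".join(f"{e.producer_id}: {e.content}" for e in episodes)
```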
Did you manually verify the performance improvement?
Here is my test code; it inserts 100 entries, each with 1000 random characters, into episodic memory only:

```python
import random
import string
import time

# Import path for the client SDK may differ in your setup.
from memmachine import MemMachineClient, MemoryType

def run():
    client = MemMachineClient(base_url="http://localhost:8080")
    project = client.get_project(
        org_id="performance_test",
        project_id="multi_add",
    )
    memory = project.memory(
        metadata={
            "group_id": "test_group",
            "agent_id": "agent1",
            "user_id": "user1",
        }
    )
    start = time.time()
    characters = string.ascii_letters + string.digits + string.punctuation
    for i in range(100):
        result = memory.add(
            content=''.join(random.choices(characters, k=1000)),
            memory_types=[MemoryType.Episodic],
        )
        assert len(result) == 1
        if i % 10 == 0:
            print(f"Added entry i={i}")
    duration = time.time() - start
    print(f"Added 100 entries in {duration:.2f} seconds ({100/duration:.2f} ops/sec)")
```

I ran the test on my MacBook. Comparing the timings before the change and after the update (screenshots omitted): about 5 times faster.
@jealous @edwinyyyu This looks extremely promising!! I used the Locust load script from #837 and ran the Episodic Write-only test to produce the following results:

Until this PR, I could only achieve up to ~20 requests per second (RPS) with 50 users; see the results in #837. With this fix, I'm getting up to 150 RPS at 50 users (a 7.5x increase). The RPS looks stable, which is a good sign.

The Locust workload starts to fail when we hit 100 users with "neo4j - Failed to read four byte Bolt handshake response from server ResolvedIPv4Address(('172.18.0.3', 7687)) (deadline Deadline(timeout=60.0))". This is not the fault of MemMachine or this PR; it's actually a good sign, as it shows the bottleneck is now in Neo4j. See #940 for similar issues. We have tunables available to help, so I'll continue that effort to see if I can successfully complete the load test with 500 users. In the future, we may need to consider a back-off algorithm when we detect database/storage latency issues or timeouts.
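As a sketch of that back-off idea (write_episode is a hypothetical storage call, and the retry and delay parameters are arbitrary):

```python
import asyncio
import random

async def write_with_backoff(write_episode, episode, retries=5, base_delay=0.1):
    """Retry a storage write with exponential back-off plus jitter."""
    for attempt in range(retries):
        try:
            return await write_episode(episode)
        except (TimeoutError, ConnectionError):
            if attempt == retries - 1:
                raise
            # Sleep 0.1s, 0.2s, 0.4s, ... plus up to 100ms of jitter.
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, 0.1))
```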
Review comment on this section of the consolidator worker:

```python
# No more episodes to process, exit the worker
break
...
batch_to_process = self._pending_episodes[:]
```
The problem here is that the episodes can accumulate; eventually the batch can exceed the model's context window size.
I updated the logic so it recursively splits the episodes into two parts until they fit into the context window. If a single episode plus the summary already exceeds the context window, I have no choice but to drop that episode.
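As a sketch, that splitting strategy could look like the following, where summarize, count_tokens, and max_tokens stand in for whatever model call, tokenizer, and limit are actually used:

```python
async def summarize_fitting(episodes, summary, summarize, count_tokens, max_tokens):
    """Recursively halve the batch until each piece fits the context window."""
    if count_tokens(summary, episodes) <= max_tokens:
        return await summarize(summary, episodes)
    if len(episodes) == 1:
        # A single episode plus the summary is still too large:
        # nothing fits, so the episode has to be dropped.
        return summary
    # Summarize the first half, then fold the second half into that summary.
    mid = len(episodes) // 2
    summary = await summarize_fitting(
        episodes[:mid], summary, summarize, count_tokens, max_tokens
    )
    return await summarize_fitting(
        episodes[mid:], summary, summarize, count_tokens, max_tokens
    )
```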
Purpose of the change
Fixes performance degradation in short-term memory caused by blocking summary generation during episode addition.
Description
Previously, triggering a new summary while a previous one was still running would block the add operation and slow down ingestion.
The fix extracts summary logic into a dedicated consolidator class that runs consolidation asynchronously and avoids blocking writes.
Key changes (sketched further below):
- Move episode summarization logic into a new consolidator class
- Run consolidation in a background routine instead of blocking add operations
- Cache newly added episodes when consolidation is already running and include them in the next consolidation pass
- Ensure queries wait for consolidation to finish so results are always stable and consistent
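To illustrate the last point, a query gate built on asyncio.Condition can make reads wait for an in-flight consolidation pass; ConsolidationGate below is a hypothetical sketch, not the PR's implementation:

```python
import asyncio

class ConsolidationGate:
    """Sketch: reads wait until no consolidation pass is in flight."""

    def __init__(self):
        self._cond = asyncio.Condition()
        self._consolidating = False

    async def start_consolidation(self):
        async with self._cond:
            self._consolidating = True

    async def finish_consolidation(self):
        async with self._cond:
            self._consolidating = False
            self._cond.notify_all()

    async def wait_for_stable_read(self):
        # Queries call this first so they never observe a half-updated summary.
        async with self._cond:
            await self._cond.wait_for(lambda: not self._consolidating)
```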
Fixes/Closes
Fixes #909