
Conversation

@jealous (Contributor) commented Jan 14, 2026

Purpose of the change

Fixes performance degradation in short-term memory caused by blocking summary generation during episode addition.

Description

Previously, triggering a new summary while a previous one was still running would block the add operation and slow down ingestion.

The fix extracts summary logic into a dedicated consolidator class that runs consolidation asynchronously and avoids blocking writes.

Key changes:

  • Move episode summarization logic into a new consolidator class
  • Run consolidation in a background routine instead of blocking add operations
  • Cache newly added episodes when consolidation is already running and include them in the next consolidation pass
  • Ensure queries wait for consolidation to finish so results are always stable and consistent
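The pattern behind these key changes can be sketched with asyncio. This is a minimal, illustrative sketch: the class name follows the PR, but the internals here are assumptions, not the actual MemMachine implementation.

```python
import asyncio


class ShortTermMemoryConsolidator:
    """Sketch of the background-consolidation pattern (not the actual
    MemMachine code). `summarize` is an async callable that folds a
    batch of episodes into the running summary."""

    def __init__(self, summarize):
        self._summarize = summarize
        self._pending = []          # episodes cached while a pass runs
        self._task = None           # background worker, if any
        self._lock = asyncio.Lock()
        self.summary = ""

    async def add_episodes(self, episodes):
        # Never blocks on summary generation: enqueue and return.
        async with self._lock:
            self._pending.extend(episodes)
            if self._task is None or self._task.done():
                self._task = asyncio.create_task(self._worker())

    async def _worker(self):
        while True:
            async with self._lock:
                if not self._pending:
                    break  # no more episodes to process, exit the worker
                batch, self._pending = self._pending, []
            # Episodes added while this await is in flight land in
            # self._pending and are picked up on the next iteration.
            self.summary = await self._summarize(batch, self.summary)

    async def wait(self):
        # Queries call this so results are stable and consistent.
        if self._task is not None:
            await self._task
```

The single worker task is the design point: writers only append to a list under a lock, and the one in-flight summarization pass drains whatever has accumulated on its next loop.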

Fixes/Closes

Fixes #909

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g., code style improvements, linting)
  • Documentation update
  • Project Maintenance (updates to build scripts, CI, etc., that do not affect the main project)
  • Security (improves security without changing functionality)

How Has This Been Tested?

  • Unit Test
  • Integration Test
  • End-to-end Test
  • Test Script (please provide)
  • Manual verification (list step-by-step instructions)

Checklist

  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected

@jealous jealous requested a review from Copilot January 14, 2026 23:08

Copilot AI left a comment


Pull request overview

This PR improves async summary performance in short-term memory by extracting summarization logic into a dedicated consolidator class that runs asynchronously, preventing blocking during episode addition.

Changes:

  • Introduced ShortTermMemoryConsolidator class to handle summarization in the background
  • Modified ShortTermMemory to use read-write locks and delegate summarization to the consolidator
  • Extracted common episode formatting logic into episodes_to_string utility function

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/memmachine/episodic_memory/short_term_memory/short_term_memory.py: Core refactoring: added the consolidator class, converted to async properties, and implemented non-blocking summarization with read-write locks
  • src/memmachine/common/episode_store/episode_model.py: Added the episodes_to_string utility function to centralize episode formatting logic
  • src/memmachine/episodic_memory/episodic_memory.py: Updated to use the new episodes_to_string utility instead of duplicated formatting code
  • src/memmachine/common/errors.py: Added the ShortTermMemoryClosedError exception for better error handling
  • tests/memmachine/episodic_memory/short_term_memory/test_short_term_memory.py: Added comprehensive tests for the new async summarization behavior and updated existing tests to match the new summary format
  • tests/memmachine/common/episode_store/test_episode_model.py: Added tests for the new episodes_to_string utility function
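As an illustration of the centralized formatting, episodes_to_string is conceptually a helper like the following. The Episode fields and the output format here are assumptions for the sketch; see episode_model.py for the real model and signature.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Illustrative fields only; the real model lives in
    # src/memmachine/common/episode_store/episode_model.py.
    producer_id: str
    content: str


def episodes_to_string(episodes: list[Episode]) -> str:
    """Render episodes as one newline-separated block, so every caller
    formats them identically instead of duplicating the logic."""
    return "\n".join(f"{e.producer_id}: {e.content}" for e in episodes)
```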


@jealous jealous self-assigned this Jan 15, 2026
@jealous jealous added the performance Issues relating to MemMachine performance label Jan 15, 2026
@edwinyyyu (Contributor)
Did you manually verify the performance improvement?

@jealous (Contributor, Author) commented Jan 15, 2026

Here is my test code. It inserts entries of 1,000 random characters each into episodic memory only (the original snippet hardcoded the entry count in the final print; it now uses a single variable so the reported ops/sec matches the loop):

import random
import string
import time

# Client imports; adjust to match your MemMachine SDK layout.
from memmachine import MemMachineClient, MemoryType


def run():
    client = MemMachineClient(base_url="http://localhost:8080")
    project = client.get_project(
        org_id="performance_test",
        project_id="multi_add",
    )
    memory = project.memory(
        metadata={
            "group_id": "test_group",
            "agent_id": "agent1",
            "user_id": "user1",
        }
    )
    num_entries = 1000
    start = time.time()
    characters = string.ascii_letters + string.digits + string.punctuation
    for i in range(num_entries):
        result = memory.add(
            content=''.join(random.choices(characters, k=1000)),
            memory_types=[MemoryType.Episodic],
        )
        assert len(result) == 1
        if i % 10 == 0:
            print(f"Added entry i={i}")
    duration = time.time() - start
    print(
        f"Added {num_entries} entries in {duration:.2f} seconds "
        f"({num_entries / duration:.2f} ops/sec)"
    )

I ran the test on my MacBook.

Before the change:

Added 1000 entries in 153.24 seconds (6.53 ops/sec)

After the update:

Added 1000 entries in 31.84 seconds (31.40 ops/sec)

About 5 times faster.

@sscargal (Contributor) commented Jan 16, 2026

@jealous @edwinyyyu This looks extremely promising!!

I used the Locust load script from #837 and ran the Episodic Write-only test to produce the following results:

[Screenshot: Locust results for the Episodic Write-only test]

Until this PR, I could only achieve up to ~20 requests per second (RPS) with 50 users - see the results in #837. With this fix, I'm getting up to 150 RPS at 50 users (7.5x increase). The RPS looks stable, which is a good sign.

The Locust workload starts to fail when we hit 100 users with "neo4j - Failed to read four byte Bolt handshake response from server ResolvedIPv4Address(('172.18.0.3', 7687)) (deadline Deadline(timeout=60.0))". This is not the fault of MemMachine or this PR; it's actually a good sign, as it shows the bottleneck is now in Neo4j. See #940 for similar issues. We have tunables available to help, so I'll continue that effort to see if I can successfully complete the load test with 500 users.

In the future, we may need to consider a back-off algorithm when we detect database/storage latency issues or timeouts.
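Such a back-off could be as simple as retrying writes with exponential delay plus jitter when the store times out. This is a sketch under assumed error semantics (a write that raises TimeoutError under load), not part of this PR:

```python
import asyncio
import random


async def write_with_backoff(write, attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry an async write with capped exponential backoff and jitter.
    `write` is any coroutine function that raises TimeoutError (or a
    driver-specific error) when the database is overloaded."""
    for attempt in range(attempts):
        try:
            return await write()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent writers.
            await asyncio.sleep(delay * random.uniform(0.5, 1.5))
```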

Inline review comment on the following lines:

    # No more episodes to process, exit the worker
    break

    batch_to_process = self._pending_episodes[:]
Contributor:

The problem here is that the episodes can accumulate; eventually they can exceed the model's context window size.

@jealous (Contributor, Author) replied:

I updated the logic so it recursively splits the episodes into two halves until they fit into the context window. If a single episode plus the summary already exceeds the context window, I have no choice but to drop that episode.
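The splitting strategy described above can be sketched as follows. The `fits` token-budget check and the `summarize` callable are stand-ins for whatever the model configuration provides; the actual implementation may differ.

```python
def consolidate(episodes, summary, summarize, fits):
    """Recursively halve `episodes` until each batch plus the running
    summary fits the context window. `fits(episodes, summary)` stands
    in for a token-count check; `summarize(episodes, summary)` returns
    the new summary. A single episode that can never fit is dropped."""
    if not episodes:
        return summary
    if fits(episodes, summary):
        return summarize(episodes, summary)
    if len(episodes) == 1:
        return summary  # episode + summary exceed the window: drop it
    mid = len(episodes) // 2
    # Summarize the first half, then fold the second half into that
    # updated summary, preserving episode order.
    summary = consolidate(episodes[:mid], summary, summarize, fits)
    return consolidate(episodes[mid:], summary, summarize, fits)
```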


Labels

performance Issues relating to MemMachine performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Improve Asynchronous Summary Processing in ShortTermMemory.add_episodes

6 participants