Status: Open
Labels: enhancement
Description
Summary
MCC currently concatenates WMC turns and EMC episodes without explicit compression or priority ordering. Under heavy use this can exceed Cosmos's 2048-token context limit.
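As a rough illustration of the failure mode (the 4-characters-per-token heuristic and the simulated message sizes below are assumptions for the demo, not MCC's real estimator or data):

```python
COSMOS_TOKEN_LIMIT = 2048

def rough_token_count(text: str) -> int:
    # Crude rule of thumb: ~4 characters per token (assumption).
    return max(1, len(text) // 4)

# Simulated context: one EMC system message plus many WMC turns.
emc_system_message = "Episode summary: " + "x" * 400
wmc_turns = ["User turn %d: " % i + "y" * 300 for i in range(30)]

total = rough_token_count(emc_system_message) + sum(
    rough_token_count(t) for t in wmc_turns
)
print(total, total > COSMOS_TOKEN_LIMIT)  # under heavy use the sum overflows
```

With no overflow check, nothing stops `total` from crossing the limit as the session grows.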
Current Behavior
```python
context = [emc_system_message] + wmc_turns
# No compression, no priority ordering, no overflow check
```

Proposed Enhancement
Add explicit context compression in build_context():
```python
# Priority ordering:
# 1. Most recent WMC turns (highest priority)
# 2. Highest-relevance EMC episodes (by similarity score)
# 3. Drop lowest priority if total exceeds 80% of context limit
def build_context(self, user_input: str) -> list[dict]:
    wmc_turns = self.wmc.get_event_segments()
    emc_results = self.emc.search(user_input, top_k=EMC_TOP_K)

    # Budget-aware assembly
    total_chunks = 0
    context = []

    # Add WMC turns newest first until the budget is exhausted
    for turn in reversed(wmc_turns):
        cost = _estimate_chunks(turn["content"])
        if total_chunks + cost > CONTEXT_CHUNK_LIMIT:
            break
        context.insert(0, turn)
        total_chunks += cost

    # Add EMC episodes by relevance until the budget is exhausted
    for ep in sorted(emc_results, key=lambda x: x["similarity"], reverse=True):
        cost = _estimate_chunks(ep["content"])
        if total_chunks + cost > CONTEXT_CHUNK_LIMIT:
            break
        # Inject as a system message ahead of the conversation turns
        context.insert(0, {"role": "system", "content": ep["content"]})
        total_chunks += cost

    return context
```

Impact
- Context window never overflows
- Most relevant content always included
- Graceful degradation when context is large
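To make the budget guarantee concrete, here is a minimal self-contained sketch of the proposed assembly. The budget value, the one-chunk-per-200-characters cost model, and the sample data are all placeholders, not MCC's actual configuration:

```python
CONTEXT_CHUNK_LIMIT = 8  # placeholder budget for the demo

def _estimate_chunks(text: str) -> int:
    # Placeholder cost model: one chunk per 200 characters, minimum 1.
    return max(1, len(text) // 200)

def build_context(wmc_turns, emc_results):
    total, context = 0, []
    # 1. Newest WMC turns first (highest priority).
    for turn in reversed(wmc_turns):
        cost = _estimate_chunks(turn["content"])
        if total + cost > CONTEXT_CHUNK_LIMIT:
            break
        context.insert(0, turn)
        total += cost
    # 2. EMC episodes by descending similarity, injected as system messages.
    for ep in sorted(emc_results, key=lambda e: e["similarity"], reverse=True):
        cost = _estimate_chunks(ep["content"])
        if total + cost > CONTEXT_CHUNK_LIMIT:
            break
        context.insert(0, {"role": "system", "content": ep["content"]})
        total += cost
    return context, total

turns = [{"role": "user", "content": "t%d " % i * 60} for i in range(6)]
episodes = [{"similarity": 0.9, "content": "relevant episode " * 30},
            {"similarity": 0.2, "content": "stale episode " * 30}]
ctx, used = build_context(turns, episodes)
print(len(ctx), used)  # the used budget never exceeds CONTEXT_CHUNK_LIMIT
```

Note the degradation order: the low-similarity episode is dropped first, while all recent turns survive.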
Notes
- Related to bug "[BUG] Rough token estimation in WMC may overflow Cosmos 2048 token limit" (#14)
- Implement in M2 alongside better token estimation