Add smem --sweep-dominated: budgeted eviction of dominated LTIs (draft) #580
Open
kimjune01 wants to merge 12 commits into SoarGroup:development
Periodically scan episodic memory for stable WME structures and write
them to semantic memory as new LTIs. This implements the compose+test
framework (Casteigts et al., 2019) for automatic episodic-to-semantic
knowledge transfer — the operation Soar's long-term declarative stores
have been missing.
Algorithm:
compose — union of constant WMEs currently active in epmem
test — continuous presence >= consolidate-threshold episodes
write — create smem LTI with qualifying augmentations via CLI_add
New parameters (all under epmem):
consolidate on/off (default off)
consolidate-interval integer (default 100) — episodes between runs
consolidate-threshold integer (default 10) — min episode persistence
Deduplication via epmem_consolidated tracking table prevents repeated
writes across consolidation runs. Table is dropped on reinit alongside
other epmem graph tables.
Off by default — zero behavior change until explicitly enabled.
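The compose/test/write loop above can be sketched in a few lines. This is a minimal illustration, not the kernel code: the `Consolidator` class, its field names, and the `(identifier, attribute, value)` tuple encoding are assumptions made for the example, and the "write" step simply returns the qualifying WMEs rather than calling CLI_add.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of a constant-valued WME: (identifier, attribute, value).
WME = tuple[str, str, str]


@dataclass
class Consolidator:
    threshold: int = 10   # mirrors consolidate-threshold
    interval: int = 100   # mirrors consolidate-interval
    streak: dict[WME, int] = field(default_factory=dict)  # continuous presence counts
    consolidated: set[WME] = field(default_factory=set)   # stands in for the tracking table
    episodes_seen: int = 0

    def observe(self, active: set[WME]) -> list[WME]:
        """Record one episode and return the WMEs written on this call."""
        self.episodes_seen += 1
        # compose: only WMEs active in this episode keep (and extend) a streak;
        # anything absent loses its streak, so presence must be continuous.
        self.streak = {w: self.streak.get(w, 0) + 1 for w in active}
        if self.episodes_seen % self.interval != 0:
            return []
        # test: streak >= threshold; dedup: skip anything already written.
        written = sorted(
            w for w, n in self.streak.items()
            if n >= self.threshold and w not in self.consolidated
        )
        # write: a real implementation would create smem LTIs here.
        self.consolidated.update(written)
        return written
```

With `threshold=3` and `interval=5`, a WME present in all five episodes is written at episode 5, while one that appears only in alternating episodes never qualifies. A second run at episode 10 writes nothing new because of the dedup set.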
Limitations (deferred to follow-up):
- Only consolidates constant-valued WMEs, not identifier edges
- No back-invalidation across the WM/smem tier boundary
- last-consolidation stat does not persist across agent reinit
Motivation: Derbinsky & Laird (2013) proved forgetting is essential to
Soar's scaling but only built it for working and procedural memory.
Episodic and semantic memory have no eviction and no capacity bound.
This patch addresses the first half: automatic semantic learning from
episodic experience. With semantic entries derived from episodes,
episodic eviction becomes safe (merged episodes leave no reconstruction
debt), and R4's forgettable WME scope expands automatically.
Reference:
Casteigts et al. (2019), "Computing Parameters of Sequence-Based
Dynamic Graphs," Theory of Computing Systems.
Derbinsky & Laird (2013), "Effective and efficient forgetting of
learned knowledge in Soar's working and procedural memories,"
Cognitive Systems Research.
https://june.kim/prescription-soar — full prescription
After consolidation writes stable WMEs to smem, old episodes become redundant.
Delete point entries and episode rows older than consolidate-evict-age episodes.
This is safe: the consolidated knowledge is in smem, so there is no reconstruction debt.
New parameter:
consolidate-evict-age integer (default 0 = off): min age before an episode is eligible for eviction
Range and _now interval entries are preserved (they span multiple episodes). Only point entries and episode rows are removed.
Reference: Derbinsky & Laird (2013), §5: "forgotten working-memory knowledge may be recovered via deliberate reconstruction from semantic memory." Consolidation creates the semantic entries; eviction removes the source episodes that are no longer needed for reconstruction.
- Delete _range entries whose intervals end before the eviction cutoff (previously only _point entries were evicted, leaving dead weight)
- Wrap all eviction DELETEs in BEGIN/COMMIT when lazy_commit is off for atomicity (when lazy_commit is on, already inside a transaction)
Retrieval of evicted episodes is already safe: epmem_install_memory checks valid_episode and returns ^retrieved no-memory.
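A rough sketch of the eviction step, using Python's sqlite3 module. The table and column names (`episodes`, `wme_point`, `wme_range`, `wme_now`) are hypothetical stand-ins, not Soar's actual epmem schema, and the cutoff rule (`episode_id < current - evict_age`) is an assumed reading of "older than consolidate-evict-age episodes".

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory store with a hypothetical epmem-like schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE episodes  (episode_id INTEGER PRIMARY KEY);
        CREATE TABLE wme_point (wme_id INTEGER, episode_id INTEGER);
        CREATE TABLE wme_range (wme_id INTEGER, start_episode INTEGER, end_episode INTEGER);
        CREATE TABLE wme_now   (wme_id INTEGER, start_episode INTEGER);
    """)
    db.executemany("INSERT INTO episodes VALUES (?)", [(i,) for i in range(1, 11)])
    db.executemany("INSERT INTO wme_point VALUES (?, ?)", [(100 + i, i) for i in range(1, 11)])
    db.executemany("INSERT INTO wme_range VALUES (?, ?, ?)", [(200, 1, 3), (201, 2, 8)])
    db.execute("INSERT INTO wme_now VALUES (300, 1)")
    db.commit()
    return db


def evict_old_episodes(db: sqlite3.Connection, current_episode: int, evict_age: int) -> int:
    """Delete entries older than the cutoff; return the number of episode rows removed."""
    if evict_age <= 0:  # 0 means eviction is off
        return 0
    cutoff = current_episode - evict_age
    with db:  # one transaction: commits on success, rolls back on an exception
        db.execute("DELETE FROM wme_point WHERE episode_id < ?", (cutoff,))
        # Ranges are removed only when they end before the cutoff.
        db.execute("DELETE FROM wme_range WHERE end_episode < ?", (cutoff,))
        cur = db.execute("DELETE FROM episodes WHERE episode_id < ?", (cutoff,))
    # Open-ended (_now) intervals are never touched.
    return cur.rowcount
```

Grouping the three DELETEs in one transaction keeps the point, range, and episode tables consistent if any statement fails, which is the atomicity property the commit describes.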
Implements Kilpeläinen-Mannila tree inclusion to detect which LTI entries are structurally dominated by others. Detection only — no eviction. Works at the raw hash level via web_expand to avoid Symbol allocation overhead.
…rror routing
Codex review found four issues:
1. Greedy child matching could produce false negatives when first-fit blocks later required matches. Replaced with backtracking injective matcher.
2. Cycle detection conflated "currently exploring" with "proven to include". Split into separate active_pairs (recursion stack) and memo (proven results). Cycles now conservatively return false.
3. CLI_redundancy_check returned void, DoSMem always returned true. Changed to bool with SetError routing on failure.
4. help smem did not list --redundancy-check. Added.
Round 2 codex review found two remaining issues:
1. Per-attribute a_used vectors allowed two distinct B nodes to map to the same A node via different attributes. Fixed by threading a global b_to_a map (B node → A node assignment) through all recursion. Backtracking undoes assignments on failure.
2. Active-pair cycle check returned false (pessimistic), which missed valid cyclic equivalences like @1 ^next @1 vs @2 ^next @2. Changed to coinductive (optimistic): revisiting an active pair returns true. If the assumption is wrong, non-cyclic proof obligations will fail.
Round 3 codex review found two issues:
1. Failed speculative branches leaked descendant b_to_a bindings. Fixed by snapshotting b_to_a before each branch and restoring on failure. Also pre-bind b_child before recursion so descendants see the intended assignment.
2. Memoization keyed by (lti_a, lti_b) was unsound because results depend on the current b_to_a context. Dropped memo entirely: smem entries are shallow (depth 1-2), so re-evaluation is cheap.
Round 4 codex review found two issues:
1. Root of B was not pinned to root of A in the global assignment. Counterexample: @b ^next @b vs @A ^next @A1, @A1 ^next @A1, where B's root could map to A1 instead of A. Fixed by seeding b_to_a[lti_b] = lti_a before recursion.
2. smem --redundancy-check was missing from the runtime help screen in smem_settings.cpp. Added.
…ated LTIs
Mark phase reuses tree inclusion detection from SoarGroup#579.
Sweep phase:
1. R4 safety check: skip LTIs currently referenced in working memory
2. Dependency-safe deletion: disconnect_ltm, then delete from all smem tables (augmentations, activation history, aliases, lti)
3. Per-invocation budget via optional numeric argument
Includes functional test proving full eviction: add 3 LTIs where @1 is structurally dominated by @2, sweep, verify @1 is gone and @2/@3 survive. Post-sweep redundancy check confirms no remaining dominated entries.
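The sweep phase can be sketched as a short loop. Everything here is illustrative: `sweep_dominated`, its parameters, and the `delete` callback are hypothetical names, the dominated set is assumed to come from a prior mark phase, and the reverse-ID iteration order follows the PR description.

```python
from typing import Callable, Optional


def sweep_dominated(
    dominated: set[int],
    in_working_memory: set[int],
    delete: Callable[[int], None],
    budget: Optional[int] = None,
) -> list[int]:
    """Evict marked LTIs in reverse ID order, honoring the WM check and the budget."""
    evicted: list[int] = []
    for lti in sorted(dominated, reverse=True):
        if budget is not None and len(evicted) >= budget:
            break  # per-invocation budget exhausted
        if lti in in_working_memory:
            continue  # safety check: never evict an LTI referenced in WM
        delete(lti)  # stands in for the kernel-side deletion routine
        evicted.append(lti)
    return evicted
```

Skipped LTIs do not count against the budget in this sketch; whether the real command counts them is not stated in the commit message.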
Replaces raw SQL deletion in sweep with a proper kernel-side routine that composes existing bookkeeping paths:
1. disconnect_ltm for outgoing edges (existing)
2. Inbound edge update: for each parent pointing to this LTI, decrement child counts, LTI-child counts, attribute frequencies
3. Invalidate spreading activation via invalidate_from_lti
4. Clean all 8+ auxiliary tables: prohibited, trajectories, likelihoods, trajectory_num, current/uncommitted/committed spread, activation history, aliases, fake activations
5. Delete from smem_lti and update node count
Addresses all three correctness blockers from codex round 1 review of the sweep PR.
Round 2 codex review found: remaining_attr_q excluded constant rows because NULL <> ? is NULL in SQLite (not true). A parent with ^name @dead AND ^name alice would incorrectly decrement attribute frequency after deleting @dead. Fixed with (value_lti_id IS NULL OR value_lti_id<>?).
Added testSweepDominatedWithInboundRefs: parent @1 has ^name alice (constant) and ^friend @2 (LTI child). @2 is dominated by @3 and swept. Verifies @1 retains ^name after the LTI child is removed.
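The NULL comparison pitfall is easy to reproduce with Python's sqlite3 module. The table below is a hypothetical stand-in for the augmentation table; only the `value_lti_id` column name and the two predicates come from the commit message.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: constant rows leave value_lti_id NULL.
db.execute(
    "CREATE TABLE aug (parent INTEGER, attr TEXT, value_constant TEXT, value_lti_id INTEGER)"
)
db.executemany(
    "INSERT INTO aug VALUES (?, ?, ?, ?)",
    [
        (1, "name", "alice", None),  # ^name alice (constant)
        (1, "name", None, 99),       # ^name @dead (LTI child, id 99)
    ],
)


def remaining_rows(dead_lti: int, null_safe: bool) -> int:
    """Count the parent's other ^name rows once dead_lti is excluded."""
    predicate = (
        "(value_lti_id IS NULL OR value_lti_id <> ?)" if null_safe else "value_lti_id <> ?"
    )
    sql = f"SELECT COUNT(*) FROM aug WHERE parent = 1 AND attr = 'name' AND {predicate}"
    return db.execute(sql, (dead_lti,)).fetchone()[0]


buggy = remaining_rows(99, null_safe=False)  # NULL <> 99 is NULL, so the constant row is dropped
fixed = remaining_rows(99, null_safe=True)   # the constant row is counted
print(buggy, fixed)  # prints: 0 1
```

With the original predicate the query reports no remaining `^name` rows, which is what led to the incorrect attribute-frequency decrement.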
… test
Round 3 codex review found: (1) surviving parents need trajectory invalidation and edge weight renormalization after child removal, (2) NULL-safe attribute frequency check already fixed.
Guard spreading invalidation on spreading-enabled check to avoid hangs when spreading tables aren't initialized.
Added testSweepDominatedWithInboundRefs: parent with LTI child where child is dominated and swept. Verifies parent survives with constant attributes intact, no redundancy remains after sweep.
Problem
After episodic-to-semantic consolidation (#578), merged entries may structurally subsume older smem entries. Those older entries persist indefinitely — actively retrieved, never evicted, growing the store that spreading activation must traverse every 50ms decision cycle.
PR #579 added detection (smem --redundancy-check). This PR adds the sweep: actually evicting dominated entries with proper kernel-level bookkeeping.
Full analysis: Diagnosis: Soar, SOAP Notes: Soar, Prescription: Soar.
What this PR does
Adds smem --sweep-dominated [<budget>] and a new kernel routine delete_ltm():
delete_ltm(uint64_t pLTI_ID) performs full LTI deletion with proper bookkeeping:
- Disconnects outgoing edges via disconnect_ltm
- Deletes from smem_lti and updates node statistics
smem --sweep-dominated [<budget>] performs mark-and-sweep eviction:
- Calls delete_ltm() for each dominated entry (up to budget), in reverse ID order
Expected consequence
Without this PR, consolidation (#578) adds entries but never removes redundant ones, so the store can only grow. With it, dominated entries are evicted after each consolidation cycle, so net growth can be negative. Over a 30-day deployment at 72,000 episodes/hour, this is the difference between smem growing without bound and smem staying bounded.
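For scale, the 72,000 episodes/hour figure matches one episode per 50 ms decision cycle. That one-episode-per-cycle mapping is an assumption made here to connect the two numbers. A quick check of the 30-day total:

```python
CYCLE_MS = 50  # decision cycle length quoted in the Problem section

# Assumption: one episode is recorded per decision cycle.
episodes_per_hour = 3600 * 1000 // CYCLE_MS
episodes_per_30_days = episodes_per_hour * 24 * 30

print(episodes_per_hour)     # prints: 72000
print(episodes_per_30_days)  # prints: 51840000
```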
Dependencies
Algorithm
Budgeted output-phase mark-and-sweep (Prescription: Soar):
- delete_ltm() with full bookkeeping
Methodology
Implementation by Claude Code (Opus 4.6). delete_ltm() drafted by codex (GPT-5.4), then volleyed to convergence. Volley methodology: june.kim/volley.
- delete_ltm() kernel routine with full bookkeeping
- IS NULL OR, added inbound-ref test
- invalidate_trajectories + web_update_all_lti_child_edges per parent
Known limitations
smem_in_wmem only tracks LTIs as WME ids, not values. A leaf child present in WM without outgoing WMEs may not be protected. Documented as TODO.
Test plan
- testSweepDominated: flat entries, verify eviction and survival
- testSweepDominatedWithInboundRefs: parent with LTI child, child dominated and swept, parent retains constant attributes