
Add smem --redundancy-check (tree inclusion)#579

Draft
kimjune01 wants to merge 8 commits into SoarGroup:development from
kimjune01:smem-redundancy-check

kimjune01 commented Mar 27, 2026

Problem

Semantic memory grows without bound. Soar has no mechanism to detect when one smem entry's knowledge is fully contained in another's. After episodic-to-semantic consolidation (#578), merged entries may structurally subsume older entries, but those older entries persist indefinitely — actively retrieved, never evicted, slowing every subsequent spreading activation pass.

Derbinsky & Laird (2013) showed that without forgetting, a robot exceeded the 50 ms decision-cycle threshold within an hour. They built forgetting for working memory and procedural memory; smem and epmem still have none. This PR addresses the detection half of smem maintenance.

Full analysis: "Diagnosis: Soar", "SOAP Notes: Soar", "Prescription: Soar".

What this PR does

Adds smem --redundancy-check: a CLI command that scans semantic memory for structurally dominated entries via tree inclusion. Detection only — no eviction.

LTI A dominates LTI B if B's augmentation graph embeds injectively into A's — every slot in B has a matching slot in A with a superset of values, child LTIs matched recursively.
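For the simplest case, where every value is a constant and there are no child LTIs, the definition reduces to a per-attribute superset test. The sketch below illustrates only that flat case; `FlatLti` and `flat_dominates` are hypothetical names chosen for illustration, not identifiers from this PR, and string values stand in for smem's hashed constants.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>

// Hypothetical flat view of an LTI: attribute -> set of constant values.
using FlatLti = std::map<std::string, std::set<std::string>>;

// True if every attribute of b is present in a with a superset of b's values.
bool flat_dominates(const FlatLti& a, const FlatLti& b) {
    for (const auto& [attr, b_vals] : b) {
        auto it = a.find(attr);
        if (it == a.end()) return false;  // a lacks one of b's slots
        const auto& a_vals = it->second;
        // std::includes needs sorted ranges; std::set iterates in sorted order.
        if (!std::includes(a_vals.begin(), a_vals.end(),
                           b_vals.begin(), b_vals.end()))
            return false;
    }
    return true;
}
```

With the smoke-test entries from the test plan, an entry holding `^name alice ^age 30 ^city boston` dominates one holding `^name alice ^age 30`, and not the reverse.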

Expected consequence

After consolidation (#578) merges N episodes into a semantic entry, this check identifies pre-existing smem entries that the merged result now subsumes. Without it, consolidation adds entries but never removes redundant ones, so the store only ever grows. With it, the eviction path (future PR) can remove dominated entries, making net growth per consolidation cycle potentially negative. Over a 30-day deployment at 72,000 episodes/hour, this can be the difference between smem growing without bound and smem staying bounded.
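As a rough sense of scale, assuming the quoted rate corresponds to one episode per 50 ms decision cycle (an inference from the figures in this description, not something the PR states):

$$
\frac{3600\ \text{s/hour}}{0.05\ \text{s/episode}} = 72{,}000\ \text{episodes/hour},
\qquad
72{,}000 \times 24 \times 30 = 51{,}840{,}000\ \text{episodes in 30 days}.
$$

How many of those episodes become smem entries depends on the consolidation parameters, so this bounds the input volume and is not a store-size estimate.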

Dependencies

| PR | Role | Status |
|---|---|---|
| #578 | Episodic-to-semantic consolidation (creates the entries that produce redundancy) | Merged to this branch |
| #579 (this) | Structural redundancy detection (identifies dominated entries) | This PR |
| Future | Eviction of dominated entries via back-invalidation protocol | Not yet started |

The three form a pipeline: consolidation creates entries → tree inclusion detects redundancy → eviction removes dominated entries. Each is independently useful but the full payoff requires all three.

Theory

Tree inclusion (Kilpeläinen & Mannila, 1995) defines structural containment on rooted labeled trees. This PR adapts it to smem's graph structures with three extensions:

  1. Coinductive cycle handling. Smem graphs may contain cycles. We use coinductive semantics: revisiting an active pair optimistically assumes inclusion; if the assumption is wrong, non-cyclic proof obligations fail and reject it.

  2. Global injective matching. A b_to_a map threaded through all recursion enforces that two distinct B nodes cannot map to the same A node, even via different attributes. Root pinning ensures rooted inclusion.

  3. Backtracking child matching. Full backtracking over candidate assignments per attribute, with transactional b_to_a snapshot/restore on failed branches.
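The three extensions can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code in smem_inclusion.cpp: `Value`, `Store`, `Assignment`, `Matcher`, `augs_of`, `bind`, `match`, and `lti_includes` are hypothetical names, and an in-memory map stands in for rows loaded via web_expand. Only `b_to_a` and the idea of active pairs come from the description above. The sketch backtracks over candidates at each node but does not revisit a child's internal choices once that child has matched, so it should be read as illustrative rather than complete.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Value {
    bool is_lti;  // true: id names an LTI; false: id is a constant's hash
    long id;
};
using Augs  = std::map<std::string, std::vector<Value>>;  // attribute -> values
using Store = std::map<long, Augs>;                        // LTI id -> augmentations

struct Assignment {
    std::map<long, long> b_to_a;  // global B node -> A node assignment
    std::map<long, long> a_to_b;  // reverse map, used to enforce injectivity
};

struct Matcher {
    const Store& store;
    Assignment asg;
    std::set<std::pair<long, long>> active;  // pairs on the recursion stack

    const Augs& augs_of(long id) const {
        static const Augs empty;
        auto it = store.find(id);
        return it == store.end() ? empty : it->second;
    }

    // Bind b -> a unless either side is already bound to a different node.
    bool bind(long a, long b) {
        auto ib = asg.b_to_a.find(b);
        if (ib != asg.b_to_a.end()) return ib->second == a;
        if (asg.a_to_b.count(a)) return false;
        asg.b_to_a[b] = a;
        asg.a_to_b[a] = b;
        return true;
    }

    // Precondition: b is already bound to a.
    bool includes(long a, long b) {
        const std::pair<long, long> key{a, b};
        if (!active.insert(key).second) return true;  // coinductive: assume inclusion
        const Augs& a_augs = augs_of(a);
        bool ok = true;
        for (const auto& [attr, b_vals] : augs_of(b)) {
            auto it = a_augs.find(attr);
            if (it == a_augs.end()) { ok = false; break; }
            std::vector<bool> used(it->second.size(), false);
            if (!match(it->second, b_vals, 0, used)) { ok = false; break; }
        }
        active.erase(key);
        return ok;
    }

    // Backtracking injective match of b_vals[i..] into a_vals.
    bool match(const std::vector<Value>& a_vals, const std::vector<Value>& b_vals,
               std::size_t i, std::vector<bool>& used) {
        if (i == b_vals.size()) return true;
        const Value& bv = b_vals[i];
        for (std::size_t j = 0; j < a_vals.size(); ++j) {
            const Value& av = a_vals[j];
            if (used[j] || av.is_lti != bv.is_lti) continue;
            if (!bv.is_lti && av.id != bv.id) continue;  // constants must be equal
            Assignment snapshot = asg;                   // transactional snapshot
            used[j] = true;
            bool ok = !bv.is_lti ||
                      (bind(av.id, bv.id) && includes(av.id, bv.id));
            if (ok && match(a_vals, b_vals, i + 1, used)) return true;
            used[j] = false;
            asg = snapshot;                              // restore on failed branch
        }
        return false;
    }
};

// Does LTI a include (dominate or equal) LTI b?
bool lti_includes(const Store& store, long a, long b) {
    Matcher m{store, Assignment{}, {}};
    m.bind(a, b);  // root pinning: B's root must map to A's root
    return m.includes(a, b);
}
```

In this sketch the self-loop pair (@1 ^next @1 against @2 ^next @2) is accepted through the coinductive branch, while a B root that could only embed at a non-root node of A is rejected because the root binding is seeded before recursion starts.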

BLA (base-level activation) handles staleness — entries that haven't been accessed decay. Tree inclusion handles redundancy — entries whose knowledge is fully contained in a richer entry. They are orthogonal. See Prescription: Soar § Structural redundancy via tree inclusion.

This fills the "Tree × Dominance" cell in the parts bin taxonomy — structural dominance filtering on tree-shaped data. Background on the taxonomy and the blank cell: The Missing Parts.

Implementation

  • New file: smem_inclusion.cpp (~400 lines)
    • load_lti_augs() — loads augmentations via web_expand into lightweight structs keyed by hash IDs (no Symbol allocation)
    • smem_lti_includes_impl() — recursive inclusion with coinductive cycles, global injectivity, backtracking
    • CLI_redundancy_check() — iterates all LTI pairs, reports dominated entries and equivalence classes
  • CLI: smem --redundancy-check (option 'R', no arguments)
  • No schema changes. Reuses existing web_expand and lti_all infrastructure
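The pairwise scan that the CLI command performs might look roughly like the sketch below. It assumes an inclusion predicate is supplied by the caller; `RedundancyReport` and `scan_pairs` are hypothetical names, and the sketch reports equivalent pairs rather than grouping them into full equivalence classes.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct RedundancyReport {
    std::vector<std::pair<long, long>> dominated;   // (dominated, dominator)
    std::vector<std::pair<long, long>> equivalent;  // mutual inclusion
};

// includes(a, b) is assumed to answer "does a structurally include b?".
RedundancyReport scan_pairs(const std::vector<long>& ltis,
                            const std::function<bool(long, long)>& includes) {
    RedundancyReport report;
    for (std::size_t i = 0; i < ltis.size(); ++i) {
        for (std::size_t j = i + 1; j < ltis.size(); ++j) {
            const long x = ltis[i];
            const long y = ltis[j];
            const bool y_has_x = includes(y, x);
            const bool x_has_y = includes(x, y);
            if (y_has_x && x_has_y) {
                report.equivalent.push_back({x, y});  // neither strictly dominates
            } else if (y_has_x) {
                report.dominated.push_back({x, y});   // x is dominated by y
            } else if (x_has_y) {
                report.dominated.push_back({y, x});   // y is dominated by x
            }
        }
    }
    return report;
}
```

The scan is quadratic in the number of LTIs and calls the predicate twice per pair, which is one reason to keep the per-pair check cheap.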

Methodology

Implementation by Claude Code (Opus 4.6) in an isolated git worktree. Reviewed through five rounds of adversarial volley with codex (GPT-5.4). Volley methodology: june.kim/volley.

| Round | Issues found | Fixes applied |
|---|---|---|
| 1 | Greedy matching incorrect, cycle handling unsound, CLI returns wrong, help text missing | Backtracking matcher, split active_pairs from memo, bool return + SetError, help text |
| 2 | Global injectivity not enforced (per-attribute only), cycles still wrong (pessimistic) | b_to_a map threaded globally, coinductive cycle semantics |
| 3 | Failed branches leak descendant bindings, memoization context-dependent | Transactional b_to_a snapshots, dropped memo |
| 4 | Root not pinned (non-root-preserving embeddings accepted), runtime help missing | b_to_a seeded with root binding, smem_settings help updated |
| 5 | No correctness blockers. Ready to merge. | |

References

  • Kilpeläinen, P. & Mannila, H. (1995). "Ordered and Unordered Tree Inclusion." SIAM J. Computing 24(2):340–356. DOI
  • Bille, P. & Gørtz, I.L. (2011). "The Tree Inclusion Problem: In Linear Space and Faster." ACM TALG 7(3). arXiv
  • Pinter, R.Y. et al. (2007). "Approximate Labelled Subtree Homeomorphism." J. Discrete Algorithms 5(4). DOI
  • Derbinsky, N. & Laird, J.E. (2013). "Effective and Efficient Forgetting of Learned Knowledge in Soar's Working and Procedural Memories." Cognitive Systems Research. DOI
  • Laird, J.E. (2022). "Introduction to the Soar Cognitive Architecture." arXiv

Test plan

  • Build succeeds (CMake + make, macOS)
  • Smoke test: @1 (^name alice ^age 30) dominated by @2 (^name alice ^age 30 ^city boston)
  • Five rounds of adversarial code review (codex GPT-5.4)
  • Test with cyclic structures (@1 ^next @1 vs @2 ^next @2)
  • Test with shared substructure across attributes
  • Run against a real agent's smem store after consolidation (#578)
  • Verify no regression on existing smem unit tests

Periodically scan episodic memory for stable WME structures and write
them to semantic memory as new LTIs.  This implements the compose+test
framework (Casteigts et al., 2019) for automatic episodic-to-semantic
knowledge transfer — the operation Soar's long-term declarative stores
have been missing.

Algorithm:
  compose — union of constant WMEs currently active in epmem
  test    — continuous presence >= consolidate-threshold episodes
  write   — create smem LTI with qualifying augmentations via CLI_add

New parameters (all under epmem):
  consolidate            on/off  (default off)
  consolidate-interval   integer (default 100) — episodes between runs
  consolidate-threshold  integer (default 10)  — min episode persistence

Deduplication via epmem_consolidated tracking table prevents repeated
writes across consolidation runs.  Table is dropped on reinit alongside
other epmem graph tables.
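A minimal sketch of the test and deduplication steps, under simplifying assumptions: WMEs are reduced to string keys, the check runs on every episode (consolidate-interval is omitted), and an in-memory set stands in for the epmem_consolidated table. `Wme`, `Consolidator`, and `observe` are hypothetical names, not identifiers from this patch.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical encoding of a constant WME, e.g. "name=alice".
using Wme = std::string;

struct Consolidator {
    int threshold;                  // plays the role of consolidate-threshold
    std::map<Wme, int> run_length;  // consecutive episodes each WME has been present
    std::set<Wme> consolidated;     // stand-in for the epmem_consolidated table

    // Feed the constant WMEs active in one episode; returns those that newly
    // qualify for writing to smem.
    std::vector<Wme> observe(const std::set<Wme>& episode) {
        // Presence must be continuous: drop the run of any WME that is absent.
        for (auto it = run_length.begin(); it != run_length.end();) {
            if (episode.count(it->first)) ++it;
            else it = run_length.erase(it);
        }
        std::vector<Wme> to_write;
        for (const Wme& w : episode) {
            const int n = ++run_length[w];
            // insert().second is false if w was already written in an earlier run.
            if (n >= threshold && consolidated.insert(w).second)
                to_write.push_back(w);
        }
        return to_write;
    }
};
```

With a threshold of 3, a WME that is present in three consecutive episodes is returned once and never again, while a WME that drops out for one episode has its count restarted.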

Off by default — zero behavior change until explicitly enabled.

Limitations (deferred to follow-up):
  - Only consolidates constant-valued WMEs, not identifier edges
  - No back-invalidation across the WM/smem tier boundary
  - last-consolidation stat does not persist across agent reinit

Motivation: Derbinsky & Laird (2013) proved forgetting is essential to
Soar's scaling but only built it for working and procedural memory.
Episodic and semantic memory have no eviction and no capacity bound.
This patch addresses the first half: automatic semantic learning from
episodic experience.  With semantic entries derived from episodes,
episodic eviction becomes safe (merged episodes leave no reconstruction
debt), and R4's forgettable WME scope expands automatically.

Reference:
  Casteigts et al. (2019), "Computing Parameters of Sequence-Based
    Dynamic Graphs," Theory of Computing Systems.
  Derbinsky & Laird (2013), "Effective and efficient forgetting of
    learned knowledge in Soar's working and procedural memories,"
    Cognitive Systems Research.
  https://june.kim/prescription-soar — full prescription
After consolidation writes stable WMEs to smem, old episodes become
redundant.  Delete point entries and episode rows older than
consolidate-evict-age episodes.  This is safe: the consolidated
knowledge is in smem, so there is no reconstruction debt.

New parameter:
  consolidate-evict-age  integer (default 0 = off) — min age before
  an episode is eligible for eviction

Range and _now interval entries are preserved (they span multiple
episodes).  Only point entries and episode rows are removed.

Reference: Derbinsky & Laird (2013), §5 — "forgotten working-memory
knowledge may be recovered via deliberate reconstruction from semantic
memory."  Consolidation creates the semantic entries; eviction removes
the source episodes that are no longer needed for reconstruction.
- Delete _range entries whose intervals end before the eviction cutoff
  (previously only _point entries were evicted, leaving dead weight)
- Wrap all eviction DELETEs in BEGIN/COMMIT when lazy_commit is off
  for atomicity (when lazy_commit is on, already inside a transaction)

Retrieval of evicted episodes is already safe: epmem_install_memory
checks valid_episode and returns ^retrieved no-memory.
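The eviction rule described in the two commits above can be sketched as a cutoff filter. The exact cutoff arithmetic is an assumption made for illustration (entries strictly older than current minus consolidate-evict-age are evictable), and `Interval`, `EpmemSketch`, and `evict` are hypothetical names; the real implementation issues SQL DELETEs against the epmem tables.

```cpp
#include <algorithm>
#include <vector>

struct Interval {
    long start;
    long end;  // last episode covered by a closed range entry
};

struct EpmemSketch {
    std::vector<long> episodes;    // episode rows
    std::vector<long> points;      // point entries, one episode id each
    std::vector<Interval> ranges;  // closed range entries
    std::vector<long> now_starts;  // open "_now" intervals, never evicted here
};

// evict_age plays the role of consolidate-evict-age; 0 means eviction is off.
void evict(EpmemSketch& m, long current_episode, long evict_age) {
    if (evict_age <= 0) return;
    const long cutoff = current_episode - evict_age;  // assumed cutoff rule
    auto too_old = [cutoff](long id) { return id < cutoff; };
    m.episodes.erase(std::remove_if(m.episodes.begin(), m.episodes.end(), too_old),
                     m.episodes.end());
    m.points.erase(std::remove_if(m.points.begin(), m.points.end(), too_old),
                   m.points.end());
    // A range goes only if it ends before the cutoff; ranges that still
    // overlap retained episodes are kept.
    m.ranges.erase(std::remove_if(m.ranges.begin(), m.ranges.end(),
                                  [cutoff](const Interval& r) { return r.end < cutoff; }),
                   m.ranges.end());
}
```

Open intervals are left untouched because they still describe the current state.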
Implements Kilpeläinen-Mannila tree inclusion to detect which LTI
entries are structurally dominated by others. Detection only — no
eviction. Works at the raw hash level via web_expand to avoid
Symbol allocation overhead.
…rror routing

Codex review found four issues:
1. Greedy child matching could produce false negatives when first-fit
   blocks later required matches. Replaced with backtracking injective
   matcher.
2. Cycle detection conflated "currently exploring" with "proven to
   include". Split into separate active_pairs (recursion stack) and
   memo (proven results). Cycles now conservatively return false.
3. CLI_redundancy_check returned void, DoSMem always returned true.
   Changed to bool with SetError routing on failure.
4. help smem did not list --redundancy-check. Added.
Round 2 codex review found two remaining issues:

1. Per-attribute a_used vectors allowed two distinct B nodes to map to
   the same A node via different attributes. Fixed by threading a global
   b_to_a map (B node → A node assignment) through all recursion.
   Backtracking undoes assignments on failure.

2. Active-pair cycle check returned false (pessimistic), which missed
   valid cyclic equivalences like @1 ^next @1 vs @2 ^next @2. Changed
   to coinductive (optimistic): revisiting an active pair returns true.
   If the assumption is wrong, non-cyclic proof obligations will fail.
Round 3 codex review found two issues:

1. Failed speculative branches leaked descendant b_to_a bindings.
   Fixed by snapshotting b_to_a before each branch and restoring on
   failure. Also pre-bind b_child before recursion so descendants
   see the intended assignment.

2. Memoization keyed by (lti_a, lti_b) was unsound because results
   depend on the current b_to_a context. Dropped memo entirely —
   smem entries are shallow (depth 1-2) so re-evaluation is cheap.
Round 4 codex review found two issues:

1. Root of B was not pinned to root of A in the global assignment.
   Counterexample: @b ^next @b vs @A ^next @A1, @A1 ^next @A1 —
   B's root could map to A1 instead of A. Fixed by seeding
   b_to_a[lti_b] = lti_a before recursion.

2. smem --redundancy-check was missing from the runtime help screen
   in smem_settings.cpp. Added.
kimjune01 marked this pull request as draft March 27, 2026 16:12
kimjune01 added a commit to kimjune01/Soar that referenced this pull request Mar 27, 2026
…ated LTIs

Mark phase reuses tree inclusion detection from SoarGroup#579. Sweep phase:
1. R4 safety check: skip LTIs currently referenced in working memory
2. Dependency-safe deletion: disconnect_ltm, then delete from all
   smem tables (augmentations, activation history, aliases, lti)
3. Per-invocation budget via optional numeric argument

Includes functional test proving full eviction: add 3 LTIs where
@1 is structurally dominated by @2, sweep, verify @1 is gone and
@2/@3 survive. Post-sweep redundancy check confirms no remaining
dominated entries.
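The selection part of the sweep phase (safety check plus budget) can be sketched as below. This is an illustration only: `select_for_sweep` is a hypothetical name, the working-memory check is modeled as a set lookup, and treating a budget of 0 as "no limit" is an assumption, since the commit message says only that the budget argument is optional.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Returns the dominated LTIs that may be deleted in this invocation.
std::vector<long> select_for_sweep(const std::vector<long>& dominated,
                                   const std::set<long>& in_working_memory,
                                   std::size_t budget) {
    std::vector<long> victims;
    for (long lti : dominated) {
        if (budget != 0 && victims.size() >= budget) break;  // budget exhausted
        if (in_working_memory.count(lti)) continue;  // skip LTIs referenced in WM
        victims.push_back(lti);
    }
    return victims;
}
```

Entries skipped because they are referenced in working memory do not count against the budget in this sketch, and they remain candidates for a later sweep.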