
fix: swap AgentDBBackend for RvfBackend + 4 packaging fixes (ADR-0059)#1527

Closed
sparkling wants to merge 178 commits into ruvnet:main from sparkling:fix/adr-0059-rvf-backend-swap

@sparkling

Issue

Fixes #1526

Summary

  • Phase 1: Swap AgentDBBackend → RvfBackend in createBackend(); data now persists to an .rvf file (same package, atomic persist, zero native deps)
  • Phase 2: CJS bug fixes for ID collision (index suffix), ML-006 scope (current project only), ranked dedup (rank over deduped instead of store) and tool_input snake_case
  • Packaging: @claude-flow/memory promoted from optional to required in the CLI; better-sqlite3 pinned to 11.10.0 (prebuilts confirmed for darwin-arm64, darwin-x64 and linux-x64)
  • ESM fix: github-safe.js renamed to github-safe.mjs (ESM syntax in a .js file fails node -c)
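The tool_input fix reduces to a lookup order: read the snake_case key first and only then fall back to camelCase. A minimal sketch, assuming a parsed hook payload object; `extractFilePath` is a hypothetical name, not the actual hook-handler.cjs function.

```javascript
// Hypothetical helper illustrating the hook-handler.cjs key order:
// snake_case `tool_input` wins over camelCase `toolInput`, and
// "unknown" is reported only when no file_path is present.
function extractFilePath(payload) {
  const input = payload.tool_input ?? payload.toolInput ?? {};
  return input.file_path ?? 'unknown';
}

console.log(extractFilePath({ tool_input: { file_path: 'src/app.ts' } }));
console.log(extractFilePath({}));
```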

Root Cause

AgentDBBackend.initialize() calls import('@claude-flow/agentdb'), a cross-package dynamic import that fails silently in the hook subprocess. Writes then go to an in-memory Map and are lost when the process exits, so the .rvf file is never created and the session-boundary drain (ADR-048) has never worked.

Changes (6 files)

| File | Change |
|---|---|
| cli/.claude/helpers/auto-memory-hook.mjs | createBackend(): RvfBackend → AgentDBBackend → JsonFileBackend fallback chain |
| cli/.claude/helpers/intelligence.cjs | ID collision fix, ML-006 project scope, ranked dedup |
| cli/.claude/helpers/hook-handler.cjs | Read tool_input (snake_case) before toolInput (camelCase) |
| cli/package.json | @claude-flow/memory: optionalDependencies → dependencies |
| memory/package.json | better-sqlite3: optionalDependencies → dependencies, pinned to 11.10.0 |
| cli/.claude/helpers/github-safe.js → github-safe.mjs | ESM extension fix |
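The createBackend() change is an ordered fallback chain. The sketch below shows the pattern under stated assumptions: each candidate is a hypothetical `{ name, load }` pair whose `load()` stands in for the real dynamic import and construction of RvfBackend, AgentDBBackend or JsonFileBackend. Unlike the silent failure described under Root Cause, every skipped backend is logged.

```javascript
// Hypothetical sketch of an ordered backend fallback chain.
// `load` may throw or reject (e.g. a failed dynamic import); the failure is
// logged and the next candidate is tried instead of failing silently.
async function createBackend(candidates, log = console.error) {
  for (const { name, load } of candidates) {
    try {
      const backend = await load();
      if (backend) return { name, backend };
      log(`[memory] ${name}: loader returned no backend, trying next`);
    } catch (err) {
      log(`[memory] ${name} unavailable: ${err && err.message ? err.message : err}`);
    }
  }
  throw new Error('no memory backend could be initialized');
}

// Usage: the first two candidates fail, so the chain falls through.
createBackend([
  { name: 'RvfBackend', load: async () => { throw new Error('not exported'); } },
  { name: 'AgentDBBackend', load: async () => null },
  { name: 'JsonFileBackend', load: async () => ({ kind: 'json' }) },
]).then(({ name }) => console.log(`using ${name}`));
```

Throwing when every candidate fails keeps the "no persistence at all" case loud rather than degrading to an in-memory store.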

Test Plan

  • Acceptance tests: 12/12 ADR-0059 checks pass (memory, storage, learning, retrieval, hooks, integrity)
  • hook-edit check confirms tool_input.file_path recorded correctly (not "unknown")
  • mem-roundtrip check confirms store→list round-trip works
  • Unit tests: 539/539 pass
  • npm run deploy passes full pipeline

Backward Compatibility

  • Existing .swarm/memory.db files untouched (CLI store path unchanged)
  • AgentDBBackend preserved as fallback for environments where RvfBackend is not exported
  • JsonFileBackend preserved as last-resort fallback

🤖 Generated with claude-flow

sparkling and others added 30 commits March 13, 2026 22:17
CLI's in-memory-repositories.ts imported from ../../../swarm/ using
relative paths, violating tsc rootDir constraint and preventing all
dist/ output (0 JS files emitted for the entire CLI package).

Fix: copy the 4 swarm type files (agent, task, and their repository
interfaces) into cli/src/infrastructure/_swarm-types/ and rewrite
imports to use local paths.

Co-Authored-By: claude-flow <ruv@ruv.net>
6 packages fixed for full type-check (ADR-0028):

- codex: src/types/fs-extra.d.ts
- mcp: src/types/express.d.ts, src/types/cors.d.ts
- shared: src/types/express.d.ts + fix ZodError.errors -> .issues
- performance: src/types/ruvector-attention.d.ts (PascalCase classes + computeRaw)
- embeddings: src/types/agentic-flow-embeddings.d.ts (getNeuralSubstrate, downloadModel)
- testing: src/types/vitest.d.ts (vi, Mock, describe, it, expect)

Co-Authored-By: claude-flow <ruv@ruv.net>
SG-004: Fix CLI cross-package imports
TS-001: Add type declarations for untyped dependencies
MCP servers should auto-start by default. The autoStart: false default
prevented MCP tools from being available without manual intervention.

Fix: set default to true in types.ts, omit autoStart property from
.mcp.json output (absence = auto-start enabled).

Co-Authored-By: claude-flow <ruv@ruv.net>
MC-001: Remove autoStart: false from MCP config
Package is type: module so tsc emits .js, not .cjs.

Co-Authored-By: claude-flow <ruv@ruv.net>
GB-001: Fix gastown-bridge main field
Hash fallback embeddings produce similarity ~0.05-0.28 (not semantic).
The hardcoded 0.3 threshold filtered out 90% of results, causing silent
empty search results when ONNX is unavailable.

Fix: detect embedding model at runtime and apply appropriate threshold:
  - ONNX (MiniLM-L6): 0.3 (meaningful similarity scores)
  - Hash fallback: 0.05 (permissive, ranking is noise)

Changed files:
  - memory-initializer.ts: getAdaptiveThreshold() helper, searchEntries()
  - memory-bridge.ts: _getAdaptiveThreshold() with cached model detection,
    bridgeSearchEntries(), bridgeSemanticSearch(), bridgeSearchPatterns(),
    bridgeLoadSessionPatterns()

Co-Authored-By: claude-flow <ruv@ruv.net>
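A sketch of the threshold selection, assuming the detected embedding source is reported as the hypothetical strings 'onnx' or 'hash'. The numbers are the ones listed in this commit, but `adaptiveThreshold` and `filterByThreshold` are illustrative names, not the memory-initializer.ts or memory-bridge.ts signatures.

```javascript
// Hypothetical sketch of FB-004: choose the cutoff from the embedding source
// detected at runtime instead of one hardcoded 0.3.
const THRESHOLDS = { onnx: 0.3, hash: 0.05 };

function adaptiveThreshold(source) {
  // Unknown sources get the permissive hash cutoff, so results are ranked
  // rather than silently filtered down to an empty list.
  return THRESHOLDS[source] ?? THRESHOLDS.hash;
}

function filterByThreshold(results, source) {
  const min = adaptiveThreshold(source);
  return results
    .filter((r) => r.score >= min)
    .sort((a, b) => b.score - a.score);
}
```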
FB-004: Adaptive search threshold for hash vs ONNX embeddings
SG-004 copied 4 swarm entity files (agent.ts, task.ts, and their
repository interfaces) into CLI. These 833 lines would silently
drift when upstream changes swarm types.

Replace with swarm-interfaces.ts: 100 lines of minimal interfaces
matching only the properties CLI actually uses (id, name, status,
role, domain, getUtilization, etc.). No drift risk: if swarm adds
new fields, they only need adding here when CLI needs them.

Co-Authored-By: claude-flow <ruv@ruv.net>
SG-004-v2: Replace copied swarm files with minimal interfaces
Vector indexes were hardcoded to 768-dim but ONNX MiniLM-L6 produces
384-dim embeddings. HNSW index could never be built — all searches
fell back to brute-force. Neural training was capped at 256-dim,
preventing neural patterns from sharing the memory index.

Fixes:
- vector_indexes schema: 768 → 384 (matches MiniLM-L6 output)
- reasoningbank/agentic-flow fallback dimensions: 768 → 384
- neural --dim max/default: 256 → 384 (same dimension as memory)

Result: all embedding paths produce 384-dim vectors that match the
HNSW index configuration. HNSW can now be built and used.

Co-Authored-By: claude-flow <ruv@ruv.net>
DM-001: Fix dimension mismatch — align all embeddings to 384

Supersedes PR #21 (which set 384-dim). The target machine has 32 cores
+ 187GB RAM — can easily run all-mpnet-base-v2 (110M params, ~5ms/embed).

Changes:
- ONNX model: all-MiniLM-L6-v2 (384-dim) → all-mpnet-base-v2 (768-dim)
- Vector indexes: 384 → 768 (now matches model output)
- Neural --dim: max/default 384 → 768 (shares same HNSW index)
- All dimension fallbacks aligned to 768
- MCP tools: default model updated, MiniLM kept as enum option

All embedding paths now produce 768-dim vectors matching the HNSW index.

Co-Authored-By: claude-flow <ruv@ruv.net>
ONNX embeddings never loaded on fresh installs because @xenova/transformers
was dynamically imported but not declared as a dependency. Every user got
128-dim hash fallback embeddings that couldn't be inserted into the
768-dim HNSW index.

Fixes:
- Add @xenova/transformers ^2.17.0 as optionalDependency
- Hash fallback dimensions: 128 → 768 (matches HNSW index)
- Optional because: ~100MB download, may fail on some platforms

With ONNX available: all-mpnet-base-v2 produces 768-dim semantic vectors.
Without ONNX: hash fallback produces 768-dim non-semantic vectors (same
dimension, HNSW works, but FB-004 adaptive threshold applies).

Co-Authored-By: claude-flow <ruv@ruv.net>
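This commit does not show how the hash fallback builds its vectors; a later commit on this branch describes them as FNV-1a hash pseudo-embeddings. A minimal feature-hashing sketch under that assumption: each token is hashed with 32-bit FNV-1a into one of 768 buckets, the hash's top bit picks the sign, and the vector is L2-normalized. The tokenizer and sign rule are assumptions. The vectors fit the 768-dim HNSW index, but their similarity reflects token overlap, not meaning.

```javascript
const DIM = 768;

// 32-bit FNV-1a over the UTF-8 bytes of `str`.
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (const byte of Buffer.from(str, 'utf8')) {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return h;
}

// Hypothetical hash pseudo-embedding: ASCII alphanumeric tokens hashed into `dim` buckets.
function hashEmbedding(text, dim = DIM) {
  const vec = new Float32Array(dim);
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  for (const token of tokens) {
    const h = fnv1a32(token);
    vec[h % dim] += (h & 0x80000000) ? -1 : 1; // top bit chooses the sign
  }
  let norm = 0;
  for (const v of vec) norm += v * v;
  norm = Math.sqrt(norm);
  if (norm > 0) for (let i = 0; i < dim; i++) vec[i] /= norm;
  return vec;
}
```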
@ruvector/core provides VectorDb for persistent HNSW vector indexing
(150x-12,500x faster than brute-force). Was dynamically imported but
not declared as a dependency, so HNSW never initialized on fresh
installs.

Added as optionalDependency because: native WASM bindings may fail
on some platforms. Without it, search falls back to O(n) cosine
similarity (still functional, just slower).

Co-Authored-By: claude-flow <ruv@ruv.net>
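Without @ruvector/core, search degrades to a linear scan. A sketch of that O(n) fallback, assuming entries are hypothetical `{ id, vector }` records already in memory: it scores every entry by cosine similarity and keeps the top k.

```javascript
// Cosine similarity between two equal-length numeric arrays.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical O(n) fallback: score every entry, then keep the best k.
function bruteForceSearch(query, entries, k = 5) {
  return entries
    .map((e) => ({ id: e.id, score: cosine(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Each query costs O(n·d) for n entries of dimension d, which is why the HNSW path matters once the store grows.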
…6, WM-108: Port 10 runtime patches to TypeScript source

- HW-001: Change stdin from 'pipe' to 'ignore' in headless spawn to prevent hang
- HW-002: Propagate non-zero exit codes from headless workers instead of swallowing
- HW-003: Reduce aggressive intervals (audit 30m, optimize 60m, testgaps 60m) + settings-driven config
- HW-004: Raise worker timeout from 5min to 16min (above max headless 15min)
- DM-001: Fix always-empty daemon.log by using ESM appendFileSync instead of require('fs')
- DM-002: Raise maxCpuLoad from 2.0 to 28.0 for multi-core servers
- DM-003: Skip freemem check on macOS where os.freemem() reports near-zero
- DM-004: Add missing worker types to defaults + real preload/consolidation implementations
- DM-006: Add log rotation (7-day/500-file cleanup) to headless executor
- WM-108: Reduce consolidation interval to 10min, add bridge consolidation + shutdownBridge on stop

Co-Authored-By: claude-flow <ruv@ruv.net>
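HW-001 and HW-002 concern how a headless worker is spawned and how its exit status is reported. A sketch using Node's child_process.spawn, assuming a generic worker command (the actual headless executor's command line is not shown here): stdin is 'ignore', so a child that reads stdin sees EOF instead of waiting on an open pipe, and a non-zero exit rejects the promise instead of being swallowed. `runHeadless` is a hypothetical name.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical wrapper around a headless worker process.
function runHeadless(cmd, args) {
  return new Promise((resolve, reject) => {
    // 'ignore' attaches /dev/null to the child's stdin, so reads hit EOF at once.
    const child = spawn(cmd, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let stderr = '';
    child.stdout.resume(); // drain stdout so a full pipe cannot block the child
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.on('error', reject); // e.g. command not found
    child.on('close', (code, signal) => {
      if (code === 0) return resolve({ stderr });
      const err = new Error(`worker failed: ${signal ? `signal ${signal}` : `exit code ${code}`}`);
      err.exitCode = code; // propagate the real exit code to the caller
      err.stderr = stderr;
      reject(err);
    });
  });
}
```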
Config get/export now reads .claude-flow/config.json from disk and merges
with defaults, instead of returning hardcoded values.

Co-Authored-By: claude-flow <ruv@ruv.net>
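Reading config.json and merging it over defaults can be sketched as a recursive merge where file values win. This assumes nested plain-object config and uses hypothetical `deepMerge`/`loadConfig` names: arrays and scalars from the file replace the default outright, a missing file yields the defaults, and malformed JSON still throws rather than being ignored.

```javascript
const fs = require('node:fs');

const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Recursively overlay `overrides` on `defaults`; override values win.
function deepMerge(defaults, overrides) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    out[key] = isPlainObject(value) && isPlainObject(defaults[key])
      ? deepMerge(defaults[key], value)
      : value;
  }
  return out;
}

// Hypothetical loader: defaults when the file is absent, merged values otherwise.
function loadConfig(path, defaults) {
  if (!fs.existsSync(path)) return deepMerge(defaults, {});
  return deepMerge(defaults, JSON.parse(fs.readFileSync(path, 'utf8')));
}
```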
Adds checkMemoryBackend() to doctor command that checks native dependency
availability for configured memory backend, with --install auto-rebuild.

Co-Authored-By: claude-flow <ruv@ruv.net>
…l, remove v3Mode

CF-006: Replace YAML parser with config.json reader, check config.json for init.
SG-005: Add 'start all' subcommand for memory+daemon+swarm+MCP.
SG-009: Remove v3Mode from swarm_init call.

Co-Authored-By: claude-flow <ruv@ruv.net>
Co-Authored-By: claude-flow <ruv@ruv.net>
Replace all config.yaml references with config.json in isInitialized(),
display strings, and JSON output paths.

Co-Authored-By: claude-flow <ruv@ruv.net>
Remove --v3-mode option, v3Mode variable, and conditional blocks.
Default topology changed to hierarchical-mesh. Flash Attention/AgentDB/SONA
lines always shown. V3 Mode row removed from status table.

Co-Authored-By: claude-flow <ruv@ruv.net>
Co-Authored-By: claude-flow <ruv@ruv.net>
Patches applied:
- HK-002: fail-loud bridge errors in postEdit/postCommand/postTask (#18)
- HK-003: real metrics from sona-patterns/intelligence files (#19)
- HK-004: respect daemon.autoStart from settings.json (#20)
- HK-005: cross-process daemon PID-file guard (#21)
- NS-001: search/list default namespace 'default' → 'all' (#22)
- NS-002: require explicit namespace for store/delete/retrieve (#23)
- NS-003: fix 'pattern' → 'patterns' namespace typo (#24)
- WM-103: MetadataFilter + MMR diversity in search pipeline (#25)
- WM-104: CausalRecall integration in routing (#26)
- WM-105: MemoryGraph importance scoring in store/search (#27)
- WM-106: activate LearningBridge in intelligence_learn (#28)
- WM-107: fix falsy-OR quality bug, fake session ID/stats (#29)
- WM-114: wire AttentionService across 4 controllers (ruvnet#30)

Co-Authored-By: claude-flow <ruv@ruv.net>
EM-001: Read embedding model name and dimensions from .claude-flow/embeddings.json
instead of hardcoding all-mpnet-base-v2. Add forceRebuild support with stale file
cleanup and metadata/early-return guards.

GV-001: After SQLite soft-delete, remove entry from persisted hnsw.metadata.json
and in-memory HNSW map to prevent ghost vectors in search results.

WM-102d/e: Fix SQL schema vector_indexes dimensions from 768 to 384 to match
bridge dimension (ADR-073 Gap F).

Co-Authored-By: claude-flow <ruv@ruv.net>
…s exit

CacheManager cleanup timer and SqljsBackend persist timer now call .unref()
so Node.js can exit cleanly without waiting for timer expiry.

Co-Authored-By: claude-flow <ruv@ruv.net>
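The timer change relies on Node's Timeout.unref(). A sketch with a hypothetical `startPersistTimer` wrapper; the actual CacheManager and SqljsBackend timers may be set up differently.

```javascript
// Hypothetical persist timer that never keeps the process alive on its own.
function startPersistTimer(persist, intervalMs) {
  const timer = setInterval(persist, intervalMs);
  timer.unref(); // the event loop may exit while this interval is still pending
  return timer;
}

const timer = startPersistTimer(() => console.log('persisting'), 60_000);
// The script can exit immediately even though `timer` is still scheduled.
```

Because an unref'd timer may never fire before exit, any pending writes still need an explicit flush on shutdown.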
WM-102: Wire .claude-flow/config.json into ControllerRegistry init. Replaces
hardcoded learningBridge:false with config-driven values. Adds neural.enabled
gate to skip bridge init when explicitly disabled.

WM-111: Enable EnhancedEmbeddingService in controllers block.

WM-115: Instantiate WASMVectorSearch with JS fallback after registry init.
Expose wasmComputeSimilarity helper for cosine similarity with 4x loop unrolling.

Co-Authored-By: claude-flow <ruv@ruv.net>
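The 4x-unrolled cosine helper mentioned for WM-115 can be sketched in plain JavaScript. This is an illustration, not the actual wasmComputeSimilarity source: the main loop handles four components per iteration, a tail loop covers dimensions not divisible by four, and mismatched lengths throw instead of silently producing a wrong score.

```javascript
// Cosine similarity with the main loop unrolled 4x (illustrative sketch).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new RangeError(`dimension mismatch: ${a.length} vs ${b.length}`);
  const n = a.length;
  const end = n - (n % 4);
  let dot = 0, na = 0, nb = 0;
  let i = 0;
  for (; i < end; i += 4) {
    dot += a[i] * b[i] + a[i + 1] * b[i + 1] + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
    na += a[i] * a[i] + a[i + 1] * a[i + 1] + a[i + 2] * a[i + 2] + a[i + 3] * a[i + 3];
    nb += b[i] * b[i] + b[i + 1] * b[i + 1] + b[i + 2] * b[i + 2] + b[i + 3] * b[i + 3];
  }
  for (; i < n; i++) { // tail: the remaining 0-3 components
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}
```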
WM-116a: Replace null placeholder with dynamic import of agent-memory-scope.js
and call to createAgentBridge factory.

WM-116b: Auto-enable agentMemoryScope when backend is available instead of
requiring explicit opt-in.

Co-Authored-By: claude-flow <ruv@ruv.net>
sparkling and others added 22 commits March 22, 2026 12:29
8 runtime defaults → EMBEDDING_DIM (from embedding-constants.ts)
5 JSDoc comments updated to reference nomic/768

These were copy-pasted from OpenAI tutorials by the upstream author.
The project has never used OpenAI embeddings.

Co-Authored-By: claude-flow <ruv@ruv.net>
…ing-predict, experience-record)

Brings total from 36 to 41 MCP tools. All use existing backend
methods via bridge controller access pattern.

Fixes sparkling/ruflo-patch#13

Co-Authored-By: claude-flow <ruv@ruv.net>
skill-create → skill_create, skill-search → skill_search,
learner-run → learner_run, learning-predict → learning_predict,
experience-record → experience_record

Matches upstream agentdb MCP server naming (skill_create, skill_search).

Co-Authored-By: claude-flow <ruv@ruv.net>
14 hyphenated tool names → underscores to match upstream agentdb
MCP server convention. All 41 tools now use consistent naming:
agentdb_{feature}_{action} (e.g., agentdb_reflexion_store).

Co-Authored-By: claude-flow <ruv@ruv.net>
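The naming convention can be enforced mechanically. A sketch assuming tool names are plain strings; `toUnderscoreName` and `findNonConforming` are hypothetical helpers, and the pattern reads agentdb_{feature}_{action} as "agentdb_" followed by at least two lowercase alphanumeric segments.

```javascript
// agentdb_{feature}_{action}: "agentdb_" plus two or more lowercase segments.
const TOOL_NAME = /^agentdb_[a-z0-9]+(?:_[a-z0-9]+)+$/;

// Hypothetical helper: the hyphen-to-underscore rename applied to legacy names.
const toUnderscoreName = (name) => name.replace(/-/g, '_');

// Hypothetical lint helper: names that still violate the convention.
const findNonConforming = (names) => names.filter((n) => !TOOL_NAME.test(n));
```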
Key upstream changes:
- P0: daemon startup, ESM controller-registry, memory-bridge fixes
- P1: ReasoningBank, SQLite path, namespace, init hooks fixes
- Security: audit remediation, terminal_execute, agent results fixes
- feat: autopilot persistent completion (ADR-072), guidance MCP tools
- feat: 22 stub CLI commands implemented with real functionality
- fix: hive-mind real agent state, MCP self-kill prevention
- fix: CPU-proportional daemon maxCpuLoad, statusline generator
- fix: ESM/CJS interop, attention class wrappers, semantic routing

Co-Authored-By: claude-flow <ruv@ruv.net>
- config.ts: use getConfig() (getAll/flatten don't exist on ConfigFileManager)
- memory-initializer.ts: remove duplicate entryId declaration from merge
- optional-modules.d.ts: add forward_count/reset_scope to WasmScopedLoRA type
- controller-registry.ts: add eagerMaxLevel to RuntimeConfig (ADR-0048)

Co-Authored-By: claude-flow <ruv@ruv.net>
causalRecall had a factory (line 1155) but was never auto-initialized
because it was missing from the INIT_LEVELS array. Depends on causalGraph
(level 4) and explainableRecall (level 3), so level 4 is correct.

Co-Authored-By: claude-flow <ruv@ruv.net>
…+ hook signals

claudemd-generator.ts generated CLAUDE.md files referencing a "Task tool"
that was renamed to "Agent" in Claude Code v2.1.63 (anthropics/claude-code#29677).
14 occurrences across 5 functions produced instructions for a nonexistent tool.

Changes:
- Replace swarmOrchestration() with agentOrchestration() — correct tool names
- Rewrite concurrencyRules() — "Agent tool", remove TodoWrite reference
- Add mcpToolDiscovery() — ToolSearch bootstrap for 200+ deferred MCP tools
- Add hookSignals() — bind [INTELLIGENCE] and [INFO] to concrete actions
- Add whenToUseWhat() — 5-row decision tree (Agent/MCP/CLI/Skill)
- Rewrite setupAndBoundary() — remove dead "Task tool" boundary rules
- Rewire all 6 template compositions — standard drops from ~250 to ~90 lines

Fixes sparkling/ruflo-patch#92
Upstream refs: ruvnet#1497, ruvnet#1476, ruvnet#1413

Co-Authored-By: claude-flow <ruv@ruv.net>
…ve Project Config

Additions from cross-repo CLAUDE.md analysis (ruflo, agentic-flow, ruvector):

- Add Task Complexity section: when to spawn agents vs work directly
- Add Feature Workflow checklist (TDD loop): test → implement → verify → commit
- Add single-test run command to Build & Test block
- Rewrite Hook Signals with before/during/after lifecycle structure
- Remove Project Config sub-section (topology/HNSW/neural) — this is daemon
  runtime config, not actionable Claude instructions
- Rename projectArchitecture param to _options (no longer consumed)

Follows up on sparkling/ruflo-patch#92 (CM-001)

Co-Authored-By: claude-flow <ruv@ruv.net>
cacheSize: 2048 → 256 MB (was tuned for 187GB server)
maxNodes: 50000 → 5000 (was tuned for 32-thread Ryzen)
sonaMode: instant/real-time → balanced (safe default for any machine)

These values were set in commits 5beffbd, d5ae8d5, e5ec5e0 for a
dedicated 187GB/32-thread Hetzner server. They are wasteful on smaller
machines (e.g. 36GB MacBook) and should be overridden via config.json
for large servers, not hardcoded as defaults for all users.

Config.json values from init take precedence — these are only fallbacks
when config is missing or keys are absent.

Fixes sparkling/ruflo-patch#92

Co-Authored-By: claude-flow <ruv@ruv.net>
controller-registry.ts: CEILING was 160GB (set for 187GB Hetzner server).
Now uses 75% of os.totalmem() with a 4GB floor — works on any machine.

config-adapter.ts: cacheSize fallback 1M → 100K entries (v2 compat path).

Co-Authored-By: claude-flow <ruv@ruv.net>
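The new ceiling is a one-line formula. A sketch using Node's os.totalmem(), assuming "4GB" means 4 GiB; `memoryCeiling` is a hypothetical name. On the 36 GB laptop mentioned earlier this gives 27 GiB, and on the 187 GB server 140.25 GiB instead of the old fixed 160GB; below about 5.33 GiB of RAM the floor dominates.

```javascript
const os = require('node:os');

const GIB = 1024 ** 3;

// Hypothetical ceiling: 75% of physical memory, never below 4 GiB.
function memoryCeiling(totalBytes = os.totalmem()) {
  return Math.max(4 * GIB, Math.floor(totalBytes * 0.75));
}

console.log(`${(memoryCeiling() / GIB).toFixed(1)} GiB`);
```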
ControllerRegistry init opens 44 controllers + SQLite handles which keeps
the node event loop alive. When getBridge hangs, generateEmbedding never
reaches the direct ONNX path, falling back to hash pseudo-embeddings.

Add Promise.race with 5s timeout so bridge falls through to direct
xenova/transformers loading if the registry is slow.

Co-Authored-By: claude-flow <ruv@ruv.net>
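A sketch of the 5 s guard, assuming hypothetical `getBridge` and `loadDirect` functions that return promises (the real memory-bridge.ts names and signatures may differ). The timeout timer is cleared once the race settles, since a dangling timer would itself keep the event loop alive, the same failure mode this commit works around.

```javascript
// Race `promise` against a timer; the timer is always cleared afterwards so it
// cannot keep the event loop alive by itself.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical embedder lookup: prefer the bridge, fall back to direct loading.
async function getEmbedder(getBridge, loadDirect, ms = 5000) {
  try {
    return await withTimeout(getBridge(), ms, 'getBridge');
  } catch (err) {
    console.error(`[embeddings] bridge unavailable (${err.message}); loading directly`);
    return loadDirect();
  }
}
```

Promise.race does not cancel the slow getBridge() call; whatever handles it holds stay open, so the direct path runs alongside it rather than replacing it.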
… null AgentDB embedder

Root cause: bridgeGenerateEmbedding called registry.getAgentDB().embedder
which returns null — the AgentDB instance does not expose an embedder.
Meanwhile bridgeEmbed uses registry.get('enhancedEmbeddingService') which
works and produces 768-dim embeddings.

Fix: try bridgeEmbed (enhancedEmbeddingService) first, fall back to
AgentDB embedder path only if that fails.

This was causing all memory store/search operations to silently use
FNV-1a hash pseudo-embeddings instead of real ONNX semantic vectors.

Co-Authored-By: claude-flow <ruv@ruv.net>
FULL_INIT_OPTIONS.embeddings.model was 'all-mpnet-base-v2' (bare).
DEFAULT_INIT_OPTIONS and CODEX_INIT_OPTIONS used 'all-MiniLM-L6-v2' (bare).

These bare names override getEmbeddingConfig() at executor.ts line 1304
and get written to embeddings.json. EmbeddingService then passes them to
transformers.pipeline() which fails to resolve bare names on HuggingFace
(401 Unauthorized), silently falling back to mock embeddings.

Fix: prefix all bare defaults with Xenova/.

Co-Authored-By: claude-flow <ruv@ruv.net>
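The rule behind this fix can be stated as a check on default model names. A sketch, assuming names are "owner/model" strings; `isQualifiedModelName` and `findBareModelNames` are hypothetical helpers. A later commit on this branch notes that a runtime prefix masked bare-name bugs, so a check like this fits better in a unit test over the defaults than in the embedding path.

```javascript
// An "owner/model" name such as "Xenova/all-mpnet-base-v2" is qualified;
// a bare name like "all-mpnet-base-v2" is not.
function isQualifiedModelName(name) {
  const parts = name.split('/');
  return parts.length === 2 && parts[0] !== '' && parts[1] !== '';
}

// Hypothetical check over a defaults map: returns [key, model] pairs to fix.
function findBareModelNames(defaultsByName) {
  return Object.entries(defaultsByName).filter(([, model]) => !isQualifiedModelName(model));
}
```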
embeddings-tools.ts: MCP tool enum and default
hooks-tools.ts: hardcoded model in status output
init.ts: wizard model selection choices

Co-Authored-By: claude-flow <ruv@ruv.net>
…stead of null AgentDB embedder"

This reverts commit 544fb22.
Model names should arrive correctly prefixed from defaults and config.
The runtime prefix was masking bare-name bugs upstream.

Co-Authored-By: claude-flow <ruv@ruv.net>
ControllerRegistry created AgentDB with only dbPath, causing it to fall
back to its own default model (MiniLM 384-dim) instead of using the
configured model from getEmbeddingConfig() (mpnet 768-dim).

The registry already reads getEmbeddingConfig() at line 713-715 but
never passed the result to the AgentDB constructor. Now passes
embeddingModel and dimension so AgentDB loads the correct ONNX model.

Co-Authored-By: claude-flow <ruv@ruv.net>
Values determined by expert hive analysis + 500-entry benchmark on M5 Max:

- cacheSize: 256→384 MB (working set ~25MB, 15x headroom)
- maxNodes: 5000→10000 (64x current graph, years of growth)
- similarityThreshold: 0.65→0.25 (0.65 produced zero edges; real mpnet
  scores are 0.37-0.72 for related content)
- confidenceDecayRate: 0.005→0.0008 (patterns last a workday, not 50min)
- accessBoostAmount: 0.03→0.05 (one access offsets ~3 hours of decay)
- consolidationThreshold: 10→8 (fires once per session)
- pageRankDamping: 0.85→0.82 (better exploration in small graphs)
- learningBatchSize: 128→64 (matches single-developer signal rate)
- learningTickInterval: 15000→10000 (patterns available within one tick)

Updated in: types.ts (FULL_INIT_OPTIONS), executor.ts (config.json
generation), memory-bridge.ts (runtime fallbacks).

Co-Authored-By: claude-flow <ruv@ruv.net>
auto-memory-store.json accumulates duplicate entries (4482 entries with
only 157 unique IDs). buildEdges() processes all duplicates, generating
O(n^2) edges between copies of the same entry. This produced 1.3M edges
(194MB graph-state.json) for just 157 unique nodes.

Fix: deduplicate store by ID before building nodes and edges. Also fix
the cache-hit check to compare nodeCount against unique ID count instead
of raw store length (which never matched due to duplicates).

Result: 194MB -> 79KB (99.96% reduction), 1,337,498 edges -> 157.

Co-Authored-By: claude-flow <ruv@ruv.net>
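A sketch of the dedup step, assuming store entries are objects with an `id` field; `dedupeById` and `isGraphCacheHit` are hypothetical names, and letting the last duplicate win is an assumption, since the commit does not say which copy is kept. Node and edge construction then runs over the deduplicated list, and the cache check compares against its length rather than the raw store length.

```javascript
// Hypothetical dedup: Map keeps first-seen order, while later duplicates
// overwrite the stored value (an assumed tie-break).
function dedupeById(entries) {
  const byId = new Map();
  for (const entry of entries) byId.set(entry.id, entry);
  return [...byId.values()];
}

// Hypothetical cache check: hit only if the cached graph covers the unique set.
function isGraphCacheHit(cachedGraph, dedupedEntries) {
  return cachedGraph != null && cachedGraph.nodeCount === dedupedEntries.length;
}
```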
Fixes ruvnet#1526

Phase 1: RvfBackend preferred in auto-memory-hook createBackend() —
same package, atomic persist, no cross-package import that fails silently.

Phase 2: CJS bug fixes in intelligence.cjs and hook-handler.cjs —
ID collision (index suffix), ML-006 scope (current project only),
ranked dedup (deduped not raw store), tool_input snake_case.

Packaging: @claude-flow/memory promoted from optional to required
in CLI. better-sqlite3 pinned to 11.10.0 (confirmed prebuilts).
github-safe.js renamed to .mjs (ESM syntax fix).

Co-Authored-By: claude-flow <ruv@ruv.net>
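The ID collision fix appends the entry's index. A sketch, assuming IDs were previously built from a prefix and a timestamp, so entries written within the same millisecond collided; `makeEntryIds` and the prefix-timestamp-index format are hypothetical, not the actual intelligence.cjs scheme.

```javascript
// Hypothetical ID scheme: a timestamp alone collides within one millisecond,
// so each entry in a batch also carries its index.
function makeEntryIds(prefix, count, now = Date.now()) {
  return Array.from({ length: count }, (_, i) => `${prefix}-${now}-${i}`);
}

console.log(makeEntryIds('insight', 3));
```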
@sparkling
Author

Acceptance Results

All tests pass after this change: 69/69 (was 52/69 before).

ADR-0059 acceptance checks: 12/12 — memory store/retrieve, search, persistence, storage files, intelligence graph, retrieval, insight generation, learning feedback, hook import, hook edit (tool_input snake_case), hook lifecycle, ID collision integrity.

Companion PR: ruvnet/agentic-flow#140 (dotenv fix in agentdb LLMRouter — discovered during same investigation).

@sparkling
Author

Note on PR scope

This branch carries the full fork divergence (100 commits / 160 files) because the ADR-0059 changes cannot be cleanly cherry-picked onto upstream main. The upstream auto-memory-hook.mjs doesn't have createBackend() or the JSON config reader — those were added in prior fork patches that haven't been merged yet.

Minimum merge path for just the ADR-0059 fix:

  1. Merge #1519 (fix: deduplicate store entries in intelligence.cjs, 194MB → 79KB), which creates the deduped variable that ADR-0059 references
  2. Then the 4 helper files (auto-memory-hook.mjs, intelligence.cjs, hook-handler.cjs, github-safe.mjs) + 2 package.json changes can be cherry-picked

Alternatively, this PR can be merged as-is — it includes all prior patches (#1512, #1517, #1519) plus ADR-0059.

The ADR-0059 commit is 204cb3cd7 (the last commit on the branch). All other commits are prior fork patches already filed as #1512, #1517, #1519.

@sparkling
Author

Closing in favour of a clean PR from fix/all-patches-clean (1 commit, 7 files — no version bumps or unrelated changes).


Development

Successfully merging this pull request may close these issues.

fix: auto-memory hook silently drops all session data + 4 packaging bugs
