tune(recall): grid search infra + PARTIAL default changes#408
Conversation
Route pack-memory's fuse_candidates through khive_retrieval::fuse_search_results, making khive-retrieval a real consumed facade instead of an orphan crate. - Add khive-retrieval dep to khive-pack-memory/Cargo.toml - Replace direct fuse_with_strategy call with retrieval adapter (CandidateMeta side-map, HybridConfig builder, FusionStrategy conversion) - Fix issue #309: resolve --all-features compile failures in khive-retrieval (stale SqliteStore imports, missing NodeId/LinkStore imports) - Add 5 integration tests (3 fusion_surface, 2 pack-memory recall adapter) - RRF k=1 discriminator test proves strategy propagation (30x score gap) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…recall (ADR-033 §6)
- Add three optional per-request fields to RecallParams: top_k (usize),
fusion_strategy (string), and score_floor (f32)
- fusion_strategy validated against {"rrf","weighted","union"}; clear error
with valid values on invalid input
- top_k overrides the result limit for a single call (capped at 100)
- score_floor applied as a post-filter on the composite score after compute_score
- Add parse_fusion_strategy_str helper; wire override into cfg.fuse_strategy
before passing to fuse_candidates
- Add 4 integration tests: default_identity, top_k_override,
fusion_strategy_override (including rejection), score_floor
- Document knobs in ADR-033 §6.1 with table, semantics, and example DSL
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Multi-model embedding support landed across the runtime + storage + memory
stack. Workspace dual-embedding now reachable end-to-end:
khive-runtime:
- RuntimeConfig.additional_embedding_models: Vec<EmbeddingModel>
- Replaces single OnceCell<embedder> with HashMap<model_name, embedder>
- default_embedder_name() + embedder(name) public methods
- KHIVE_ADDITIONAL_EMBEDDING_MODELS env-var parsing
- configured_embedding_models() helper enumerates active set
khive-db:
- V16 migration: add `embedding_model TEXT NOT NULL DEFAULT '<default>'`
column to vectors table with backfill + composite index
- VectorStore.insert / search scoped by embedding_model
khive-storage:
- VectorRecord carries model tag
- vector search params include model scope
khive-pack-memory:
- recall + remember accept optional embedding_model arg
- validation: must be a registered model name
kkernel:
- engine list now returns real loaded models (no longer empty Vec)
- engine migrate / drift-check still return not-implemented (#380/#385)
Notes:
- 16 files changed, +582/-138 lines
- Tests rebaselined for V16 (failed_migration_rolls_back tests V17 now;
store_ddl_then_event_migration_is_idempotent expects V16 head)
- Workspace: cargo build + cargo test + clippy clean + fmt clean
Lattice gap status: N/A — lattice-embed 0.2.4 already exposes both
MiniLM + paraphrase as 384-d local models with EmbeddingRoutingConfig
primitives. khive-runtime now uses these directly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses codex finding on PR #405: `khive_retrieval::hnsw::HnswIndex` doesn't resolve because `hnsw` is not a public submodule — `HnswIndex` is re-exported at the crate root (lib.rs:145). The doctest at persist/mod.rs:29 must use the public facade import. Closes the remaining gap on issue #309 (--all-features doctest failure). Verified: `RUSTC_WRAPPER= cargo test --offline -p khive-retrieval --all-features --doc` passes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ges unjustified Codex review on PR #408 flagged that the three default changes from the tune sweep (temporal_half_life_days 30 to 14, decay exp to hyp, multiplier 20 to 10) were made even though REPORT.md explicitly states the synthetic eval set produced identical recall@10 = 0.9333 for ALL 116 configs — the landscape was too flat to discriminate these parameters. Reverts those three lines back to their prior values. The tuned-config.toml artifact stays as an experimental record; the grid_search.py infra stays runnable; only RecallConfig::default() is restored. A discriminating eval corpus (embed-enabled, synonym queries, partial matches) is the prerequisite before changing runtime defaults. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Verdict: REQUEST CHANGES [Major] Revert or quarantine the Rust default changes until the eval can distinguish themEvidence: Why this matters: the PR is honest in the report, but the unsupported choices are still shipped as runtime defaults via Suggested fix: leave the tuned values in [Major] The tuning CLI is not re-runnable from this branchEvidence: Why this matters: one stated OSS utility goal is that a downstream user can re-run the tuning script with their own corpus. As committed, the script depends on a Python helper package that is not present in the PR branch, so the new CLI entry point at Suggested fix: either include the contract harness package in this stack/base, or make the tune runner self-contained enough to spawn Requested Verification
Looks Right
Commands Run
What I Did Not Check
Re-Review GuidanceNarrow re-review is enough: check that runtime defaults are either reverted or backed by a discriminating eval, and that Domain utility: SKIPPED - no lore/domain MCP tools were available in this runtime. |
…ger tests Two Medium findings from codex review: 1. top_k cast bug: `(k as u32).min(100)` truncates k before capping. A request with `top_k = 4_294_967_297` (larger than u32::MAX) truncates to 1 BEFORE the cap is applied, so the result limit becomes 1, not 100. Fixed: `u32::try_from(k.min(100)).unwrap_or(100)` clamps to usize first, then narrows safely. 2. Weak tests: test_recall_default_identity only checked length and the top hit. Strengthened to compare full ordered note_id+score lists across all positions, with all three knobs explicitly set to null. test_recall_fusion_strategy_override only validated string acceptance. Added a new unit test (fusion_strategy_change_produces_observable_ordering_difference) with a deterministic fixture where RRF and Weighted strategies MUST produce different orderings — proving the fusion_strategy override actually flows into fuse_candidates, not just validation. Verified: cargo test -p khive-pack-memory --lib passes (62 unit tests). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ta-safety fixes Per Ocean direction (2026-05-25): amend ADR-043 to formalize the V16 string-tag schema design, claim V16 in ADR-015 ledger, and fix the data-safety bugs codex flagged. Defer sqlite-vec data preservation to a follow-up issue. ADR amendments: - ADR-043 §1.1 (vector store column addition): replaces the old FK-based description with the actual V16 design (TEXT embedding_model column with composite index). Includes rationale for TEXT vs BLOB FK (hot-path join cost, end-to-end shape consistency with kkernel/env-vars/registry). Documents sqlite-vec rebuild behavior and follow-up. - ADR-015 schema ledger: V16 row added with cluster-22 amendment notes. High 3 fix — atomic unknown-model validation (handlers.rs): - handle_remember now calls resolve_embedding_model(Some(name)) BEFORE create_note_with_decay_for_embedding_model. resolve_embedding_model is synchronous and doesn't load the model — it only checks registration. An unknown model is rejected before any note/FTS/vector row is written. High 2 fix — scoped delete across all model stores (operations.rs): - delete_note now iterates over registered_embedding_model_names() (new public method on KhiveRuntime) and deletes the note's vector from EVERY registered model's vector store. Previously only the default model's store was touched, leaving non-default vectors orphaned. Medium fix — KHIVE_ADDITIONAL_EMBEDDING_MODELS warning on bad names: - parse_embedding_model_list now logs tracing::warn for non-empty raw names that don't parse, instead of silently filtering them out. The function still returns a Vec (no startup failure on partial validity), but operator typos now surface at startup rather than as UnknownModel errors at request time. Deferred (follow-up issue, see ADR-043 §1.1 final paragraph): - High 1 — V16 backfill hard-codes 'all-minilm-l6-v2' for all regular vec_* tables, and sqlite-vec virtual tables are still dropped-and-rebuilt on schema mismatch (data loss for non-default deployments). A copy-with- default rebuild path is tracked separately because it requires a careful multi-step migration with vec0 INSERT INTO SELECT FROM and a verification step. Operators are warned via ADR §1.1 to back up before upgrading. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex Major 2 (PR #408): the tune CLI is not re-runnable from a fresh clone of this branch because khive_contract is in the parent package (slice 06, PR #403 — already merged to main). Codex tested slice 08 in isolation and hit ModuleNotFoundError. Adds tests/khive-contract/tune/README.md explaining: - Install khive_contract first: uv pip install -e . from tests/khive-contract/ - Run via uv run python -m tune --quick (or full) - Documents the corpus-ceiling limitation surfaced in REPORT.md - Explicit note that RecallConfig::default() was intentionally NOT changed (defaults reverted in d943b7f based on codex Major 1) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codex findings addressedVerdict: REQUEST CHANGES → fixes pushed Major 1: Unjustified default changesFixed by reverting the three
Codex correctly identified that REPORT.md acknowledges recall@10 = 0.9333 for ALL 116 configs (corpus ceiling), so these changes had no empirical justification. The tuned-config.toml artifact, grid_search.py infra, and corpus all remain — only the load-bearing defaults reverted. A discriminating eval corpus is the prerequisite before changing runtime defaults. Major 2: Tune CLI not re-runnableDocumented in commit
The Also documents the corpus-ceiling limitation explicitly. Commits: Ready for re-review. |
…l consistency Two findings from codex round 2 on PR #407: 1. Medium: runtime-level unknown-model atomicity. handle_remember validated embedding_model before calling create_note_with_decay_for_embedding_model (round 1 fix), but the runtime API itself was permissive — direct Rust callers (other packs, integration tests) would still hit the write-after- failure bug. Fix: resolve_embedding_model is now called at the start of create_note_inner BEFORE any note/FTS/vector write. The pack-handler check remains as an earlier error boundary, but atomicity is enforced at the lowest layer that performs the write. 2. High: ADR-043 internal inconsistency. The §1.1 amendment landed in round 1, but the rest of the ADR still contained normative text for the rejected FK-based design: - §8 (Backward compat) described embedding_model_id BLOB column + CHECK rebuild - Alternatives table rejected per-record model_id (which is what V16 ships) - Migration version section claimed V5 + BLOB FK Fix: - §8 rewritten to describe V14 + V16 (registry + tag column) split that actually shipped - Alternatives table row strikes-through with "Superseded by V16 (2026-05-25)" and points back to §1.1 for rationale - Migration version section split into V14 (cluster-20) and V16 (v022-polish) with the actual SQL each migration runs Codex's other round-1 findings remain addressed: - Critical (ADR drift) — now resolved end-to-end across §1.1, §8, alternatives, and §Migration version - High 2 (scoped delete) — confirmed by round-2 review at operations.rs:1395, 1408 - Medium (env var warning) — confirmed at runtime.rs:549, 560 Verified: cargo test -p khive-runtime -p khive-pack-memory --lib passes (262 + 62 tests), cargo fmt clean, deno fmt clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Slice 8/9 from PR #399 (v022-polish show). Stacked on slice 7 (dual-embedding).
Parameter tuning infrastructure for recall defaults: grid search runner, synthetic eval corpus, config nudges from grid results.
What is added
tests/khive-contract/tune/grid_search.py— grid runner over recall knob space (116 configs in 0.75s)tests/khive-contract/tune/__main__.py— CLI entrytests/khive-contract/fixtures/memories_corpus.json— synthetic eval corpustests/khive-contract/tune/results.json— sweep resultstests/khive-contract/tune/tuned-config.toml— recommended configtests/khive-contract/tune/REPORT.md— analysis writeupPARTIAL status
The play landed with grid search infra working but the synthetic eval set has a corpus ceiling — recall@10 = 0.93 for ALL configs. Cannot ground defaults with this corpus. Follow-up: harder eval corpus (embed-enabled, synonyms, partial matches).
This PR ships the infrastructure so it can be re-run when a better corpus exists.
Dependencies
Depends on #407 (slice 7: dual-embedding). Base branch is
slice/v022/07-dual-embedding.Test plan