
Show: v022-polish (overnight) — wire retrieval/recall/dual-embed + tests + plugin polish — SLICE BEFORE MERGE#399

Draft
ohdearquant wants to merge 19 commits into main from show/v022-polish/integration

@ohdearquant
Owner

Show: v022-polish (overnight autonomous run, 2026-05-25 02:11 → 06:32 EDT)

Multi-play DAG executed against main. 9 plays merged into show/v022-polish/integration. Per Ocean's directive 2026-05-25 02:19: this PR is intended to be sliced into 5-20 smaller PRs and codex-reviewed before merging — do not merge as-is.

Plays merged

| Play | Playbook | Status | Key delivery |
| --- | --- | --- | --- |
| recon | arch-discovery | ✅ APPROVE | 7-axis state-of-codebase report (_recon/state.md, 69 lines, file:line evidence) |
| cli-tests | test-coverage | ✅ APPROVE-with-fixes (2 MIN) | 79 new tests, 476 total, golden files for help text |
| plugin-polish | product-polish | ✅ APPROVE | 234/234 examples validated, 3 new SKILL.md (propose/review/withdraw), KG/GTD/Memory plugins bumped to 0.2.2 |
| wire-retrieval | feature | ✅ APPROVE (0.95) | khive-pack-memory consumes khive-retrieval::fuse_search_results; issue #309 fixed; +5 tests |
| wire-recall-pipeline | feature | ✅ APPROVE (0.92) | top_k/fusion_strategy/score_floor knobs on recall; ADR-033 §6 documented; +5 tests |
| python-tests | test-coverage | ⚠️ PARTIAL (timeout) | tests/khive-contract/ pytest package, 63 tests across 11 files, golden/benchmark TODO |
| dual-embedding | feature | ⚠️ PARTIAL→manual-fix (timeout) | Multi-model registry; V16 migration; recall scoped by model; 3 test rebaselines applied by orchestrator post-timeout |
| close-issues | resolve-issues | ✅ APPROVE | 0 closures (correct verdict: wiring issues are implementation work, not closure-ready); audit log committed |
| param-tuning | test-coverage | ⚠️ PARTIAL | Grid search infra works (116 configs in 0.75s); synthetic eval set has ceiling (recall@10 = 0.93 for all configs); 3 config nudges applied |

Aggregate stats

  • 18 commits ahead of main (9 feature/chore + 9 integration merges)
  • Roughly +3,300 LOC (recon report + cli tests + python-tests skeleton + wire code + dual-embedding + tune infra)
  • Workspace tests: 66 test crates pass, 0 fail (verified post-integration)
  • Worktrees pruned after each merge — only adr-001-015-alignment-integration (on this PR's HEAD) + adr-001-015-alignment-impl-c16 (deferred c16) remain

Suggested slicing (for /codex-pr-review workflow)

  1. PR 1: docs(recon) — recon report (1 file, 69 lines) — context for reviewers
  2. PR 2: test(cli) — 11 cli-tests files (+963 lines)
  3. PR 3: chore(marketplace) — plugin-polish (20 files)
  4. PR 4: feat(retrieval-composer) — wire-retrieval (8 files, ADR-011)
  5. PR 5: feat(recall-knobs) — wire-recall-pipeline (3 files, ADR-033)
  6. PR 6: test(contract) — python-tests package (13 files)
  7. PR 7: feat(embedding-registry) — dual-embedding (17 files, ADR-043 + V16 migration)
  8. PR 8: tune(recall) — param-tuning grid + config nudges (6 files)
  9. PR 9: chore(audit) — close-issues log (1 file)

Each slice maps to a single play's commits and can be codex-reviewed independently. The recommended sequential merge order is the order listed above, which matches the dependency chain.

Known gaps / follow-ups (not blockers)

  • python-tests skeleton needs golden snapshots + benchmark baselines (timed out before that phase)
  • param-tuning needs a harder eval corpus (embed-enabled, synonym queries) to actually ground defaults — current eval set has corpus ceiling
  • dual-embedding post-timeout test fixes applied by orchestrator (V16 migration version updates); no critic gate ran on those specific fixes, but cargo test --workspace is green
  • 15 wiring-related GitHub issues remain open — recon's "wiring" category is more accurately "implementation follow-ups with cross-crate deps"; not closeable from this show
  • npm publish for v0.2.2 still blocked (NPM_TOKEN scope issue; out of show scope)
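
The corpus ceiling mentioned in the param-tuning gap can be illustrated with a small recall@k sketch. This is not the show's tuning harness: the function names and the toy eval data are hypothetical, and the point is only that when a relevant item is unreachable for every configuration, all configs score the same and the grid cannot separate them.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(runs, k=10):
    """Average recall@k over (ranked_ids, relevant_ids) pairs."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


# Hypothetical eval set: item "z" is relevant to the second query but is never
# retrieved by either configuration, so it caps the achievable score.
runs_config_a = [(["a", "b", "c"], ["a", "b"]), (["x", "y"], ["x", "z"])]
runs_config_b = [(["b", "a", "c"], ["a", "b"]), (["y", "x"], ["x", "z"])]

score_a = mean_recall_at_k(runs_config_a)
score_b = mean_recall_at_k(runs_config_b)
# Both configs reorder results differently yet land on the same recall@10.
```

A harder corpus would need queries whose relevant items move in or out of the top k depending on the config, which is what the embed-enabled and synonym queries are meant to provide.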

How to verify locally

cd /Users/lion/khive-work/worktrees/adr-001-015-alignment-integration
git pull origin show/v022-polish/integration
cd crates && cargo build --workspace && cargo test --workspace
cd ../cli && deno test --allow-all tests/
cd ../tests/khive-contract && uv run pytest -v

🤖 Generated overnight by orchestrate:show / dynamic /loop pacing

ohdearquant and others added 19 commits May 25, 2026 02:25
- Add cli/tests/helpers.ts with subprocess runner, golden file comparator,
  JSON shape validator, and makeTempRepo() fixture with valid KG structure
- Add cli/tests/behavior/exit_code_test.ts (31 cases): exit 0/1 for all
  top-level flags, unknown commands, kg/pack/auth subcommands, in-repo ops
- Add cli/tests/behavior/error_test.ts (13 cases): error messages, --help
  hints, not-implemented stubs, invalid NDJSON, out-of-repo commands
- Add cli/tests/behavior/parse_test.ts (21 cases): flag parsing for stats
  (--json), validate (--format json, --quiet, --no-rules), doctor (--json),
  log (-n, --json), diff (--json, --name-only), pack stubs
- Add cli/tests/contract/help_test.ts (17 cases): golden file comparisons
  for --help at top-level, kg, pack, auth groups; content assertions
- Add cli/tests/contract/output_test.ts (8 cases): version semver check,
  kg stats --json shape, kg validate --format json shape, kg doctor --json
- Add golden files: help_toplevel.txt, help_kg.txt, help_pack.txt, help_auth.txt
- Add deno.json tasks: test:behavior and test:contract

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

7-axis survey (orphan crates, ADR alignment, marketplace, open issues,
embedding surface, test inventory, CLI) with file:line evidence per claim
and prioritized backlog for downstream plays.

Key findings:
- khive-bm25/hnsw/fusion already consumed by khive-retrieval (mfst+src)
- khive-retrieval itself is the unconsumed facade; downstream must wire it
- lattice-embed 0.2.4 has both MiniLM + paraphrase as 384-d local models
  + dual-write/routing/migration primitives — dual embedding is a runtime
  exposure problem, not a lattice gap
- khive-runtime has ONE OnceCell embedder; need a model registry
- Memory recall subhandlers exist (recall.embed/candidates/fuse/rerank/score);
  composability is there but not all wired
- ADR-043 schema ownership drift: spec says runtime, impl is in db

…verb surface

Audited all three marketplace plugins against the actual pack handler
registrations and fixed every stale example, count, and arg reference.

KG plugin (14 files touched):
- Fixed 10 P0 broken examples: positional query() → keyword, missing
  kind= on update/delete, placeholder batches, unsupported filter/status/tags
- Added 3 new SKILL.md files for ADR-046 verbs: propose, review, withdraw
- Updated all stale counts: 6→8 entity kinds, 13→15 edge relations, 11→14 verbs

GTD plugin: bumped version, added start?/end? to assign docs, listed
process and plan skills in README.

Memory plugin: bumped version, documented parameter aliases
(importance/salience, decay_factor/decay, source_id/source).
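
A minimal sketch of how such aliases could be normalized before a handler sees them. The commit does not say which name in each pair is canonical or how conflicts are handled, so this assumes the first name is canonical and that supplying both forms is an error; `ALIASES` and `normalize_params` are hypothetical names.

```python
# Assumption: the first name in each documented pair is the canonical one.
ALIASES = {
    "salience": "importance",
    "decay": "decay_factor",
    "source": "source_id",
}


def normalize_params(params):
    """Return a copy of params with alias keys rewritten to canonical keys.

    Raises ValueError if a canonical name and its alias are both supplied
    (an assumed policy, not one stated in the plugin docs).
    """
    normalized = {}
    for key, value in params.items():
        canonical = ALIASES.get(key, key)
        if canonical in normalized:
            raise ValueError(f"parameter {canonical!r} supplied more than once")
        normalized[canonical] = value
    return normalized


args = normalize_params({"content": "note", "salience": 0.8, "source": "n1"})
```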

All three plugin.json versions bumped to 0.2.2.

New tooling:
- marketplace/_validators/check_examples.py: stdlib-only validator
  (234 examples checked, 0 invalid)
- marketplace/INSTALL.md: installation and verification guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Route pack-memory's fuse_candidates through khive_retrieval::fuse_search_results,
making khive-retrieval a real consumed facade instead of an orphan crate.

- Add khive-retrieval dep to khive-pack-memory/Cargo.toml
- Replace direct fuse_with_strategy call with retrieval adapter
  (CandidateMeta side-map, HybridConfig builder, FusionStrategy conversion)
- Fix issue #309: resolve --all-features compile failures in khive-retrieval
  (stale SqliteStore imports, missing NodeId/LinkStore imports)
- Add 5 integration tests (3 fusion_surface, 2 pack-memory recall adapter)
- RRF k=1 discriminator test proves strategy propagation (30x score gap)
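
For reviewers unfamiliar with why the RRF constant works as a discriminator, here is a sketch of the standard reciprocal rank fusion formula. It is not the khive-retrieval implementation: `rrf_fuse` is a hypothetical name, ranks are assumed 1-based, and k=60 is assumed only because it is a commonly used constant.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked id lists; rank is 1-based (an assumption).
    Returns (id, score) pairs sorted by descending score, ties broken by id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["n1", "n2", "n3"]    # hypothetical lexical results
vector_hits = ["n2", "n1", "n4"]  # hypothetical vector results

fused_k1 = rrf_fuse([bm25_hits, vector_hits], k=1)
fused_k60 = rrf_fuse([bm25_hits, vector_hits], k=60)

# Top-ranked score from a single list at k=1 versus k=60.
single_list_ratio = rrf_fuse([["n1"]], k=1)[0][1] / rrf_fuse([["n1"]], k=60)[0][1]
```

Under these assumptions the top-ranked item in a single list scores 1/2 at k=1 and 1/61 at k=60, a ratio of 30.5. That is similar in magnitude to the reported 30x gap, but the commit does not state how the test measures it.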

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…recall (ADR-033 §6)

- Add three optional per-request fields to RecallParams: top_k (usize),
  fusion_strategy (string), and score_floor (f32)
- fusion_strategy validated against {"rrf","weighted","union"}; clear error
  with valid values on invalid input
- top_k overrides the result limit for a single call (capped at 100)
- score_floor applied as a post-filter on the composite score after compute_score
- Add parse_fusion_strategy_str helper; wire override into cfg.fuse_strategy
  before passing to fuse_candidates
- Add 4 integration tests: default_identity, top_k_override,
  fusion_strategy_override (including rejection), score_floor
- Document knobs in ADR-033 §6.1 with table, semantics, and example DSL
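
The knob semantics listed above can be sketched in a few lines. This is an illustration, not the RecallParams code: the commit does not say whether an over-limit top_k is clamped or rejected, whether score_floor is inclusive, or what the default limit is, so this sketch assumes clamping, an inclusive floor, and a default of 10. `RecallKnobs` and `apply_knobs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

VALID_FUSION_STRATEGIES = ("rrf", "weighted", "union")
MAX_TOP_K = 100  # cap stated in the commit message


@dataclass
class RecallKnobs:
    top_k: Optional[int] = None
    fusion_strategy: Optional[str] = None
    score_floor: Optional[float] = None


def parse_fusion_strategy(value):
    """Validate a strategy name, listing the valid values on failure."""
    if value not in VALID_FUSION_STRATEGIES:
        valid = ", ".join(VALID_FUSION_STRATEGIES)
        raise ValueError(f"invalid fusion_strategy {value!r}; valid values: {valid}")
    return value


def apply_knobs(scored, knobs, default_limit=10):
    """Apply per-request overrides to (id, composite_score) pairs sorted descending."""
    if knobs.fusion_strategy is not None:
        parse_fusion_strategy(knobs.fusion_strategy)
    # Assumption: an oversized top_k is clamped to the cap rather than rejected.
    limit = default_limit if knobs.top_k is None else min(knobs.top_k, MAX_TOP_K)
    if knobs.score_floor is not None:
        # Post-filter on the composite score; inclusive floor is an assumption.
        scored = [(item, score) for item, score in scored if score >= knobs.score_floor]
    return scored[:limit]


results = apply_knobs(
    [("a", 0.9), ("b", 0.6), ("c", 0.4)],
    RecallKnobs(top_k=2, fusion_strategy="rrf", score_floor=0.5),
)
```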

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ADR-organized contract tests (63 collected, 11 files, 2433 LOC):
- test_adr_001_entity_kind.py — 8 entity kinds CRUD
- test_adr_002_edge_ontology.py — 15 edge relations + endpoint contracts
- test_adr_014_curation.py — update/delete/merge semantics
- test_adr_019_note_kind.py — 5 note kinds
- test_adr_020_request_dsl.py — single + parallel + chain ops, error envelope
- test_adr_023_verb_taxonomy.py — 15 product verb reachability
- test_adr_027_single_tool_mcp.py — only `request` tool exposed
- test_contract_behaviors.py — GQL property projection rules
- test_manifest.py — verb coverage assertions
- test_namespace_isolation.py — cross-namespace read/write boundaries
- test_smoke.py — kg/gtd/memory end-to-end happy path

Package structure (uv-managed):
- pyproject.toml + pytest.ini + README.md
- conftest.py with shared fixtures
- khive_contract/ lib (client, schema, fixtures, benchmark)

Run: `uv run pytest tests/khive-contract -v`

PARTIAL: play timed out at 1h before golden snapshots + benchmark
baselines could be captured. Skeleton + ADR-organized tests are real
and runnable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Multi-model embedding support landed across the runtime + storage + memory
stack. Workspace dual-embedding now reachable end-to-end:

khive-runtime:
  - RuntimeConfig.additional_embedding_models: Vec<EmbeddingModel>
  - Replaces single OnceCell<embedder> with HashMap<model_name, embedder>
  - default_embedder_name() + embedder(name) public methods
  - KHIVE_ADDITIONAL_EMBEDDING_MODELS env-var parsing
  - configured_embedding_models() helper enumerates active set
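
A rough model of the registry shape described above, written in Python for brevity rather than as the khive-runtime code. The env-var format (comma-separated names), lazy loading, and the placeholder embedder are all assumptions; only the method names and the env-var name come from the commit message.

```python
import os


def parse_additional_models(env=None):
    """Parse KHIVE_ADDITIONAL_EMBEDDING_MODELS (assumed comma-separated)."""
    env = os.environ if env is None else env
    raw = env.get("KHIVE_ADDITIONAL_EMBEDDING_MODELS", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


class EmbedderRegistry:
    """Maps model name -> embedder, replacing a single global embedder."""

    def __init__(self, default_model, additional_models=()):
        self._default = default_model
        # dict.fromkeys de-duplicates while preserving order, default first.
        self._names = list(dict.fromkeys([default_model, *additional_models]))
        self._loaded = {}

    def default_embedder_name(self):
        return self._default

    def configured_embedding_models(self):
        return list(self._names)

    def embedder(self, name=None):
        name = self._default if name is None else name
        if name not in self._names:
            raise KeyError(f"unregistered embedding model: {name!r}")
        if name not in self._loaded:
            self._loaded[name] = self._load(name)  # lazy load is an assumption
        return self._loaded[name]

    def _load(self, name):
        # Placeholder embedder; a real one would return a vector.
        return lambda text: (name, len(text))


# "minilm" and "paraphrase" are illustrative model names, not confirmed ids.
registry = EmbedderRegistry(
    "minilm",
    parse_additional_models({"KHIVE_ADDITIONAL_EMBEDDING_MODELS": "paraphrase, minilm"}),
)
```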

khive-db:
  - V16 migration: add `embedding_model TEXT NOT NULL DEFAULT '<default>'`
    column to vectors table with backfill + composite index
  - VectorStore.insert / search scoped by embedding_model
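
The V16 change can be sketched against an in-memory SQLite database. The table layout, index name, and model names here are hypothetical, and `DEFAULT_MODEL` stands in for the elided `'<default>'` value; the sketch only shows that existing rows pick up the default and that queries can then be scoped by model.

```python
import sqlite3

DEFAULT_MODEL = "default-model"  # stand-in for the elided '<default>' value

conn = sqlite3.connect(":memory:")

# Hypothetical pre-migration schema with one existing row.
conn.execute(
    "CREATE TABLE vectors (id TEXT PRIMARY KEY, namespace TEXT NOT NULL, embedding BLOB NOT NULL)"
)
conn.execute("INSERT INTO vectors VALUES ('v1', 'ns', x'00')")

# Migration: a NOT NULL column with a constant default backfills existing rows.
conn.execute(
    f"ALTER TABLE vectors ADD COLUMN embedding_model TEXT NOT NULL DEFAULT '{DEFAULT_MODEL}'"
)
# Composite index; the column choice and index name are assumptions.
conn.execute("CREATE INDEX idx_vectors_ns_model ON vectors (namespace, embedding_model)")

# A post-migration insert tagged with a second model.
conn.execute("INSERT INTO vectors VALUES ('v2', 'ns', x'01', 'second-model')")


def ids_for_model(connection, namespace, model):
    """Return vector ids in a namespace, scoped to one embedding model."""
    rows = connection.execute(
        "SELECT id FROM vectors WHERE namespace = ? AND embedding_model = ? ORDER BY id",
        (namespace, model),
    )
    return [row[0] for row in rows]


legacy_ids = ids_for_model(conn, "ns", DEFAULT_MODEL)
new_ids = ids_for_model(conn, "ns", "second-model")
```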

khive-storage:
  - VectorRecord carries model tag
  - vector search params include model scope

khive-pack-memory:
  - recall + remember accept optional embedding_model arg
  - validation: must be a registered model name

kkernel:
  - engine list now returns real loaded models (no longer empty Vec)
  - engine migrate / drift-check still return not-implemented (#380/#385)

Notes:
- 16 files changed, +582/-138 lines
- Tests rebaselined for V16 (failed_migration_rolls_back tests V17 now;
  store_ddl_then_event_migration_is_idempotent expects V16 head)
- Workspace: cargo build + cargo test + clippy clean + fmt clean

Lattice gap status: N/A — lattice-embed 0.2.4 already exposes both
MiniLM + paraphrase as 384-d local models with EmbeddingRoutingConfig
primitives. khive-runtime now uses these directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds closure log for the close-issues play. Result: 0 closed, 15 skipped.
All 15 wiring-category candidates verified against actual code paths — none
had sufficient commit SHA + file:line proof for a safe permanent close.

Three issues (#397, #385, #380) are closest to resolved but still have
explicitly-deferred implementation sections in source comments.

@ohdearquant
Owner Author

Overnight run complete — CI green ✅

  • 9 plays merged into integration
  • style(adr-033): deno fmt re-pad recall knob table cleanup committed after first CI run flagged the format drift
  • CI on PR HEAD 32f853c:
    • CI (macos-latest): ✅ pass (3m11s)
    • CI (ubuntu-latest): ✅ pass (3m15s)
    • Docs lint: ✅ pass (6s)

Ready for slicing whenever you're up. PR remains draft per directive — slice into 5-20 smaller PRs, codex-review each, sequentially merge.

Full overnight summary at $HOME/khive-work/shows/v022-polish/_overnight_summary.md.

@ohdearquant
Owner Author

Sliced — superseded by 9 smaller PRs

Per the slicing directive in this PR body, the integration has been sliced into 9 reviewable PRs:

Independent (mergeable in parallel, targets main)

Stacked chain (sequential merge order)

When a stacked PR merges and its head branch is deleted, GitHub will automatically retarget its child PRs onto main.

Review plan

Firing codex on each slice now. Will iterate to APPROVE per /codex-pr-review skill. This PR stays open as the historical reference and will close after the slice cycle completes.
