test(contract): proper Python pytest package at tests/khive-contract by ohdearquant · Pull Request #403 · ohdearquant/khive

ohdearquant · 2026-05-25T15:23:25Z

Summary

Slice 6/9 from PR #399 (v022-polish show).

ADR-organized contract tests for the MCP surface — 63 tests across 11 files (2433 LOC). uv-managed pytest package at tests/khive-contract/.

Tests added

test_adr_001_entity_kind.py — 8 entity kinds CRUD
test_adr_002_edge_ontology.py — 15 edge relations + endpoint contracts
test_adr_014_curation.py — update/delete/merge semantics
test_adr_019_note_kind.py — 5 note kinds
test_adr_020_request_dsl.py — single + parallel + chain ops, error envelope
test_adr_023_verb_taxonomy.py — 15 product verb reachability
test_adr_027_single_tool_mcp.py — only request tool exposed
test_contract_behaviors.py — GQL property projection rules
test_manifest.py — verb coverage assertions
test_namespace_isolation.py — cross-namespace read/write boundaries
test_smoke.py — kg/gtd/memory end-to-end happy path

Package structure (uv-managed)

pyproject.toml + pytest.ini + README.md
conftest.py with shared fixtures
khive_contract/ lib (client, schema, fixtures, benchmark)

Run: uv run pytest tests/khive-contract -v

Known follow-up

The play timed out at 1h before golden snapshots + benchmark baselines could be captured. Skeleton + ADR-organized tests are real and runnable. Follow-up issue tracked separately.

Dependencies

None. Python tests in a separate directory. Mergeable independently.

Test plan

uv run pytest tests/khive-contract -v runs (when the MCP binary is built)
CI green

ADR-organized contract tests (63 collected, 11 files, 2433 LOC): - test_adr_001_entity_kind.py — 8 entity kinds CRUD - test_adr_002_edge_ontology.py — 15 edge relations + endpoint contracts - test_adr_014_curation.py — update/delete/merge semantics - test_adr_019_note_kind.py — 5 note kinds - test_adr_020_request_dsl.py — single + parallel + chain ops, error envelope - test_adr_023_verb_taxonomy.py — 15 product verb reachability - test_adr_027_single_tool_mcp.py — only `request` tool exposed - test_contract_behaviors.py — GQL property projection rules - test_manifest.py — verb coverage assertions - test_namespace_isolation.py — cross-namespace read/write boundaries - test_smoke.py — kg/gtd/memory end-to-end happy path Package structure (uv-managed): - pyproject.toml + pytest.ini + README.md - conftest.py with shared fixtures - khive_contract/ lib (client, schema, fixtures, benchmark) Run: `uv run pytest tests/khive-contract -v` PARTIAL: play timed out at 1h before golden snapshots + benchmark baselines could be captured. Skeleton + ADR-organized tests are real and runnable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ohdearquant · 2026-05-25T15:31:58Z

Self-review (lambda:khive)

Verdict: APPROVE-WITH-SUGGESTIONS

Checked

Package structure correct: pyproject.toml, pytest.ini, conftest.py, khive_contract/ lib
11 test files organized by ADR (test_adr_001 through test_adr_027 + smoke/manifest/namespace/contract_behaviors)
Sample inspection of test_adr_023_verb_taxonomy.py: real assertions on KG/GTD/Memory verbs, proper parametrization
khive_contract/ lib: client, schema, fixtures, benchmark modules
.gitkeep files for baselines/ and golden/ reserve space for golden snapshots
CI green

Acknowledged follow-up (already in PR body)

Golden snapshots + benchmark baselines were not captured before play timeout
This is a SKELETON — runs but doesn't enforce snapshot regression yet
Follow-up task tracked separately

Suggestion

Consider adding a pytest -m "not slow" exclusion default so the skeleton runs quickly in CI before the slow MCP integration tests light up.

Recommend merge once Ocean approves the slice batch.

Codex Major 2 (PR #408): the tune CLI is not re-runnable from a fresh clone of this branch because khive_contract is in the parent package (slice 06, PR #403 — already merged to main). Codex tested slice 08 in isolation and hit ModuleNotFoundError. Adds tests/khive-contract/tune/README.md explaining: - Install khive_contract first: uv pip install -e . from tests/khive-contract/ - Run via uv run python -m tune --quick (or full) - Documents the corpus-ceiling limitation surfaced in REPORT.md - Explicit note that RecallConfig::default() was intentionally NOT changed (defaults reverted in d943b7f based on codex Major 1) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(pack-memory): wire khive-retrieval as recall composer (ADR-011/021) Route pack-memory's fuse_candidates through khive_retrieval::fuse_search_results, making khive-retrieval a real consumed facade instead of an orphan crate. - Add khive-retrieval dep to khive-pack-memory/Cargo.toml - Replace direct fuse_with_strategy call with retrieval adapter (CandidateMeta side-map, HybridConfig builder, FusionStrategy conversion) - Fix issue #309: resolve --all-features compile failures in khive-retrieval (stale SqliteStore imports, missing NodeId/LinkStore imports) - Add 5 integration tests (3 fusion_surface, 2 pack-memory recall adapter) - RRF k=1 discriminator test proves strategy propagation (30x score gap) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(pack-memory): expose top_k/fusion_strategy/score_floor knobs on recall (ADR-033 §6) - Add three optional per-request fields to RecallParams: top_k (usize), fusion_strategy (string), and score_floor (f32) - fusion_strategy validated against {"rrf","weighted","union"}; clear error with valid values on invalid input - top_k overrides the result limit for a single call (capped at 100) - score_floor applied as a post-filter on the composite score after compute_score - Add parse_fusion_strategy_str helper; wire override into cfg.fuse_strategy before passing to fuse_candidates - Add 4 integration tests: default_identity, top_k_override, fusion_strategy_override (including rejection), score_floor - Document knobs in ADR-033 §6.1 with table, semantics, and example DSL Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style(adr-033): deno fmt re-pad recall knob table (post-merge cleanup) * feat(embedding): dual-model registry (MiniLM + paraphrase) per ADR-043 Multi-model embedding support landed across the runtime + storage + memory stack. Workspace dual-embedding now reachable end-to-end: khive-runtime: - RuntimeConfig.additional_embedding_models: Vec<EmbeddingModel> - Replaces single OnceCell<embedder> with HashMap<model_name, embedder> - default_embedder_name() + embedder(name) public methods - KHIVE_ADDITIONAL_EMBEDDING_MODELS env-var parsing - configured_embedding_models() helper enumerates active set khive-db: - V16 migration: add `embedding_model TEXT NOT NULL DEFAULT '<default>'` column to vectors table with backfill + composite index - VectorStore.insert / search scoped by embedding_model khive-storage: - VectorRecord carries model tag - vector search params include model scope khive-pack-memory: - recall + remember accept optional embedding_model arg - validation: must be a registered model name kkernel: - engine list now returns real loaded models (no longer empty Vec) - engine migrate / drift-check still return not-implemented (#380/#385) Notes: - 16 files changed, +582/-138 lines - Tests rebaselined for V16 (failed_migration_rolls_back tests V17 now; store_ddl_then_event_migration_is_idempotent expects V16 head) - Workspace: cargo build + cargo test + clippy clean + fmt clean Lattice gap status: N/A — lattice-embed 0.2.4 already exposes both MiniLM + paraphrase as 384-d local models with EmbeddingRoutingConfig primitives. khive-runtime now uses these directly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * tune(recall): grid search infra + PARTIAL default changes * fix(retrieval): correct doctest import — use re-export at crate root Addresses codex finding on PR #405: `khive_retrieval::hnsw::HnswIndex` doesn't resolve because `hnsw` is not a public submodule — `HnswIndex` is re-exported at the crate root (lib.rs:145). The doctest at persist/mod.rs:29 must use the public facade import. Closes the remaining gap on issue #309 (--all-features doctest failure). Verified: `RUSTC_WRAPPER= cargo test --offline -p khive-retrieval --all-features --doc` passes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * revert(memory): keep RecallConfig defaults — corpus ceiling made changes unjustified Codex review on PR #408 flagged that the three default changes from the tune sweep (temporal_half_life_days 30 to 14, decay exp to hyp, multiplier 20 to 10) were made even though REPORT.md explicitly states the synthetic eval set produced identical recall@10 = 0.9333 for ALL 116 configs — the landscape was too flat to discriminate these parameters. Reverts those three lines back to their prior values. The tuned-config.toml artifact stays as an experimental record; the grid_search.py infra stays runnable; only RecallConfig::default() is restored. A discriminating eval corpus (embed-enabled, synonym queries, partial matches) is the prerequisite before changing runtime defaults. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(pack-memory): address PR #406 codex findings — top_k cast + stronger tests Two Medium findings from codex review: 1. top_k cast bug: `(k as u32).min(100)` truncates k before capping. A request with `top_k = 4_294_967_297` (larger than u32::MAX) truncates to 1 BEFORE the cap is applied, so the result limit becomes 1, not 100. Fixed: `u32::try_from(k.min(100)).unwrap_or(100)` clamps to usize first, then narrows safely. 2. Weak tests: test_recall_default_identity only checked length and the top hit. Strengthened to compare full ordered note_id+score lists across all positions, with all three knobs explicitly set to null. test_recall_fusion_strategy_override only validated string acceptance. Added a new unit test (fusion_strategy_change_produces_observable_ordering_difference) with a deterministic fixture where RRF and Weighted strategies MUST produce different orderings — proving the fusion_strategy override actually flows into fuse_candidates, not just validation. Verified: cargo test -p khive-pack-memory --lib passes (62 unit tests). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(embedding): address PR #407 codex findings — ADR amendment + 3 data-safety fixes Per Ocean direction (2026-05-25): amend ADR-043 to formalize the V16 string-tag schema design, claim V16 in ADR-015 ledger, and fix the data-safety bugs codex flagged. Defer sqlite-vec data preservation to a follow-up issue. ADR amendments: - ADR-043 §1.1 (vector store column addition): replaces the old FK-based description with the actual V16 design (TEXT embedding_model column with composite index). Includes rationale for TEXT vs BLOB FK (hot-path join cost, end-to-end shape consistency with kkernel/env-vars/registry). Documents sqlite-vec rebuild behavior and follow-up. - ADR-015 schema ledger: V16 row added with cluster-22 amendment notes. High 3 fix — atomic unknown-model validation (handlers.rs): - handle_remember now calls resolve_embedding_model(Some(name)) BEFORE create_note_with_decay_for_embedding_model. resolve_embedding_model is synchronous and doesn't load the model — it only checks registration. An unknown model is rejected before any note/FTS/vector row is written. High 2 fix — scoped delete across all model stores (operations.rs): - delete_note now iterates over registered_embedding_model_names() (new public method on KhiveRuntime) and deletes the note's vector from EVERY registered model's vector store. Previously only the default model's store was touched, leaving non-default vectors orphaned. Medium fix — KHIVE_ADDITIONAL_EMBEDDING_MODELS warning on bad names: - parse_embedding_model_list now logs tracing::warn for non-empty raw names that don't parse, instead of silently filtering them out. The function still returns a Vec (no startup failure on partial validity), but operator typos now surface at startup rather than as UnknownModel errors at request time. Deferred (follow-up issue, see ADR-043 §1.1 final paragraph): - High 1 — V16 backfill hard-codes 'all-minilm-l6-v2' for all regular vec_* tables, and sqlite-vec virtual tables are still dropped-and-rebuilt on schema mismatch (data loss for non-default deployments). A copy-with- default rebuild path is tracked separately because it requires a careful multi-step migration with vec0 INSERT INTO SELECT FROM and a verification step. Operators are warned via ADR §1.1 to back up before upgrading. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(tune): document khive_contract dependency + run instructions Codex Major 2 (PR #408): the tune CLI is not re-runnable from a fresh clone of this branch because khive_contract is in the parent package (slice 06, PR #403 — already merged to main). Codex tested slice 08 in isolation and hit ModuleNotFoundError. Adds tests/khive-contract/tune/README.md explaining: - Install khive_contract first: uv pip install -e . from tests/khive-contract/ - Run via uv run python -m tune --quick (or full) - Documents the corpus-ceiling limitation surfaced in REPORT.md - Explicit note that RecallConfig::default() was intentionally NOT changed (defaults reverted in d943b7f based on codex Major 1) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * style: cargo fmt --all * style: deno fmt ADR tables * fix(test): use single-token query matching all fixture memories * fix(test): align partial_config test with reverted default decay_model * fix(embedding): codex round 2 — runtime-layer atomicity + ADR internal consistency Two findings from codex round 2 on PR #407: 1. Medium: runtime-level unknown-model atomicity. handle_remember validated embedding_model before calling create_note_with_decay_for_embedding_model (round 1 fix), but the runtime API itself was permissive — direct Rust callers (other packs, integration tests) would still hit the write-after- failure bug. Fix: resolve_embedding_model is now called at the start of create_note_inner BEFORE any note/FTS/vector write. The pack-handler check remains as an earlier error boundary, but atomicity is enforced at the lowest layer that performs the write. 2. High: ADR-043 internal inconsistency. The §1.1 amendment landed in round 1, but the rest of the ADR still contained normative text for the rejected FK-based design: - §8 (Backward compat) described embedding_model_id BLOB column + CHECK rebuild - Alternatives table rejected per-record model_id (which is what V16 ships) - Migration version section claimed V5 + BLOB FK Fix: - §8 rewritten to describe V14 + V16 (registry + tag column) split that actually shipped - Alternatives table row strikes-through with "Superseded by V16 (2026-05-25)" and points back to §1.1 for rationale - Migration version section split into V14 (cluster-20) and V16 (v022-polish) with the actual SQL each migration runs Codex's other round-1 findings remain addressed: - Critical (ADR drift) — now resolved end-to-end across §1.1, §8, alternatives, and §Migration version - High 2 (scoped delete) — confirmed by round-2 review at operations.rs:1395, 1408 - Medium (env var warning) — confirmed at runtime.rs:549, 560 Verified: cargo test -p khive-runtime -p khive-pack-memory --lib passes (262 + 62 tests), cargo fmt clean, deno fmt clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci: force re-trigger * ci: force re-trigger via doc whitespace touch * style: deno fmt --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

This was referenced May 25, 2026

tune(recall): grid search infra + PARTIAL default changes #408

Merged

Show: v022-polish (overnight) — wire retrieval/recall/dual-embed + tests + plugin polish — SLICE BEFORE MERGE #399

Closed

ohdearquant merged commit 1e26334 into main May 25, 2026
3 checks passed

ohdearquant deleted the slice/v022/06-python-tests branch May 25, 2026 15:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(contract): proper Python pytest package at tests/khive-contract#403

test(contract): proper Python pytest package at tests/khive-contract#403
ohdearquant merged 1 commit into
mainfrom
slice/v022/06-python-tests

ohdearquant commented May 25, 2026

Uh oh!

ohdearquant commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ohdearquant commented May 25, 2026

Summary

Tests added

Package structure (uv-managed)

Known follow-up

Dependencies

Test plan

Uh oh!

ohdearquant commented May 25, 2026

Self-review (lambda:khive)

Checked

Acknowledged follow-up (already in PR body)

Suggestion

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant