diff --git a/CHANGELOG.md b/CHANGELOG.md index fc778df..2a94646 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,11 @@ All notable changes to vouch are documented here. Format follows ## [Unreleased] +### Fixed +- `vouch migrate` change detection now compares transformed output against the + pre-transform parsed dict, so YAML whitespace or key-order round-trip + differences no longer appear as false positives in the changed-file list. + ### Added - `vouch install-mcp ` — one-command adapter writer that drops the right MCP config templates into a project tree, idempotently. Eight hosts diff --git a/proposals/VEP-0004-migration.md b/proposals/VEP-0004-migration.md new file mode 100644 index 0000000..19ea089 --- /dev/null +++ b/proposals/VEP-0004-migration.md @@ -0,0 +1,252 @@ +--- +vep: "0004" +title: "Versioned on-disk format migration with atomic rollback" +author: greatjourney589 +status: draft +created: 2026-06-04 +landed-in: "" +supersedes: [] +superseded-by: "" +--- + +# VEP-0004: Versioned on-disk format migration with atomic rollback + +## Summary + +Add `schema_version` tracking to every vouch KB and a `vouch migrate` +command that upgrades the on-disk `.vouch/` layout atomically between +minor versions. Migration is all-or-nothing: artifacts are transformed +into a temp directory, validated against the target Pydantic models, then +swapped in via an atomic rename. Any failure leaves the original KB +completely untouched. This is the last hard blocker before 1.0 can freeze +the on-disk format. + +## Motivation + +Today there is no migration path between vouch schema versions. Any field +added to a Pydantic model without a default causes `KBStore.load_claim()` +(and its siblings) to throw a `ValidationError` at read time, making the +entire KB unreadable. There is no `vouch migrate`, no version header in any +artifact, and no recovery path short of manually patching every YAML file. 
+ +Concrete example: a team with 500 approved claims upgrades from vouch 0.1 +to 0.2. If 0.2 adds a single non-defaulted field to `Claim`, every claim +file becomes unreadable. `vouch doctor` crashes before it can surface a +useful error. `vouch export` fails for the same reason. The bundle import +path (`bundle.py`) validates artifact files against the current Pydantic +models before accepting them, so even the export-then-reimport escape hatch +is blocked. + +Three workarounds were evaluated and found lacking: + +- **`vouch export` + manual YAML rewrite + `vouch import-apply`:** the bundle + import path calls the same storage layer and fails identically. +- **`vouch doctor`:** has no notion of schema versions; crashes at read time + before any health check runs. +- **Pinning `pyproject.toml` version constraints:** prevents upgrading the + tool, not a solution. + +This is the last missing piece before 1.0 can declare the on-disk format +stable and guarantee upgrade safety. + +## Proposal + +### 1. Schema version constant + +`src/vouch/models.py` gains a module-level constant: + +```python +VOUCH_SCHEMA_VERSION = "0.1" +``` + +This is the single source of truth for the schema version the installed +vouch expects. It is bumped (by hand, in the same commit that makes a +breaking on-disk change) whenever the format changes. + +### 2. `schema_version` in `config.yaml` + +`KBStore.init()` writes `schema_version: "0.1"` into `config.yaml`. +`KBStore.__init__()` reads it back and raises `SchemaMismatchError` when +the stored version does not match `VOUCH_SCHEMA_VERSION`: + +```python +class SchemaMismatchError(RuntimeError): + """Stored KB schema_version does not match the installed vouch.""" +``` + +KBs created before this change have no `schema_version` key; they are +treated as `"0.1"` (the current version) and so pass the guard without +requiring migration. 
Future breaking changes bump `VOUCH_SCHEMA_VERSION` +to `"0.2"`, at which point old KBs fail the guard with a clear error +message pointing to `vouch migrate`. + +### 3. `vouch migrate` CLI command + +```bash +vouch migrate [--dry-run] [--backup] [--from VERSION] [--to VERSION] +``` + +- `--dry-run`: walks every artifact, applies all transforms, reports what + would change — no writes. +- `--backup`: copies `.vouch/` to `.vouch-backup-/` before + mutating anything. +- `--from` / `--to`: explicit version override (defaults: read from + `config.yaml` → `VOUCH_SCHEMA_VERSION`). +- Exits non-zero if any artifact fails to migrate; partial runs are rolled + back (all-or-nothing write via temp directory + atomic rename, the same + pattern used by `rebuild_index`). + +### 4. Migration registry in `src/vouch/migration.py` + +A new module containing a list of `Migration` objects, each covering one +version step: + +```python +@dataclass +class Migration: + from_version: str + to_version: str + transforms: list[Transform] + +MIGRATIONS: list[Migration] = [] # populated as schema versions are added +``` + +Each `Transform` is a callable `(raw_dict: dict, artifact_type: str) -> dict` +operating on the YAML-parsed dict before Pydantic deserialisation. This +decouples migration logic from the Pydantic models at the target version — +a 0.1→0.2 transform can run against raw dicts even after the 0.1 model +class is removed. + +`migrate_kb(root, from_v, to_v, *, dry_run)` chains the relevant +`Migration` steps, writes to `.vouch-migrate-tmp/` (a sibling of `.vouch/`), +validates every file, then atomically replaces `.vouch/` or rolls back on any +failure. + +### 5. Rollback guarantee + +`migrate_kb()` follows the atomic-swap pattern already proven in +`rebuild_index`: + +1. Write all migrated artifacts to `.vouch-migrate-tmp/` (sibling of `.vouch/`). +2. Validate every migrated artifact loads cleanly under the target Pydantic + models. +3. 
Rename `.vouch/` → `.vouch-pre-migrate/`, `.vouch-migrate-tmp/` → + `.vouch/`. +4. On any failure in step 2 or 3: leave the original `.vouch/` untouched, + write `migration.rollback` to the audit log, exit non-zero. + +### 6. Bundle schema version + +`build_manifest()` includes `schema_version` in `manifest.json`. +`import_check()` rejects bundles whose `schema_version` is newer than the +installed version with a clear message. Bundles with no `schema_version` +field (pre-VEP-0004) are accepted as-is — additive compatibility. + +### 7. Health and status surfaces + +- `health.status()` includes `schema_version` from `config.yaml`. +- `health.doctor()` adds an `error`-severity finding when the stored + `schema_version` does not match `VOUCH_SCHEMA_VERSION`. + +## Design + +### migrate_kb algorithm + +```python +def migrate_kb(root: Path, from_v: str, to_v: str, *, dry_run: bool = False) -> MigrateResult: + steps = _chain(from_v, to_v) # ordered list of Migration objects + if not steps: + return MigrateResult(changed=[], skipped=[], from_v=from_v, to_v=to_v) + + kb_dir = root / KB_DIRNAME + tmp_dir = kb_dir.parent / ".vouch-migrate-tmp" + tmp_dir.mkdir(exist_ok=True) + + changed = [] + try: + for sub in MIGRATABLE_SUBDIRS: + ... # read each yaml, apply transforms, write to tmp_dir + _validate_all(tmp_dir) # load every file under target models + if not dry_run: + _atomic_swap(kb_dir, tmp_dir) # rename old → pre-migrate, tmp → live + except Exception: + shutil.rmtree(tmp_dir, ignore_errors=True) + raise + return MigrateResult(...) +``` + +### Version chain resolution + +`_chain(from_v, to_v)` walks `MIGRATIONS` to find every contiguous step +between `from_v` and `to_v`. A multi-hop upgrade (0.1 → 0.2 → 0.3) applies +each Migration in sequence so every intermediate transform runs correctly. +If no path exists, `migrate_kb` raises `ValueError` before touching any +files. 
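The chain resolution described above can be sketched with a toy registry. This is an illustration only, not the code in `migration.py`: `Step`, `resolve_chain` and `REGISTRY` are hypothetical names, the versions `0.2` and `0.3` do not exist yet, and the cycle guard is an addition that the proposal does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for the VEP's Migration (transforms omitted)."""
    from_version: str
    to_version: str


def resolve_chain(registry: list[Step], from_v: str, to_v: str) -> list[Step]:
    """Follow from_version -> to_version links until to_v is reached."""
    if from_v == to_v:
        return []
    by_from = {s.from_version: s for s in registry}
    steps: list[Step] = []
    seen = {from_v}
    current = from_v
    while current != to_v:
        step = by_from.get(current)
        # A missing link or a revisited version both mean there is no usable path.
        if step is None or step.to_version in seen:
            raise ValueError(f"no migration path from {from_v!r} to {to_v!r}")
        steps.append(step)
        seen.add(step.to_version)
        current = step.to_version
    return steps


# Toy registry: versions 0.2 and 0.3 are illustrative only.
REGISTRY = [Step("0.1", "0.2"), Step("0.2", "0.3")]

hops = resolve_chain(REGISTRY, "0.1", "0.3")
print([(s.from_version, s.to_version) for s in hops])
```

Because the lookup is keyed on `from_version`, a downgrade request such as `0.3` to `0.1` finds no link and fails before any file is touched, which matches the behaviour the paragraph describes.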
+ +### SchemaMismatchError guard placement + +`KBStore.__init__` reads `config.yaml` and calls `assert_schema_ok()`. +The `server.py` and `jsonl_server.py` startup paths already construct a +`KBStore`; they gain a `try/except SchemaMismatchError` that returns a +structured error to the caller rather than a traceback. + +The destructive CLI commands (`approve`, `reject`, `index`, `crystallize`) +use `_load_store()`, which already calls `KBStore(discover_root(...))`, so +the guard fires automatically. + +## Compatibility + +- Writing `schema_version` to `config.yaml` is **additive** for consumers + that ignore unknown fields. KBs created before this VEP have no + `schema_version` key; they are treated as `"0.1"` and pass the guard + without requiring migration. +- The `manifest.json` bundle format gains a `schema_version` field — + additive. `import_check` applies a compatibility window for bundles + without the field. +- `health.status()` gains a `schema_version` key — additive. +- The `SchemaMismatchError` guard in `KBStore.__init__` is a **breaking + change** for callers that were silently relying on cross-version reads + succeeding despite `ValidationError`s. This is the correct trade-off: + the current behaviour (silent data corruption on round-trip) is worse + than a loud, actionable error. + +## Security implications + +None. The migration path reads existing `.vouch/` files (which are already +readable by the vouch process) and writes into a temp directory that is a +sibling of `.vouch/`. No new trust boundaries are crossed. The `--backup` flag +copies files but does not change their permissions. + +## Performance implications + +`migrate_kb` is a one-shot maintenance operation, not a hot path. It reads +and rewrites every YAML artifact once. For a KB with 1 000 claims this is +expected to complete in under a second on a modern SSD. The atomic rename +in step 3 is O(1) on any POSIX filesystem. 
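A minimal sketch of the two-rename swap this section refers to, using only the standard library. The helper `swap_dirs` and the directory names `live`, `staged` and `previous` are hypothetical stand-ins for `.vouch/`, `.vouch-migrate-tmp/` and `.vouch-pre-migrate/`; this is not the code in `migration.py`. Each rename is a single metadata operation when both paths are on the same filesystem, but the pair is not one atomic step: between the two calls the live path briefly does not exist.

```python
import shutil
import tempfile
from pathlib import Path


def swap_dirs(live: Path, staged: Path, previous: Path) -> None:
    """Move live -> previous, then staged -> live, undoing the first move on failure."""
    if previous.exists():
        shutil.rmtree(previous)
    live.rename(previous)
    try:
        staged.rename(live)
    except OSError:
        previous.rename(live)  # best-effort restore of the original directory
        raise


def demo() -> tuple[str, str]:
    """Run the swap in a throwaway directory and return (new, old) config text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        live, staged, previous = root / "live", root / "staged", root / "previous"
        live.mkdir()
        staged.mkdir()
        (live / "config.yaml").write_text("schema_version: '0.1'\n")
        (staged / "config.yaml").write_text("schema_version: '0.2'\n")
        swap_dirs(live, staged, previous)
        return (live / "config.yaml").read_text(), (previous / "config.yaml").read_text()


new_text, old_text = demo()
print(new_text.strip(), "|", old_text.strip())
```

Keeping the old tree under a separate name until the second rename succeeds is what makes the restore in the `except` branch possible.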
+ +## Open questions + +- Should `vouch migrate` also rebuild `state.db` at the end? The index is + a derived cache, so a `vouch index` after migration is always safe; but + wrapping it automatically would be friendlier. + +## Alternatives considered + +- **Never break the on-disk format:** requires never adding fields without + defaults and never removing fields. Untenable for a pre-1.0 project where + the schema must evolve to fix design mistakes. +- **Always-forward-compatible Pydantic models (`extra="ignore"`, all fields + `Optional`):** silently drops data on round-trip, which is worse than a + migration error for a system whose value proposition is durable, audited + knowledge. +- **Require users to re-init and re-approve everything on upgrade:** + destroys the KB's audit trail and decided history. + +## References + +- ROADMAP.md: "Migration story: `vouch migrate` to upgrade on-disk layout + between minor versions without losing the audit trail." +- `health.py:rebuild_index` — the atomic temp-file-swap pattern this VEP + reuses. +- `embeddings/migration.py` — the embedding-model backfill pattern, a + narrower precedent for the same idea. diff --git a/src/vouch/bundle.py b/src/vouch/bundle.py index c579c67..b0a7bc2 100644 --- a/src/vouch/bundle.py +++ b/src/vouch/bundle.py @@ -29,7 +29,16 @@ import yaml from . 
import audit -from .models import Claim, Entity, Evidence, Proposal, Relation, Session, Source +from .models import ( + VOUCH_SCHEMA_VERSION, + Claim, + Entity, + Evidence, + Proposal, + Relation, + Session, + Source, +) from .storage import _deserialize_page, sha256_hex MANIFEST_NAME = "manifest.json" @@ -96,6 +105,7 @@ def build_manifest(kb_dir: Path) -> dict[str, Any]: h.update(f["sha256"].encode()) return { "spec": SPEC_VERSION, + "schema_version": VOUCH_SCHEMA_VERSION, "bundle_id": h.hexdigest(), "files": files, "counts": { @@ -464,6 +474,24 @@ def import_check(kb_dir: Path, bundle_path: Path) -> ImportCheckResult: return ImportCheckResult(False, "", [], [], [], ["bundle missing manifest.json"]) manifest = json.loads(tar.extractfile(mf_member).read().decode()) # type: ignore[union-attr] bundle_id = manifest.get("bundle_id", "") + bundle_schema = manifest.get("schema_version") + if bundle_schema is not None: + try: + bundle_ver = tuple(int(x) for x in str(bundle_schema).split(".")) + installed_ver = tuple(int(x) for x in VOUCH_SCHEMA_VERSION.split(".")) + except (ValueError, AttributeError): + return ImportCheckResult( + False, bundle_id, [], [], [], + [f"bundle schema_version {bundle_schema!r} is not a valid version string"], + ) + if bundle_ver > installed_ver: + return ImportCheckResult( + False, bundle_id, [], [], [], + [ + f"bundle schema_version {bundle_schema!r} is newer than installed " + f"vouch {VOUCH_SCHEMA_VERSION!r} — upgrade vouch before importing" + ], + ) recorded = {f["path"]: f for f in manifest["files"]} manifest_paths = set(recorded) for f in manifest["files"]: diff --git a/src/vouch/cli.py b/src/vouch/cli.py index e446d4c..1451622 100644 --- a/src/vouch/cli.py +++ b/src/vouch/cli.py @@ -60,6 +60,7 @@ ArtifactNotFoundError, KBNotFoundError, KBStore, + SchemaMismatchError, discover_root, ) @@ -82,6 +83,8 @@ def _cli_errors() -> Iterator[None]: migrations_mod.MigrationError, ) as e: raise click.ClickException(str(e)) from e + except 
SchemaMismatchError as e: + raise click.ClickException(str(e)) from e def _load_store(start: Path | None = None) -> KBStore: @@ -91,6 +94,10 @@ def _load_store(start: Path | None = None) -> KBStore: click.echo(f"error: {e}", err=True) click.echo("hint: run `vouch init` in your project root.", err=True) sys.exit(2) + except SchemaMismatchError as e: + click.echo(f"error: {e}", err=True) + click.echo("hint: run `vouch migrate` to upgrade the on-disk layout.", err=True) + sys.exit(3) def _whoami() -> str: @@ -1117,6 +1124,68 @@ def reindex(embeddings: bool, backfill: bool, force: bool, model: str | None) -> click.echo("reindex: FTS5 rebuilt") +@cli.command() +@click.option("--dry-run", is_flag=True, help="Report what would change without writing.") +@click.option("--backup", is_flag=True, help="Copy .vouch/ to .vouch-backup-/ first.") +@click.option("--from", "from_version", default=None, help="Override source schema version.") +@click.option("--to", "to_version", default=None, help="Override target schema version.") +@click.option("--path", default=".", type=click.Path(file_okay=False), show_default=True) +def migrate(dry_run: bool, backup: bool, from_version: str | None, + to_version: str | None, path: str) -> None: + """Upgrade the on-disk .vouch/ layout to the current schema version.""" + import shutil + + from .migration import migrate_kb + from .models import VOUCH_SCHEMA_VERSION + from .storage import CONFIG_FILENAME, KB_DIRNAME, _yaml_load + + try: + root = discover_root(Path(path)) + except KBNotFoundError as e: + click.echo(f"error: {e}", err=True) + click.echo("hint: run `vouch init` in your project root.", err=True) + sys.exit(2) + kb_dir = root / KB_DIRNAME + cfg_path = kb_dir / CONFIG_FILENAME + + stored_version: str + if cfg_path.exists(): + cfg = _yaml_load(cfg_path.read_text()) or {} + stored_version = str(cfg.get("schema_version") or "0.1") + else: + stored_version = "0.1" + + from_v = from_version or stored_version + to_v = to_version or 
VOUCH_SCHEMA_VERSION + + if from_v == to_v: + click.echo(f"migrate: already at schema_version {to_v!r}, nothing to do.") + return + + if dry_run: + click.echo(f"migrate: dry-run {from_v!r} → {to_v!r}") + + if backup and not dry_run: + import time + ts = int(time.time()) + backup_dir = root / f".vouch-backup-{ts}" + shutil.copytree(str(kb_dir), str(backup_dir)) + click.echo(f"migrate: backup written to {backup_dir}") + + with _cli_errors(): + result = migrate_kb(root, from_v, to_v, dry_run=dry_run) + + if result.changed: + click.echo(f"migrate: {len(result.changed)} file(s) changed") + for f in result.changed: + click.echo(f" {f}") + else: + click.echo("migrate: no files changed (transforms were no-ops)") + + if not dry_run: + click.echo(f"migrate: schema_version {from_v!r} → {to_v!r} complete") + + @cli.command() @click.option("--tail", default=20, show_default=True, type=int) @click.option("--json", "as_json", is_flag=True) diff --git a/src/vouch/health.py b/src/vouch/health.py index fb6ee53..7e65690 100644 --- a/src/vouch/health.py +++ b/src/vouch/health.py @@ -17,6 +17,7 @@ from . import index_db from .audit import count_events +from .models import VOUCH_SCHEMA_VERSION, ClaimStatus, ProposalStatus from .models import Claim, ClaimStatus, Entity, Page, ProposalKind, ProposalStatus from .storage import KBStore, _yaml_load, sha256_hex from .verify import verify_all @@ -42,10 +43,19 @@ class HealthReport: counts: dict[str, Any] = field(default_factory=dict) +def _stored_schema_version(store: KBStore) -> str | None: + if not store.config_path.exists(): + return None + cfg = _yaml_load(store.config_path.read_text()) or {} + return cfg.get("schema_version") + + +def status(store: KBStore) -> dict: def status(store: KBStore) -> dict[str, Any]: """Quick, machine-readable summary. 
No deep checks.""" return { "kb_dir": str(store.kb_dir), + "schema_version": _stored_schema_version(store), "claims": len(store.list_claims()), "pages": len(store.list_pages()), "sources": len(store.list_sources()), @@ -204,6 +214,14 @@ def doctor( report.findings.append(Finding( "error", "missing_config", "config.yaml is missing", )) + else: + stored = _stored_schema_version(store) + if stored is not None and stored != VOUCH_SCHEMA_VERSION: + report.findings.append(Finding( + "error", "schema_version_mismatch", + f"KB schema_version {stored!r} does not match installed vouch " + f"{VOUCH_SCHEMA_VERSION!r} — run `vouch migrate`", + )) # Index presence (warning only — the index is derivable). if not (store.kb_dir / index_db.DB_FILENAME).exists(): diff --git a/src/vouch/jsonl_server.py b/src/vouch/jsonl_server.py index 556a6d1..62f2d70 100644 --- a/src/vouch/jsonl_server.py +++ b/src/vouch/jsonl_server.py @@ -49,6 +49,7 @@ ArtifactNotFoundError, KBNotFoundError, KBStore, + SchemaMismatchError, discover_root, ) @@ -65,6 +66,8 @@ def _store() -> KBStore: return KBStore(discover_root()) except KBNotFoundError as e: raise RuntimeError(str(e)) from e + except SchemaMismatchError as e: + raise RuntimeError(str(e)) from e def _agent() -> str: diff --git a/src/vouch/migration.py b/src/vouch/migration.py new file mode 100644 index 0000000..2c0e12d --- /dev/null +++ b/src/vouch/migration.py @@ -0,0 +1,328 @@ +"""On-disk format migration for vouch KBs. + +Each `Migration` covers exactly one version step (from_version → to_version) +and carries a list of `Transform` callables. A `Transform` receives a raw +YAML-parsed dict and the artifact subdirectory name and returns the +(possibly mutated) dict. Transforms operate on raw dicts before Pydantic +deserialisation so they remain correct even after old model classes are +removed. 
+ +`migrate_kb()` is the public entry point: + + result = migrate_kb(project_root, from_v="0.1", to_v="0.2", dry_run=False) + +It chains all intermediate steps, writes migrated files to a temp directory, +validates every file under the target Pydantic models, then atomically +replaces the live `.vouch/` directory. Any failure leaves the original +untouched. +""" + +from __future__ import annotations + +import shutil +from collections.abc import Callable +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any + +from . import audit as _audit +from .models import VOUCH_SCHEMA_VERSION +from .storage import CONFIG_FILENAME, KB_DIRNAME, _yaml_dump, _yaml_load + +Transform = Callable[[dict[str, Any], str], dict[str, Any]] + +# Subdirectories whose YAML files are migrated by field-level transforms. +# Pages (.md) are included via a separate branch because they carry YAML +# frontmatter rather than being pure YAML files. +MIGRATABLE_YAML_SUBDIRS = ( + "claims", "sources", "entities", "relations", + "evidence", "sessions", "decided", +) + + +@dataclass +class Migration: + from_version: str + to_version: str + transforms: list[Transform] = field(default_factory=list) + + +@dataclass +class MigrateResult: + from_version: str + to_version: str + changed: list[str] + skipped: list[str] + dry_run: bool + + +# Registry of all known version steps, in order. Extend this list whenever +# the on-disk format changes. A multi-hop upgrade (0.1 → 0.2 → 0.3) is +# supported by chaining consecutive Migration entries. +MIGRATIONS: list[Migration] = [ + # Empty for now: 0.1 is the baseline and no version step exists yet. + # _chain() returns [] when from_v == to_v without consulting this list. 
+] + + +# --------------------------------------------------------------------------- +# Internal helpers +# --------------------------------------------------------------------------- + + +def _chain(from_v: str, to_v: str) -> list[Migration]: + """Return the ordered list of Migration steps from from_v to to_v. + + Raises ValueError if no contiguous path exists in MIGRATIONS. + """ + if from_v == to_v: + return [] + steps: list[Migration] = [] + current = from_v + # Build a quick lookup: from_version → Migration + by_from = {m.from_version: m for m in MIGRATIONS} + while current != to_v: + step = by_from.get(current) + if step is None: + raise ValueError( + f"No migration path from {current!r} to {to_v!r}. " + "Check that your vouch installation is up to date." + ) + steps.append(step) + current = step.to_version + return steps + + +def _apply_transforms(raw: dict[str, Any], subdir: str, steps: list[Migration]) -> dict[str, Any]: + for step in steps: + for transform in step.transforms: + raw = transform(raw, subdir) + return raw + + +def _read_page_frontmatter(text: str) -> tuple[dict[str, Any], str]: + """Split a page .md file into (frontmatter_dict, body).""" + import re + m = re.match(r"^---\n(.*?)\n---\n?(.*)", text, re.DOTALL) + if not m: + return {}, text + return _yaml_load(m.group(1)) or {}, m.group(2) + + +def _write_page_frontmatter(meta: dict[str, Any], body: str) -> str: + return f"---\n{_yaml_dump(meta)}---\n{body}" + + +def _validate_migrated(tmp_dir: Path) -> list[str]: + """Load every migrated artifact under target Pydantic models. 
Returns error list.""" + from .bundle import VALIDATORS + from .storage import _deserialize_page + + errors: list[str] = [] + for sub in MIGRATABLE_YAML_SUBDIRS: + subdir = tmp_dir / sub + if not subdir.is_dir(): + continue + for p in subdir.rglob("*"): + if not p.is_file(): + continue + if p.suffix not in (".yaml", ".yml"): + # source content files — not YAML models + continue + if sub == "sources" and p.name != "meta.yaml": + continue + validator = VALIDATORS.get(sub) + if validator is None: + continue + try: + validator(p.read_bytes()) + except Exception as e: + errors.append(f"{p.relative_to(tmp_dir)}: {e}") + + pages_dir = tmp_dir / "pages" + if pages_dir.is_dir(): + for p in pages_dir.glob("*.md"): + try: + _deserialize_page(p.read_text()) + except Exception as e: + errors.append(f"pages/{p.name}: {e}") + + return errors + + +def _atomic_swap(kb_dir: Path, tmp_dir: Path) -> None: + """Replace kb_dir with tmp_dir atomically. + + Renames kb_dir → kb_dir/../.vouch-pre-migrate, then tmp_dir → kb_dir. + On POSIX this pair of renames is as atomic as the filesystem allows. + """ + pre = kb_dir.parent / ".vouch-pre-migrate" + if pre.exists(): + shutil.rmtree(pre) + kb_dir.rename(pre) + try: + tmp_dir.rename(kb_dir) + except Exception: + # Best-effort rollback of the first rename. 
+ pre.rename(kb_dir) + raise + shutil.rmtree(pre, ignore_errors=True) + + +def _copy_tree_for_migration(kb_dir: Path, tmp_dir: Path, steps: list[Migration]) -> list[str]: + """Copy and transform every migratable file from kb_dir into tmp_dir.""" + changed: list[str] = [] + + for sub in MIGRATABLE_YAML_SUBDIRS: + src_sub = kb_dir / sub + dst_sub = tmp_dir / sub + if not src_sub.is_dir(): + continue + dst_sub.mkdir(parents=True, exist_ok=True) + for p in sorted(src_sub.rglob("*")): + if not p.is_file(): + continue + rel = p.relative_to(src_sub) + dst = dst_sub / rel + dst.parent.mkdir(parents=True, exist_ok=True) + if p.suffix in (".yaml", ".yml") and not ( + sub == "sources" and p.name != "meta.yaml" + ): + raw = _yaml_load(p.read_text()) or {} + baseline_text = _yaml_dump(raw) + transformed = _apply_transforms(dict(raw), sub, steps) + new_text = _yaml_dump(transformed) + dst.write_text(new_text) + if new_text != baseline_text: + changed.append(str(p.relative_to(kb_dir))) + else: + dst.write_bytes(p.read_bytes()) + + # Pages (frontmatter + body) + pages_src = kb_dir / "pages" + pages_dst = tmp_dir / "pages" + if pages_src.is_dir(): + pages_dst.mkdir(parents=True, exist_ok=True) + for p in sorted(pages_src.glob("*.md")): + meta, body = _read_page_frontmatter(p.read_text()) + baseline_text = _write_page_frontmatter(meta, body) + transformed_meta = _apply_transforms(dict(meta), "pages", steps) + new_text = _write_page_frontmatter(transformed_meta, body) + (pages_dst / p.name).write_text(new_text) + if new_text != baseline_text: + changed.append(f"pages/{p.name}") + + # config.yaml — bump schema_version + cfg_src = kb_dir / CONFIG_FILENAME + if cfg_src.exists(): + cfg = _yaml_load(cfg_src.read_text()) or {} + cfg["schema_version"] = steps[-1].to_version if steps else VOUCH_SCHEMA_VERSION + (tmp_dir / CONFIG_FILENAME).write_text(_yaml_dump(cfg)) + + return changed + + +# --------------------------------------------------------------------------- +# Public API +# 
--------------------------------------------------------------------------- + + +def migrate_kb( + root: Path, + from_v: str, + to_v: str, + *, + dry_run: bool = False, +) -> MigrateResult: + """Migrate the KB at `root` from `from_v` to `to_v`. + + All-or-nothing: writes to a temp dir, validates, then atomically swaps. + Rolls back and raises on any failure. + + In dry_run mode, reports what would change without writing anything. + """ + kb_dir = root / KB_DIRNAME + if not kb_dir.is_dir(): + raise FileNotFoundError(f"No KB directory found at {kb_dir}") + + steps = _chain(from_v, to_v) + if not steps: + return MigrateResult( + from_version=from_v, to_version=to_v, + changed=[], skipped=[], dry_run=dry_run, + ) + + # tmp_dir must be a SIBLING of kb_dir, not a child. If it were inside + # kb_dir, the first rename in _atomic_swap (kb_dir → pre) would make + # tmp_dir's path vanish before the second rename can move it into place. + tmp_dir = kb_dir.parent / ".vouch-migrate-tmp" + if tmp_dir.exists(): + shutil.rmtree(tmp_dir) + tmp_dir.mkdir() + + try: + changed = _copy_tree_for_migration(kb_dir, tmp_dir, steps) + errors = _validate_migrated(tmp_dir) + if errors: + _audit.log_event( + kb_dir, event="migration.rollback", actor="vouch-migrate", + reversible=False, dry_run=dry_run, + data={"from_version": from_v, "to_version": to_v, "errors": errors}, + ) + shutil.rmtree(tmp_dir, ignore_errors=True) + raise RuntimeError( + f"Migration validation failed ({len(errors)} error(s)):\n" + + "\n".join(f" {e}" for e in errors) + ) + + if dry_run: + shutil.rmtree(tmp_dir, ignore_errors=True) + return MigrateResult( + from_version=from_v, to_version=to_v, + changed=changed, skipped=[], dry_run=True, + ) + + _audit.log_event( + kb_dir, event="migration.start", actor="vouch-migrate", + reversible=True, + data={"from_version": from_v, "to_version": to_v, "files": len(changed)}, + ) + _atomic_swap(kb_dir, tmp_dir) + # kb_dir now points at the new layout; log to it. 
+ _audit.log_event( + kb_dir, event="migration.complete", actor="vouch-migrate", + reversible=False, + data={"from_version": from_v, "to_version": to_v, "files": len(changed)}, + ) + + except Exception: + shutil.rmtree(tmp_dir, ignore_errors=True) + raise + + return MigrateResult( + from_version=from_v, to_version=to_v, + changed=changed, skipped=[], dry_run=False, + ) + + +# --------------------------------------------------------------------------- +# Built-in transform factories (used when adding entries to MIGRATIONS) +# --------------------------------------------------------------------------- + + +def rename_field(subdir: str, *, old: str, new: str) -> Transform: + """Return a Transform that renames a top-level field in artifacts of `subdir`.""" + def _transform(raw: dict[str, Any], artifact_subdir: str) -> dict[str, Any]: + if artifact_subdir == subdir and old in raw: + raw[new] = raw.pop(old) + return raw + return _transform + + +def add_default(subdir: str, *, field_name: str, default: Any) -> Transform: + """Return a Transform that adds `field_name` with `default` if absent.""" + def _transform(raw: dict[str, Any], artifact_subdir: str) -> dict[str, Any]: + if artifact_subdir == subdir and field_name not in raw: + raw[field_name] = default() if callable(default) else default + return raw + return _transform diff --git a/src/vouch/models.py b/src/vouch/models.py index 1b340d8..776df57 100644 --- a/src/vouch/models.py +++ b/src/vouch/models.py @@ -14,6 +14,8 @@ from pydantic import BaseModel, Field, field_validator +VOUCH_SCHEMA_VERSION = "0.1" + def utcnow() -> datetime: return datetime.now(UTC) diff --git a/src/vouch/server.py b/src/vouch/server.py index 8672e25..fa187f3 100644 --- a/src/vouch/server.py +++ b/src/vouch/server.py @@ -40,6 +40,7 @@ ArtifactNotFoundError, KBNotFoundError, KBStore, + SchemaMismatchError, discover_root, ) @@ -53,6 +54,8 @@ def _store() -> KBStore: raise RuntimeError( f"{e}. 
Run `vouch init` in the project root before starting the server." ) from e + except SchemaMismatchError as e: + raise RuntimeError(f"{e}. Run `vouch migrate` to upgrade the on-disk layout.") from e def _agent() -> str: diff --git a/src/vouch/storage.py b/src/vouch/storage.py index 2e21333..e80b568 100644 --- a/src/vouch/storage.py +++ b/src/vouch/storage.py @@ -35,6 +35,7 @@ import yaml from .models import ( + VOUCH_SCHEMA_VERSION, Claim, Entity, Evidence, @@ -66,8 +67,18 @@ class ArtifactNotFoundError(KeyError): pass +class SchemaMismatchError(RuntimeError): + """Stored KB schema_version does not match the installed vouch. + + Run `vouch migrate` to upgrade the on-disk layout. + """ + + def _starter_config() -> dict[str, Any]: return { + "version": 1, + "schema_version": VOUCH_SCHEMA_VERSION, + "review": {"require_human_approval": True}, "version": KB_FORMAT_VERSION, "review": { "require_human_approval": True, @@ -158,6 +169,24 @@ class KBStore: def __init__(self, root: Path): self.root = root.resolve() self.kb_dir = self.root / KB_DIRNAME + if self.config_path.exists(): + self.assert_schema_ok() + + def assert_schema_ok(self) -> None: + """Raise SchemaMismatchError when stored schema_version != installed.""" + if not self.config_path.exists(): + return + cfg = _yaml_load(self.config_path.read_text()) or {} + stored = cfg.get("schema_version") + if stored is None: + # Pre-VEP-0004 KB — written before versioning existed. Treat as + # current version; the schema hasn't changed since 0.1 was cut. + return + if str(stored) != VOUCH_SCHEMA_VERSION: + raise SchemaMismatchError( + f"KB schema_version {stored!r} does not match installed vouch " + f"{VOUCH_SCHEMA_VERSION!r}. Run `vouch migrate` to upgrade." 
+ ) def read_under_root(self, path: str | Path) -> tuple[Path, bytes]: # Guard against arbitrary-file-read primitives exposed by the MCP / diff --git a/tests/test_bundle.py b/tests/test_bundle.py index 51ae5d3..911cab2 100644 --- a/tests/test_bundle.py +++ b/tests/test_bundle.py @@ -411,6 +411,35 @@ def test_import_check_passes_when_member_matches_manifest( assert not any("hash mismatch" in i for i in diff.issues), diff.issues +# --- schema_version in manifest ------------------------------------------- + + +def test_export_includes_schema_version(store: KBStore, tmp_path: Path) -> None: + from vouch.models import VOUCH_SCHEMA_VERSION + bundle_path = tmp_path / "out.tar.gz" + bundle.export(store.kb_dir, dest=bundle_path) + with tarfile.open(bundle_path, "r:gz") as tar: + manifest = json.loads(tar.extractfile(bundle.MANIFEST_NAME).read().decode()) # type: ignore[union-attr] + assert manifest.get("schema_version") == VOUCH_SCHEMA_VERSION + + +def test_import_check_rejects_future_schema_version(store: KBStore, tmp_path: Path) -> None: + """Bundles created by a newer vouch should be rejected cleanly.""" + import hashlib as _hl + bundle_path = tmp_path / "future.tar.gz" + payload = b"id: c1\n" + manifest = { + "spec": bundle.SPEC_VERSION, + "schema_version": "99.0", + "bundle_id": "abc", + "files": [ + {"path": "claims/c1.yaml", "size": len(payload), + "sha256": _hl.sha256(payload).hexdigest()} + ], + "counts": {}, + "safety": {}, + } + with tarfile.open(bundle_path, "w:gz") as tar: # --- graph integrity on the bundle path ---------------------------------- # # `import_apply` writes member bytes directly to disk and never goes @@ -452,6 +481,31 @@ def _write_multi_member_bundle( mf_info = tarfile.TarInfo(bundle.MANIFEST_NAME) mf_info.size = len(mf_bytes) tar.addfile(mf_info, io.BytesIO(mf_bytes)) + info = tarfile.TarInfo("claims/c1.yaml") + info.size = len(payload) + tar.addfile(info, io.BytesIO(payload)) + + result = bundle.import_check(store.kb_dir, bundle_path) + 
assert not result.ok + assert any("99.0" in i and "newer" in i for i in result.issues) + + +def test_import_check_accepts_bundle_without_schema_version(store: KBStore, tmp_path: Path) -> None: + """Pre-VEP-0004 bundles (no schema_version field) must still import.""" + import hashlib as _hl + bundle_path = tmp_path / "old.tar.gz" + payload = b"id: c1\n" + manifest = { + "spec": bundle.SPEC_VERSION, + "bundle_id": "abc", + "files": [ + {"path": "claims/c1.yaml", "size": len(payload), + "sha256": _hl.sha256(payload).hexdigest()} + ], + "counts": {}, + "safety": {}, + } + with tarfile.open(bundle_path, "w:gz") as tar: # --- source content-addressing on the bundle path ------------------------ @@ -513,6 +567,14 @@ def _write_source_bundle( mf_info = tarfile.TarInfo(bundle.MANIFEST_NAME) mf_info.size = len(mf_bytes) tar.addfile(mf_info, io.BytesIO(mf_bytes)) + info = tarfile.TarInfo("claims/c1.yaml") + info.size = len(payload) + tar.addfile(info, io.BytesIO(payload)) + + result = bundle.import_check(store.kb_dir, bundle_path) + assert not any("schema_version" in i or "newer" in i for i in result.issues), ( + f"pre-VEP-0004 bundle rejected for version reasons: {result.issues}" + ) def _relation_yaml(rid: str, source: str, target: str, evidence: list[str]) -> bytes: diff --git a/tests/test_migration.py b/tests/test_migration.py new file mode 100644 index 0000000..0e7a706 --- /dev/null +++ b/tests/test_migration.py @@ -0,0 +1,275 @@ +"""Migration engine tests — round-trip, rollback, dry-run, transform factories.""" + +from __future__ import annotations + +from pathlib import Path + +import pytest + +from vouch.migration import ( + MigrateResult, + Migration, + _chain, + add_default, + migrate_kb, + rename_field, +) +from vouch.models import VOUCH_SCHEMA_VERSION +from vouch.storage import CONFIG_FILENAME, KB_DIRNAME, KBStore, _yaml_dump, _yaml_load + +# --------------------------------------------------------------------------- +# Helpers +# 
--------------------------------------------------------------------------- + + +def _write_config(kb_dir: Path, schema_version: str) -> None: + cfg_path = kb_dir / CONFIG_FILENAME + cfg = _yaml_load(cfg_path.read_text()) if cfg_path.exists() else {} + cfg["schema_version"] = schema_version + cfg_path.write_text(_yaml_dump(cfg)) + + +def _seed_kb_with_claim( + tmp_path: Path, + claim_id: str = "test-claim", + **claim_extra, +) -> tuple[Path, str]: + store = KBStore.init(tmp_path) + src = store.put_source(b"evidence") + raw = _claim_yaml(claim_id, evidence=[src.id], **claim_extra) + (store.kb_dir / "claims" / f"{claim_id}.yaml").write_text(_yaml_dump(raw)) + return store.kb_dir, src.id + + +def _claim_yaml(claim_id: str, **extra) -> dict: + base = { + "id": claim_id, + "text": "A test claim.", + "type": "observation", + "status": "working", + "confidence": 0.7, + "evidence": [], + "entities": [], + "supersedes": [], + "superseded_by": None, + "contradicts": [], + "scope": "project", + "tags": [], + "created_at": "2024-01-01T00:00:00+00:00", + "updated_at": "2024-01-01T00:00:00+00:00", + "last_confirmed_at": None, + "approved_by": None, + } + base.update(extra) + return base + + +# --------------------------------------------------------------------------- +# _chain() resolution +# --------------------------------------------------------------------------- + + +def test_chain_same_version_returns_empty(): + assert _chain("0.1", "0.1") == [] + + +def test_chain_unknown_version_raises(): + with pytest.raises(ValueError, match="Check that your vouch installation is up to date"): + _chain("0.1", "99.0") + + +# --------------------------------------------------------------------------- +# No-op migration (from == to) +# --------------------------------------------------------------------------- + + +def test_migrate_no_op(tmp_path: Path): + KBStore.init(tmp_path) + result = migrate_kb(tmp_path, from_v=VOUCH_SCHEMA_VERSION, to_v=VOUCH_SCHEMA_VERSION) + assert 
isinstance(result, MigrateResult) + assert result.changed == [] + assert result.dry_run is False + + +# --------------------------------------------------------------------------- +# rename_field transform factory +# --------------------------------------------------------------------------- + + +def test_rename_field_transform(): + from vouch.migration import rename_field + t = rename_field("claims", old="old_name", new="new_name") + raw = {"old_name": "value", "other": 1} + result = t(raw, "claims") + assert "new_name" in result + assert "old_name" not in result + assert result["other"] == 1 + + +def test_rename_field_wrong_subdir_is_noop(): + t = rename_field("claims", old="old_name", new="new_name") + raw = {"old_name": "value"} + result = t(raw, "entities") + assert "old_name" in result + assert "new_name" not in result + + +# --------------------------------------------------------------------------- +# add_default transform factory +# --------------------------------------------------------------------------- + + +def test_add_default_adds_missing(): + t = add_default("claims", field_name="new_field", default="hello") + raw = {"id": "x"} + result = t(raw, "claims") + assert result["new_field"] == "hello" + + +def test_add_default_skips_existing(): + t = add_default("claims", field_name="new_field", default="hello") + raw = {"id": "x", "new_field": "existing"} + result = t(raw, "claims") + assert result["new_field"] == "existing" + + +def test_add_default_callable(): + t = add_default("claims", field_name="tags", default=list) + raw = {"id": "x"} + result = t(raw, "claims") + assert result["tags"] == [] + + +# --------------------------------------------------------------------------- +# Round-trip with a real Migration in MIGRATIONS +# --------------------------------------------------------------------------- + + +@pytest.fixture +def patched_migrations(monkeypatch): + """Register a fake 0.1→0.2 migration for the duration of a test.""" + fake_migration = 
Migration( + from_version="0.1", + to_version="0.2", + transforms=[ + rename_field("claims", old="old_field", new="new_field"), + add_default("claims", field_name="added_field", default="default_val"), + ], + ) + monkeypatch.setattr("vouch.migration.MIGRATIONS", [fake_migration]) + return fake_migration + + +def test_migrate_round_trip(tmp_path: Path, patched_migrations): + kb_dir, _ = _seed_kb_with_claim(tmp_path, old_field="old_value") + _write_config(kb_dir, "0.1") + + result = migrate_kb(tmp_path, from_v="0.1", to_v="0.2") + + assert "claims/test-claim.yaml" in result.changed + migrated = _yaml_load((kb_dir / "claims" / "test-claim.yaml").read_text()) + assert "new_field" in migrated + assert migrated["new_field"] == "old_value" + assert "old_field" not in migrated + assert migrated["added_field"] == "default_val" + + # config.yaml should be bumped + cfg = _yaml_load((kb_dir / CONFIG_FILENAME).read_text()) + assert cfg["schema_version"] == "0.2" + + +def test_migrate_no_transform_skips_yaml_roundtrip_whitespace(tmp_path: Path, monkeypatch): + """No-op transforms must not flag files whose on-disk YAML differs from _yaml_dump.""" + noop_migration = Migration(from_version="0.1", to_version="0.2", transforms=[]) + monkeypatch.setattr("vouch.migration.MIGRATIONS", [noop_migration]) + + kb_dir, src_id = _seed_kb_with_claim(tmp_path, "messy-claim", text="hello") + + messy_yaml = f"""id: messy-claim +text: hello +type: observation +status: working +confidence: 0.7 +evidence: +- {src_id} +entities: [] +supersedes: [] +superseded_by: null +contradicts: [] +scope: project +tags: [] +created_at: '2024-01-01T00:00:00+00:00' +updated_at: '2024-01-01T00:00:00+00:00' +last_confirmed_at: null +approved_by: null +""" + (kb_dir / "claims" / "messy-claim.yaml").write_text(messy_yaml) + _write_config(kb_dir, "0.1") + + result = migrate_kb(tmp_path, from_v="0.1", to_v="0.2", dry_run=True) + + assert "claims/messy-claim.yaml" not in result.changed + + +def 
test_migrate_dry_run_does_not_write(tmp_path: Path, patched_migrations): + kb_dir, _ = _seed_kb_with_claim(tmp_path, old_field="old_value") + _write_config(kb_dir, "0.1") + + result = migrate_kb(tmp_path, from_v="0.1", to_v="0.2", dry_run=True) + + assert result.dry_run is True + assert result.changed # reported as changed + # But the file on disk is untouched + on_disk = _yaml_load((kb_dir / "claims" / "test-claim.yaml").read_text()) + assert "old_field" in on_disk + assert "new_field" not in on_disk + # And no tmp dir left behind + assert not (tmp_path / ".vouch-migrate-tmp").exists() + + +# --------------------------------------------------------------------------- +# Rollback on validation failure +# --------------------------------------------------------------------------- + + +def test_migrate_rollback_on_validation_error(tmp_path: Path, monkeypatch): + """When the transform produces an invalid artifact, the original is preserved.""" + from vouch.migration import Migration + + def _bad_transform(raw, subdir): + # Delete a required field to break Pydantic validation + raw.pop("id", None) + return raw + + bad_migration = Migration( + from_version="0.1", + to_version="0.2", + transforms=[_bad_transform], + ) + monkeypatch.setattr("vouch.migration.MIGRATIONS", [bad_migration]) + + kb_dir, _ = _seed_kb_with_claim(tmp_path) + original_text = (kb_dir / "claims" / "test-claim.yaml").read_text() + _write_config(kb_dir, "0.1") + + with pytest.raises(RuntimeError, match="validation failed"): + migrate_kb(tmp_path, from_v="0.1", to_v="0.2") + + # Original must be untouched + on_disk = (kb_dir / "claims" / "test-claim.yaml").read_text() + assert on_disk == original_text + # No tmp dir left behind + assert not (kb_dir.parent / ".vouch-migrate-tmp").exists() + # config still at old version + cfg = _yaml_load((kb_dir / CONFIG_FILENAME).read_text()) + assert cfg.get("schema_version") == "0.1" + + +# --------------------------------------------------------------------------- 
+# migrate_kb raises on non-existent KB +# --------------------------------------------------------------------------- + + +def test_migrate_no_kb_raises(tmp_path: Path): + with pytest.raises(FileNotFoundError): + migrate_kb(tmp_path, from_v="0.1", to_v="0.2") diff --git a/tests/test_storage.py b/tests/test_storage.py index 53640b7..6a67c6b 100644 --- a/tests/test_storage.py +++ b/tests/test_storage.py @@ -626,3 +626,43 @@ def test_cite_resolves_source_and_evidence(store: KBStore) -> None: for c in citations } assert "source" in kinds and "evidence" in kinds + + +# --- schema versioning + SchemaMismatchError guard ------------------------ + + +def test_init_writes_schema_version(tmp_path: Path) -> None: + from vouch.models import VOUCH_SCHEMA_VERSION + from vouch.storage import CONFIG_FILENAME, KB_DIRNAME, _yaml_load + KBStore.init(tmp_path) + cfg = _yaml_load((tmp_path / KB_DIRNAME / CONFIG_FILENAME).read_text()) + assert cfg.get("schema_version") == VOUCH_SCHEMA_VERSION + + +def test_schema_version_mismatch_raises(tmp_path: Path) -> None: + from vouch.storage import ( + CONFIG_FILENAME, + KB_DIRNAME, + SchemaMismatchError, + _yaml_dump, + _yaml_load, + ) + KBStore.init(tmp_path) + cfg_path = tmp_path / KB_DIRNAME / CONFIG_FILENAME + cfg = _yaml_load(cfg_path.read_text()) + cfg["schema_version"] = "99.99" + cfg_path.write_text(_yaml_dump(cfg)) + with pytest.raises(SchemaMismatchError, match=r"99\.99"): + KBStore(tmp_path) + + +def test_missing_schema_version_is_allowed(tmp_path: Path) -> None: + from vouch.storage import CONFIG_FILENAME, KB_DIRNAME, _yaml_dump, _yaml_load + KBStore.init(tmp_path) + cfg_path = tmp_path / KB_DIRNAME / CONFIG_FILENAME + cfg = _yaml_load(cfg_path.read_text()) + del cfg["schema_version"] + cfg_path.write_text(_yaml_dump(cfg)) + # Should not raise — pre-VEP-0004 KBs have no schema_version + store2 = KBStore(tmp_path) + assert store2 is not None
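Reviewer note: the transform-factory contract exercised by `tests/test_migration.py` can be sketched in isolation as below. This is a minimal stand-alone sketch inferred only from the test assertions (a transform takes `(raw, subdir)` and returns a dict, is a no-op for other subdirectories, and `add_default` accepts a callable default). It is not the code in `src/vouch/migration.py`; the `Transform` alias, the inner `_transform` names, and the copy-instead-of-mutate behaviour are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Callable

# Assumed shape of a transform: (parsed artifact dict, subdir name) -> dict.
Transform = Callable[[dict[str, Any], str], dict[str, Any]]


def rename_field(subdir: str, *, old: str, new: str) -> Transform:
    """Build a transform that renames key `old` to `new` for one subdir."""

    def _transform(raw: dict[str, Any], artifact_subdir: str) -> dict[str, Any]:
        # Artifacts from other subdirs, or without the old key, pass through.
        if artifact_subdir != subdir or old not in raw:
            return raw
        out = dict(raw)  # shallow copy so the caller's dict is not mutated
        out[new] = out.pop(old)
        return out

    return _transform


def add_default(subdir: str, *, field_name: str, default: Any) -> Transform:
    """Build a transform that fills in `field_name` when it is missing."""

    def _transform(raw: dict[str, Any], artifact_subdir: str) -> dict[str, Any]:
        # Existing values are never overwritten.
        if artifact_subdir != subdir or field_name in raw:
            return raw
        out = dict(raw)
        # A callable default (e.g. `list`) is invoked per artifact, so
        # mutable defaults are not shared between files.
        out[field_name] = default() if callable(default) else default
        return out

    return _transform
```

Returning a copy rather than mutating `raw` keeps the pre-transform parsed dict available for comparison, which is the kind of change detection the CHANGELOG entry describes; whether the real engine relies on that is not shown in this diff.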