diff --git a/.agents/skills/filigree-workflow/SKILL.md b/.agents/skills/filigree-workflow/SKILL.md index 1c2414df..d3cb24e2 100644 --- a/.agents/skills/filigree-workflow/SKILL.md +++ b/.agents/skills/filigree-workflow/SKILL.md @@ -271,11 +271,11 @@ are for "hmm, this might be worth looking at" — the uncertain middle ground. Observations expire after 14 days. Triage them before they rot: -1. **At session end:** run `list_observations` and quickly scan what's accumulated +1. **At session end:** run `observation_list` and quickly scan what's accumulated 2. **For each observation, decide:** - **Dismiss** — not actionable, already fixed, or not worth tracking. Use - `dismiss_observation` with a brief reason for the audit trail. - - **Promote** — deserves to be tracked as an issue. Use `promote_observation` + `observation_dismiss` with a brief reason for the audit trail. + - **Promote** — deserves to be tracked as an issue. Use `observation_promote` which atomically creates an issue and labels it `from-observation`. Choose the right issue type: - `type='bug'` — something is broken or produces wrong results @@ -285,7 +285,7 @@ Observations expire after 14 days. Triage them before they rot: - **Leave it** — still uncertain. Let it age. If it survives a few sessions without being promoted, it's probably a dismiss. -3. **Batch cleanup:** use the MCP tool `batch_dismiss_observations` when several observations +3. **Batch cleanup:** use the MCP tool `observation_batch_dismiss` when several observations have gone stale together. 
### Promote vs Dismiss @@ -319,6 +319,6 @@ filigree search "from-observation" # Search with context | "This task is bigger than expected" | Create sub-tasks, add deps | | "I'm done" | Comment, close with reason, check `ready` | | "Something changed while I worked" | `filigree changes --since ` | -| "I noticed something odd in a file I'm passing through" | `observe` with file_path and line — keep working | +| "I noticed something odd in a file I'm passing through" | `observation_create` with file_path and line — keep working | | "I noticed a gap in the work I'm currently doing" | Fix it, expand the task, or file a proper issue — **do not** observe it | -| "These observations are piling up" | `list_observations`, then dismiss or promote each | +| "These observations are piling up" | `observation_list`, then dismiss or promote each | diff --git a/.clarion/.gitignore b/.clarion/.gitignore index 4b6fcf94..2944180f 100644 --- a/.clarion/.gitignore +++ b/.clarion/.gitignore @@ -7,6 +7,7 @@ # change on every analyze run, so they are NOT tracked (untracked 2026-06-02). clarion.db instance_id +clarion.lock # SQLite write-ahead files never belong in the repo. 
*-wal diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index e41f5be5..e35daed8 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -27,10 +27,10 @@ jobs: name: Build + deploy runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd with: fetch-depth: 0 - - uses: actions/setup-python@v5 + - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 with: python-version: "3.13" cache: pip diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index 6bae6f17..af6d4e5a 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -67,9 +67,29 @@ jobs: python scripts/check-migration-retirement.py --self-test python scripts/check-migration-retirement.py + - name: release governance static guard + run: | + python scripts/check-github-release-governance.py --self-test + python scripts/check-github-release-governance.py --static-only + - name: cross-workspace version lockstep run: python scripts/check-workspace-version-lockstep.py + - name: pyright pin lockstep + run: | + python scripts/check-pyright-pin-lockstep.py --self-test + python scripts/check-pyright-pin-lockstep.py + + - name: wardline version bounds + run: | + python scripts/check-wardline-version-bounds.py --self-test + python scripts/check-wardline-version-bounds.py + + - name: entity-cap ADR/code lockstep + run: | + python scripts/check-entity-cap-lockstep.py --self-test + python scripts/check-entity-cap-lockstep.py + - name: rust clippy run: cargo clippy --workspace --all-targets --all-features -- -D warnings @@ -133,20 +153,20 @@ jobs: release-governance: name: GitHub release governance - # Enforcement temporarily neutered: the repo's baseline governance is being - # rebaselined and release-control enforcement is moving to a separate app. 
- # The GOV-01/GOV-02 rulesets this job asserted are intentionally relaxed, so - # the live-ruleset check is replaced with a no-op notice. The job and its - # place in the build-rust / build-plugin `needs:` chain are retained so the - # static guard (check-github-release-governance.py --static-only) stays - # satisfied. Restore the enforcement step below — and the rulesets — when the - # new governance baseline lands (clarion-5d0bf8b51e). runs-on: ubuntu-latest + permissions: + contents: read steps: - - name: release governance (temporarily disabled) + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd + + - name: enforce repository release controls + env: + GH_TOKEN: ${{ secrets.RELEASE_GOVERNANCE_TOKEN }} run: | - echo "release-governance enforcement is temporarily disabled (clarion-5d0bf8b51e)." - echo "GOV-01/GOV-02 rulesets are intentionally relaxed during governance rebaselining." + set -euo pipefail + python scripts/check-github-release-governance.py \ + --repository "${GITHUB_REPOSITORY}" \ + --branch main build-rust: needs: [verify, release-governance] diff --git a/AGENTS.md b/AGENTS.md index 6ff15f22..7a27f0d3 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -231,7 +231,7 @@ When you are unsure, say what evidence is missing and go get it if it is cheap. When evidence is expensive or requires credentials, explain the limitation and the next command a maintainer can run. - + ## Filigree Issue Tracker `filigree` tracks tasks for this project. Data lives in `.filigree/`. Prefer @@ -253,29 +253,29 @@ filigree start-work --assignee filigree close ``` -Use the atomic claim+transition verbs — `start_work` / `start_next_work` +Use the atomic claim+transition verbs — `work_start` / `work_start_next` (MCP) or `start-work` / `start-next-work` (CLI). 
Do **not** chain -`claim_issue` (MCP) or `filigree claim` (CLI) with a subsequent status +`work_claim` (MCP) or `filigree claim` (CLI) with a subsequent status update — the two-step form races against other agents; the combined verb is atomic. **Ready ≠ startable.** The working status is type-specific (tasks → `in_progress`, features → `building`). Bugs start at `triage`, which has no single-hop transition into work (`triage → confirmed → fixing`), so a triage -bug is *ready* but not directly *startable*: `start_work` on one returns -`INVALID_TRANSITION` naming the next status, and `start_next_work` skips it. -`get_ready` items carry a `startable` flag (plus a `next_action` hint when +bug is *ready* but not directly *startable*: `work_start` on one returns +`INVALID_TRANSITION` naming the next status, and `work_start_next` skips it. +`work_ready` items carry a `startable` flag (plus a `next_action` hint when false). Pass `advance=true` (MCP) / `--advance` (CLI) to walk the soft transitions to the nearest working status automatically. ### Observations: when (and when not) to use them -`observe` is a fire-and-forget scratchpad for *incidental* defects — things +`observation_create` is a fire-and-forget scratchpad for *incidental* defects — things you notice *outside the scope of your current task* (a code smell in a neighbouring file, a stale TODO, a missing test for an edge case you happened to spot). Notes expire after 14 days unless promoted. Include `file_path` and -`line` when relevant. At session end, skim `list_observations` and either -`dismiss_observation` or `promote_observation` for what has accumulated. +`line` when relevant. At session end, skim `observation_list` and either +`observation_dismiss` or `observation_promote` for what has accumulated. **You fix bugs in your currently defined scope. 
You do NOT use observations to finish work prematurely.** If a defect, gap, or follow-up belongs to your @@ -300,27 +300,27 @@ MCP tool schemas describe each tool; `filigree --help` and `filigree --help` are the authoritative CLI reference. You do not need to memorise either catalogue. The verbs you will reach for most: -- **Find work:** `get_ready`, `get_blocked`, `list_issues`, `search_issues` -- **Claim work:** `start_work`, `start_next_work` -- **Update:** `add_comment`, `add_label`, `update_issue`, `close_issue` -- **Admin (irreversible):** `delete_issue` (MCP) / `delete-issue` (CLI) — - hard-deletes a terminal issue and its rows; `undo_last` cannot reverse it. -- **Scratchpad:** `observe`, `list_observations`, `promote_observation`, `dismiss_observation` -- **Cross-product entity bindings (ADR-029):** `add_entity_association`, - `remove_entity_association`, `list_entity_associations`, - `list_associations_by_entity`. Used when a sibling tool (e.g. +- **Find work:** `work_ready`, `work_blocked`, `issue_list`, `issue_search` +- **Claim work:** `work_start`, `work_start_next` +- **Update:** `comment_add`, `label_add`, `issue_update`, `issue_close` +- **Admin (irreversible):** `issue_delete` (MCP) / `delete-issue` (CLI) — + hard-deletes a terminal issue and its rows; `admin_undo_last` cannot reverse it. +- **Scratchpad:** `observation_create`, `observation_list`, `observation_promote`, `observation_dismiss` +- **Cross-product entity bindings (ADR-029):** `entity_association_add`, + `entity_association_remove`, `entity_association_list`, + `entity_association_list_by_entity`. Used when a sibling tool (e.g. Clarion) needs to bind a Filigree issue to a function, class, or module identifier it owns. The `entity_id` is an opaque string from Filigree's perspective; the consumer (the sibling tool's read path) does drift detection against the stored - `content_hash_at_attach`. `list_associations_by_entity` is the + `content_hash_at_attach`. 
`entity_association_list_by_entity` is the reverse-lookup surface — given a Clarion entity ID, return every Filigree issue bound to it (project isolation is by DB file). Also reachable over HTTP as `GET/POST /api/issue/{issue_id}/entity-associations`, `DELETE /api/issue/{issue_id}/entity-associations?entity_id=…`, and `GET /api/entity-associations?entity_id=…`. -- **Health:** `get_stats`, `get_metrics`, `get_mcp_status` +- **Health:** `stats_get`, `metrics_get`, `mcp_status_get` Pass `--actor ` (CLI) so events attribute to your agent identity. It works in either position — before the verb (`filigree --actor X update …`) or @@ -336,7 +336,7 @@ Errors return `{error: str, code: ErrorCode, details?: dict}`. Switch on `CLARION_REGISTRY_VERSION_MISMATCH`, `BRIEFING_BLOCKED`, `STOP_FAILED`, `SCHEMA_MISMATCH`, `INTERNAL`. -On `INVALID_TRANSITION`, call `get_valid_transitions` (MCP) or +On `INVALID_TRANSITION`, call `workflow_transition_list` (MCP) or `filigree transitions ` to see what the workflow allows from here. Two failure modes deserve a specific response: diff --git a/CHANGELOG.md b/CHANGELOG.md index 3dcc2862..a2c20e59 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,8 +12,44 @@ only when an incompatible change is made to that surface. See ## [Unreleased] +### Changed + +- Refreshed release-facing README/index documentation for the current 1.2.0 + release line, including the 39-tool MCP surface, current install artifact + names, fixed ADR/docset links, current web/operator quick starts, and the full + end-to-end verification list. +- Archived tracked architecture-analysis working notes out of live `temp/` + directories under `docs/archive/working-notes/`. + +## [1.2.0] — 2026-06-03 + ### Added +- **Guidance maturity — WS6 (REQ-GUIDANCE-03/-05/-06; ADR-007, ADR-024).** The + guidance system moves from schema baseline to operator-usable. + - **`clarion guidance` CLI** — `create` / `edit` (`$EDITOR`) / `show` / `list` + / `delete` / `export` / `import`. 
Match-rule syntax (`path:` / `tag:` / + `kind:` / `subsystem:` / `entity:`), scope-levels (project→function), and + `--expires` normalisation to a full ISO-8601 instant. Sheets are written via a + new non-run-scoped `clarion-storage` guidance API, and the rule-matcher is + lifted into `clarion-storage` as the single source of truth shared by the CLI, + `analyze`, and the MCP read path. + - **Staleness findings (`analyze`).** `CLA-FACT-GUIDANCE-ORPHAN` (WARN) now also + fires for a `match_rules {entity:…}` rule pointing at a deleted entity (was + `guides`-edge only); new `CLA-FACT-GUIDANCE-EXPIRED` (INFO) and + `CLA-FACT-GUIDANCE-CHURN-STALE` (WARN, confidence 0.7 heuristic, asymmetric + threshold 50 / 20-pinned). Surfaced via `clarion guidance list --stale` + (review-cadence age) / `--expired`. CHURN-STALE is honest-empty until + `git_churn_count` population lands. + - **Team import/export.** `export --to ` / `import ` — deterministic + one-file-per-sheet sorted-key JSON, additive idempotent import, loud-fail on + malformed input. + - **Cache invalidation.** Authoring (create / edit / delete / import) eagerly + invalidates the summary cache of matched entities (ADR-007 churn-eager + invalidation). + - Deferred with tracking issues: the agent-mediated propose→promote lifecycle + (no observation-write transport), Wardline-derived generation, the in-browser + staleness-review UI, and guidance composition into summary generation. - **Git-rename provider seam now operative — WS9 / SEI §6 (REQ-C-05).** `analyze` drives the committed rename window so the `legis` `GitRenameSource` is actually consulted, closing the window gap previously surfaced in @@ -107,6 +143,16 @@ only when an incompatible change is made to that surface. See adds no policy/attestation engine — Wardline analyses, `legis` governs, attestations key on Clarion's SEI. 
+### Fixed + +- **`guidance_for` no longer drops expiry-bearing sheets in production.** The MCP + read path compared a sheet's ISO `expires` lexically against the server clock, + whose production default is a `unix:` string — so every sheet carrying + any `expires` sorted as "expired" and was silently excluded from composition. + The comparison now parses both forms to seconds (fail-open on unparseable + input), guarded by a regression test that runs under the production clock + (clarion-3153e74f0b). + ## [1.1.0] — 2026-05-31 ### Added @@ -420,7 +466,8 @@ normative. - Operator guides under [`docs/operator/`](docs/operator/) — getting-started, OpenRouter setup, HTTP read API. -[Unreleased]: https://github.com/tachyon-beep/clarion/compare/v1.1.0...HEAD +[Unreleased]: https://github.com/tachyon-beep/clarion/compare/v1.2.0...HEAD +[1.2.0]: https://github.com/tachyon-beep/clarion/compare/v1.1.0...v1.2.0 [1.1.0]: https://github.com/tachyon-beep/clarion/compare/v1.0.1...v1.1.0 [1.0.1]: https://github.com/tachyon-beep/clarion/compare/v1.0.0...v1.0.1 [1.0.0]: https://github.com/tachyon-beep/clarion/releases/tag/v1.0.0 diff --git a/CLAUDE.md b/CLAUDE.md index f0ecd6cb..b84f9229 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co ## Repository state -**v1.1.0 — current pre-release working version.** The `v1.0.0` tag (first publishable release) is cut; pre-release working tags `v0.1-sprint-1` and `v0.1-sprint-2` remain in the repo as historical anchors. Workspace + Python plugin are at 1.1.0; ADR-014 federation HTTP read API ships with bearer auth, batch resolution, briefing-blocked propagation, and stable per-project `instance_id`. See [`CHANGELOG.md`](CHANGELOG.md) for the full 1.0 scope and [`docs/implementation/`](docs/implementation/) for sprint-closure artifacts. 
+**v1.2.0 — current pre-release working version.** The `v1.0.0` (first publishable release), `v1.0.1`, and `v1.1.0` tags are cut; pre-release working tags `v0.1-sprint-1` and `v0.1-sprint-2` remain in the repo as historical anchors. Workspace + Python plugin are at 1.2.0; ADR-014 federation HTTP read API ships with bearer auth, batch resolution, briefing-blocked propagation, and stable per-project `instance_id`. See [`CHANGELOG.md`](CHANGELOG.md) for the full scope and [`docs/implementation/`](docs/implementation/) for sprint-closure artifacts. ### Layout (post-1.0) @@ -150,7 +150,7 @@ Open issues for the v1.0 known limitations and any post-release follow-ups live protection (timestamp + nonce window) is ADR-034 forward-work tracked for post-1.0 hardening. - + ## Filigree Issue Tracker `filigree` tracks tasks for this project. Data lives in `.filigree/`. Prefer @@ -172,29 +172,29 @@ filigree start-work --assignee filigree close ``` -Use the atomic claim+transition verbs — `start_work` / `start_next_work` +Use the atomic claim+transition verbs — `work_start` / `work_start_next` (MCP) or `start-work` / `start-next-work` (CLI). Do **not** chain -`claim_issue` (MCP) or `filigree claim` (CLI) with a subsequent status +`work_claim` (MCP) or `filigree claim` (CLI) with a subsequent status update — the two-step form races against other agents; the combined verb is atomic. **Ready ≠ startable.** The working status is type-specific (tasks → `in_progress`, features → `building`). Bugs start at `triage`, which has no single-hop transition into work (`triage → confirmed → fixing`), so a triage -bug is *ready* but not directly *startable*: `start_work` on one returns -`INVALID_TRANSITION` naming the next status, and `start_next_work` skips it. -`get_ready` items carry a `startable` flag (plus a `next_action` hint when +bug is *ready* but not directly *startable*: `work_start` on one returns +`INVALID_TRANSITION` naming the next status, and `work_start_next` skips it. 
+`work_ready` items carry a `startable` flag (plus a `next_action` hint when false). Pass `advance=true` (MCP) / `--advance` (CLI) to walk the soft transitions to the nearest working status automatically. ### Observations: when (and when not) to use them -`observe` is a fire-and-forget scratchpad for *incidental* defects — things +`observation_create` is a fire-and-forget scratchpad for *incidental* defects — things you notice *outside the scope of your current task* (a code smell in a neighbouring file, a stale TODO, a missing test for an edge case you happened to spot). Notes expire after 14 days unless promoted. Include `file_path` and -`line` when relevant. At session end, skim `list_observations` and either -`dismiss_observation` or `promote_observation` for what has accumulated. +`line` when relevant. At session end, skim `observation_list` and either +`observation_dismiss` or `observation_promote` for what has accumulated. **You fix bugs in your currently defined scope. You do NOT use observations to finish work prematurely.** If a defect, gap, or follow-up belongs to your @@ -219,27 +219,27 @@ MCP tool schemas describe each tool; `filigree --help` and `filigree --help` are the authoritative CLI reference. You do not need to memorise either catalogue. The verbs you will reach for most: -- **Find work:** `get_ready`, `get_blocked`, `list_issues`, `search_issues` -- **Claim work:** `start_work`, `start_next_work` -- **Update:** `add_comment`, `add_label`, `update_issue`, `close_issue` -- **Admin (irreversible):** `delete_issue` (MCP) / `delete-issue` (CLI) — - hard-deletes a terminal issue and its rows; `undo_last` cannot reverse it. -- **Scratchpad:** `observe`, `list_observations`, `promote_observation`, `dismiss_observation` -- **Cross-product entity bindings (ADR-029):** `add_entity_association`, - `remove_entity_association`, `list_entity_associations`, - `list_associations_by_entity`. Used when a sibling tool (e.g. 
+- **Find work:** `work_ready`, `work_blocked`, `issue_list`, `issue_search` +- **Claim work:** `work_start`, `work_start_next` +- **Update:** `comment_add`, `label_add`, `issue_update`, `issue_close` +- **Admin (irreversible):** `issue_delete` (MCP) / `delete-issue` (CLI) — + hard-deletes a terminal issue and its rows; `admin_undo_last` cannot reverse it. +- **Scratchpad:** `observation_create`, `observation_list`, `observation_promote`, `observation_dismiss` +- **Cross-product entity bindings (ADR-029):** `entity_association_add`, + `entity_association_remove`, `entity_association_list`, + `entity_association_list_by_entity`. Used when a sibling tool (e.g. Clarion) needs to bind a Filigree issue to a function, class, or module identifier it owns. The `entity_id` is an opaque string from Filigree's perspective; the consumer (the sibling tool's read path) does drift detection against the stored - `content_hash_at_attach`. `list_associations_by_entity` is the + `content_hash_at_attach`. `entity_association_list_by_entity` is the reverse-lookup surface — given a Clarion entity ID, return every Filigree issue bound to it (project isolation is by DB file). Also reachable over HTTP as `GET/POST /api/issue/{issue_id}/entity-associations`, `DELETE /api/issue/{issue_id}/entity-associations?entity_id=…`, and `GET /api/entity-associations?entity_id=…`. -- **Health:** `get_stats`, `get_metrics`, `get_mcp_status` +- **Health:** `stats_get`, `metrics_get`, `mcp_status_get` Pass `--actor ` (CLI) so events attribute to your agent identity. It works in either position — before the verb (`filigree --actor X update …`) or @@ -255,7 +255,7 @@ Errors return `{error: str, code: ErrorCode, details?: dict}`. Switch on `CLARION_REGISTRY_VERSION_MISMATCH`, `BRIEFING_BLOCKED`, `STOP_FAILED`, `SCHEMA_MISMATCH`, `INTERNAL`. 
-On `INVALID_TRANSITION`, call `get_valid_transitions` (MCP) or +On `INVALID_TRANSITION`, call `workflow_transition_list` (MCP) or `filigree transitions ` to see what the workflow allows from here. Two failure modes deserve a specific response: diff --git a/Cargo.lock b/Cargo.lock index ef8f5c35..8f585def 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -312,7 +312,7 @@ checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9" [[package]] name = "clarion-cli" -version = "1.1.0" +version = "1.2.0" dependencies = [ "anyhow", "assert_cmd", @@ -320,12 +320,14 @@ dependencies = [ "blake3", "clap", "clarion-core", + "clarion-federation", "clarion-mcp", "clarion-plugin-fixture", "clarion-scanner", "clarion-storage", "dotenvy", "fs2", + "hmac", "ignore", "reqwest", "rusqlite", @@ -334,6 +336,7 @@ dependencies = [ "serde_norway", "sha1", "sha2", + "subtle", "tempfile", "time", "tokio", @@ -347,25 +350,43 @@ dependencies = [ [[package]] name = "clarion-core" -version = "1.1.0" +version = "1.2.0" dependencies = [ + "async-trait", "nix", "reqwest", "serde", "serde_json", "tempfile", "thiserror 1.0.69", + "tokio", "toml", "tracing", "which", ] +[[package]] +name = "clarion-federation" +version = "1.2.0" +dependencies = [ + "clarion-core", + "clarion-storage", + "reqwest", + "serde", + "serde_json", + "serde_norway", + "tempfile", + "thiserror 1.0.69", +] + [[package]] name = "clarion-mcp" -version = "1.1.0" +version = "1.2.0" dependencies = [ + "async-trait", "blake3", "clarion-core", + "clarion-federation", "clarion-storage", "nix", "reqwest", @@ -383,7 +404,7 @@ dependencies = [ [[package]] name = "clarion-plugin-fixture" -version = "1.1.0" +version = "1.2.0" dependencies = [ "clarion-core", "nix", @@ -392,7 +413,7 @@ dependencies = [ [[package]] name = "clarion-scanner" -version = "1.1.0" +version = "1.2.0" dependencies = [ "regex", "serde", @@ -404,7 +425,7 @@ dependencies = [ [[package]] name = "clarion-storage" -version = "1.1.0" +version = "1.2.0" 
dependencies = [ "blake3", "clarion-core", @@ -414,6 +435,7 @@ dependencies = [ "serde_json", "tempfile", "thiserror 1.0.69", + "time", "tokio", "tracing", ] @@ -563,6 +585,7 @@ checksum = "9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292" dependencies = [ "block-buffer", "crypto-common", + "subtle", ] [[package]] @@ -819,6 +842,15 @@ version = "0.5.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "fc0fef456e4baa96da950455cd02c081ca953b141298e41db3fc7e36b1da849c" +[[package]] +name = "hmac" +version = "0.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6c49c37c09c17a53d937dfbb742eb3a961d65a994e6bcdcf37e7399d0cc8ab5e" +dependencies = [ + "digest", +] + [[package]] name = "home" version = "0.5.12" diff --git a/Cargo.toml b/Cargo.toml index ae5c87c1..cd23ea42 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -2,6 +2,7 @@ resolver = "3" members = [ "crates/clarion-core", + "crates/clarion-federation", "crates/clarion-storage", "crates/clarion-cli", "crates/clarion-mcp", @@ -10,7 +11,7 @@ members = [ ] [workspace.package] -version = "1.1.0" +version = "1.2.0" edition = "2024" license = "MIT" repository = "https://github.com/tachyon-beep/clarion" @@ -32,12 +33,14 @@ missing_errors_doc = "allow" [workspace.dependencies] anyhow = "1" +async-trait = "0.1" axum = "0.7" blake3 = "1.8.5" clap = { version = "4", features = ["derive"] } deadpool-sqlite = { version = "0.8", features = ["rt_tokio_1"] } dotenvy = "0.15" fs2 = "0.4" +hmac = "0.12" ignore = "0.4" rusqlite = { version = "0.31", features = ["bundled", "backup"] } reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls-native-roots"] } @@ -47,6 +50,7 @@ serde_json = { version = "1", features = ["raw_value"] } serde_norway = "0.9.42" sha1 = "0.10" sha2 = "0.10" +subtle = "2.6" thiserror = "1" time = { version = "0.3", features = ["formatting", "macros", "parsing"] } tokio = { version = "1", features = 
["rt-multi-thread", "macros", "net", "sync", "time"] } diff --git a/README.md b/README.md index d872f828..caca2977 100644 --- a/README.md +++ b/README.md @@ -5,15 +5,15 @@ Clarion is a code-archaeology tool. It ingests a codebase, extracts entities `references`), persists the structural graph to a local SQLite store, and serves the result to consult-mode LLM agents over MCP. A coding agent that would otherwise re-explore the tree on every question reaches Clarion first and asks a -graph-aware tool. v1.0 ships a Rust core plus a Python language plugin; other -languages land in v2.0+. +graph-aware tool. The current release line ships a Rust core plus a Python +language plugin; other languages remain future scope. Part of the [Loom suite](docs/suite/loom.md) of code-archaeology, issue-tracking, and trust-topology tools. ## Status -**v1.0 — first publishable release.** Scope: +**v1.2.0 — current release line.** Scope: - **Python only.** Other-language plugins (`NG-15`) are v2.0+ scope. - **Structural extraction + on-demand LLM summarisation.** `clarion analyze` @@ -21,46 +21,48 @@ and trust-topology tools. dispatches the LLM lazily, one entity at a time. - **Local-first.** No mandatory cloud component; the only required network egress is the LLM provider during `summary` calls. -- **Filigree finding emission deferred to v0.2.** Clarion v1.0 surfaces issues - attached to entities via its own `issues_for` MCP tool (the WP9-A binding); - cross-product POSTing of Clarion-generated findings into Filigree's intake is - WP9-B, deferred per the - [Sprint 2 scope amendment](docs/implementation/sprint-2/scope-amendment-2026-05.md#4-v01-planmd-resequencing). +- **Stable identity and suite enrichment.** Clarion mints Stable Entity + Identity (SEI) tokens, serves the federation HTTP read API, emits opted-in + Filigree scan findings, and enriches MCP reads with Filigree/Wardline context + without making sibling products mandatory. 
+- **Guidance authoring.** Operators can author, import, export, and review + guidance sheets through `clarion guidance`; consult agents consume them + through MCP and summary cache invalidation. -**Known v1.0 limitations:** +**Known limitations:** - **HTTP file language inference uses stored plugin identity plus a narrow - core-extension fallback.** Plugin manifests already declare language and - extensions, but v1.0 does not persist a manifest language registry for the - `/api/v1/files` read path. This is tracked as post-v1.0 hardening. + core-extension fallback.** Plugin manifests declare language and extensions, + but Clarion does not yet persist a manifest language registry for the + `/api/v1/files` read path. +- **Some guidance lifecycle surfaces remain deferred.** The in-browser + staleness-review UI is still tracked separately; authored guidance is + available through the CLI and MCP read path today. ## What it does today -`clarion serve` exposes eight MCP tools that a consult-mode agent calls instead -of grep-and-read: +`clarion serve` exposes a 39-tool MCP surface that a consult-mode agent calls +instead of grep-and-read. The core tool families are: -| Tool | What it answers | +| Family | Examples | |---|---| -| `entity_at(file, line)` | "Which entity covers this source location?" | -| `find_entity(pattern)` | "Find entities whose name or summary text matches X." | -| `callers_of(id)` | "Who calls this function?" | -| `execution_paths_from(id, max_depth)` | "Show me up to N hops of call paths starting here." | -| `summary(id)` | "Give me a one-paragraph summary of this entity." (lazy LLM dispatch + cached) | -| `issues_for(id)` | "What Filigree issues are attached to this entity?" | -| `neighborhood(id)` | "Show callers, callees, container, contained entities, and references in one hop." | -| `subsystem_members(id)` | "Which entities belong to this subsystem?" 
(clustering output) | +| Navigation and graph traversal | `entity_at`, `entity_find`, `entity_callers_list`, `entity_execution_path_list`, `entity_neighborhood_get`, `subsystem_member_list`, `entity_call_site_list` | +| Briefing and source inspection | `entity_summary_get`, `entity_summary_preview_cost_get`, `entity_source_get`, `entity_orientation_pack_get`, `project_status_get` | +| Guidance, findings, and federation context | `entity_guidance_list`, `propose_guidance`, `promote_guidance`, `entity_finding_list`, `entity_wardline_get`, `entity_issue_list` | +| Analyze lifecycle and freshness | `analyze_start`, `analyze_status_get`, `analyze_cancel`, `index_diff_get` | +| Faceted and shortcut queries | `entity_tag_list`, `entity_kind_list`, `module_circular_import_list`, `entity_coupling_hotspot_list`, `entity_entry_point_list`, `entity_dead_list`, `entity_semantic_search_list` | ## Quick start ```bash -# 1. Install from the v1.0 GitHub Release -TAG=v1.0.0 +# 1. Install from the current GitHub Release +TAG=v1.2.0 curl -L -o clarion-x86_64-unknown-linux-gnu.tar.gz \ "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-x86_64-unknown-linux-gnu.tar.gz" tar xzf clarion-x86_64-unknown-linux-gnu.tar.gz install clarion-x86_64-unknown-linux-gnu/clarion ~/.local/bin/ pipx install \ - "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-plugin-python-1.0.0.tar.gz" + "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-plugin-python-1.2.0.tar.gz" # 2. Initialise a project cd /path/to/your/python/repo @@ -87,9 +89,10 @@ in [docs/operator/getting-started.md](docs/operator/getting-started.md). 
crates/ Rust workspace ├── clarion-core/ Entity-ID assembler, plugin host, manifest parser ├── clarion-storage/ Writer-actor + reader-pool over SQLite (ADR-011) +├── clarion-federation/ Shared federation HTTP types ├── clarion-scanner/ Pre-ingest secret scanner (ADR-013, WP5) ├── clarion-cli/ The `clarion` binary (install, analyze, serve) -└── clarion-mcp/ MCP server exposing the eight consult tools +└── clarion-mcp/ MCP server exposing the consult tools plugins/python/ Python language plugin (pyright-backed) docs/clarion/1.0/ Design ladder — requirements → system-design → detailed-design docs/clarion/adr/ Authored architecture decision records @@ -103,9 +106,9 @@ federation doctrine that anchors every cross-product decision is in ## Storage and operations -Clarion v1.0 keeps all state in a project-local `.clarion/` directory. +Clarion keeps project state in a local `.clarion/` directory. The local-first storage model, the no-NFS constraint, the no-double-analyze -constraint (fs2 advisory lock), and the v1.0 backup/restore procedure are +constraint (fs2 advisory lock), and the backup/restore procedure are documented in [docs/clarion/1.0/operations.md](docs/clarion/1.0/operations.md). 
@@ -132,6 +135,8 @@ plugins/python/.venv/bin/pytest plugins/python # End-to-end bash tests/e2e/sprint_1_walking_skeleton.sh +bash tests/e2e/sprint_2_mcp_surface.sh +bash tests/e2e/phase3_subsystems.sh ``` Pre-commit hooks at [.pre-commit-config.yaml](.pre-commit-config.yaml) wire diff --git a/crates/clarion-cli/Cargo.toml b/crates/clarion-cli/Cargo.toml index 9b38ea08..67abedec 100644 --- a/crates/clarion-cli/Cargo.toml +++ b/crates/clarion-cli/Cargo.toml @@ -18,12 +18,14 @@ anyhow.workspace = true axum.workspace = true blake3.workspace = true clap.workspace = true -clarion-core = { path = "../clarion-core", version = "1.1.0" } -clarion-mcp = { path = "../clarion-mcp", version = "1.1.0" } -clarion-scanner = { path = "../clarion-scanner", version = "1.1.0" } -clarion-storage = { path = "../clarion-storage", version = "1.1.0" } +clarion-core = { path = "../clarion-core", version = "1.2.0" } +clarion-federation = { path = "../clarion-federation", version = "1.2.0" } +clarion-mcp = { path = "../clarion-mcp", version = "1.2.0" } +clarion-scanner = { path = "../clarion-scanner", version = "1.2.0" } +clarion-storage = { path = "../clarion-storage", version = "1.2.0" } dotenvy.workspace = true fs2.workspace = true +hmac.workspace = true ignore.workspace = true reqwest.workspace = true rusqlite.workspace = true @@ -31,6 +33,7 @@ serde.workspace = true serde_json.workspace = true serde_norway.workspace = true sha2.workspace = true +subtle.workspace = true time.workspace = true tokio.workspace = true tower.workspace = true @@ -42,7 +45,7 @@ xgraph.workspace = true [dev-dependencies] assert_cmd.workspace = true -clarion-plugin-fixture = { path = "../clarion-plugin-fixture", version = "1.1.0" } +clarion-plugin-fixture = { path = "../clarion-plugin-fixture", version = "1.2.0" } rusqlite.workspace = true serde_json.workspace = true sha1.workspace = true diff --git a/crates/clarion-cli/src/analyze.rs b/crates/clarion-cli/src/analyze.rs index b050822d..07990283 100644 --- 
a/crates/clarion-cli/src/analyze.rs +++ b/crates/clarion-cli/src/analyze.rs @@ -4,8 +4,8 @@ //! - Discover plugins via L9 `$PATH` convention (Task 5). //! - For each plugin: spawn, handshake, walk the source tree, call //! `analyze_file` for every matching file, persist via writer-actor. -//! - Pattern A buffering: collect entities in the blocking task, flush -//! `InsertEntity` commands from async context after the blocking task returns. +//! - File output streams through a bounded channel to the writer actor; import +//! edges are deferred until the plugin's module set is known. //! - On unrecoverable error (cap, escape, spawn, transport) → `FailRun`. //! - Zero successful plugins discovered → `SkippedNoPlugins` (existing path). @@ -23,22 +23,22 @@ use uuid::Uuid; use clarion_core::{ AcceptedEdge, AcceptedEntity, AnalyzeFileOutcome, CrashLoopBreaker, CrashLoopState, - DiscoveredPlugin, FINDING_DISABLED_CRASH_LOOP, HostError, HostFinding, UnresolvedCallSite, - discover, + DiscoveredPlugin, EmbeddingProvider, FINDING_DISABLED_CRASH_LOOP, HostError, HostFinding, + UnresolvedCallSite, discover, }; use clarion_storage::{ - DEFAULT_BATCH_SIZE, DEFAULT_CHANNEL_CAPACITY, GitRename, NewEntityDescriptor, PriorIndexEntry, - SeiBindingRecord, SeiDecision, SeiLineageEntry, UnresolvedCallSiteRecord, Writer, - alive_bindings_snapshot, - commands::{EdgeRecord, EntityRecord, FindingRecord, RunStatus, WriterCmd}, + DEFAULT_BATCH_SIZE, DEFAULT_CHANNEL_CAPACITY, EmbeddingKey, EmbeddingStore, GitRename, + NewEntityDescriptor, PriorIndexEntry, SeiBindingRecord, SeiDecision, SeiLineageEntry, + UnresolvedCallSiteRecord, Writer, alive_bindings_snapshot, + commands::{EdgeConfidence, EdgeRecord, EntityRecord, FindingRecord, RunStatus, WriterCmd}, mint_sei, module_dependency_edges, orphaned_bindings, prior_analyzed_commit, rebind_or_mint, sei::{BindingStatus, LineageEvent}, }; -use clarion_mcp::config::McpConfig; -use clarion_mcp::filigree::FiligreeHttpClient; -use 
clarion_mcp::filigree_url::resolve_filigree_url; -use clarion_mcp::scan_results::{ +use clarion_federation::config::{FiligreeConfig, McpConfig, SemanticSearchConfig}; +use clarion_federation::filigree::FiligreeHttpClient; +use clarion_federation::filigree_url::resolve_filigree_url; +use clarion_federation::scan_results::{ CLARION_SCAN_SOURCE, CleanStaleRequest, CleanStaleResponse, EmitOptions, PreparedBatch, ScanResultsResponse, clean_stale_url, prepare_batch, scan_results_url, }; @@ -58,6 +58,36 @@ const ENTITY_DELETED_RULE_ID: &str = "CLA-FACT-ENTITY-DELETED"; /// an entity that no longer exists. const GUIDANCE_ORPHAN_RULE_ID: &str = "CLA-FACT-GUIDANCE-ORPHAN"; +/// Bounded handoff from the blocking plugin worker to the async writer loop. +/// Mirrors detailed-design §11's `file_analyzed` backpressure cap. +const PLUGIN_FILE_BATCH_CHANNEL_CAPACITY: usize = 100; +const PROGRESS_HEARTBEAT_INTERVAL: std::time::Duration = std::time::Duration::from_secs(5); +const SEMANTIC_EMBEDDING_BATCH_SIZE: usize = 64; +type DescribedEdgeRecord = (String, EdgeRecord); + +/// REQ-GUIDANCE-05 (WS6 T4a): a guidance sheet whose `expires` instant is in the +/// past. The read path already excludes expired sheets from composition; this +/// finding surfaces the state operatively (the sheet is not deleted). +const GUIDANCE_EXPIRED_RULE_ID: &str = "CLA-FACT-GUIDANCE-EXPIRED"; + +/// REQ-GUIDANCE-05 (WS6 T4a): a guidance sheet whose matched entities carry a high +/// aggregate `git_churn_count` — the code under the sheet has churned enough that +/// the guidance is likely stale. Heuristic (confidence 0.7); inert until the +/// churn-history pipeline (clarion-997c93ec4e) populates `git_churn_count`. +const GUIDANCE_CHURN_STALE_RULE_ID: &str = "CLA-FACT-GUIDANCE-CHURN-STALE"; + +/// REQ-GUIDANCE-05 (WS6 T4): a Wardline-derived guidance sheet was preserved as +/// an operator override while `wardline.yaml` changed underneath it. 
+const GUIDANCE_STALE_RULE_ID: &str = "CLA-FACT-GUIDANCE-STALE"; + +/// Aggregate `git_churn_count` (summed over a sheet's matched entities) at or above +/// which a non-pinned sheet is flagged `CLA-FACT-GUIDANCE-CHURN-STALE`. +const CHURN_STALE_THRESHOLD: i64 = 50; + +/// The lower (stricter) churn threshold for `pinned: true` sheets — pinned guidance +/// is asserted institutional knowledge, so it goes stale on less churn. +const CHURN_STALE_THRESHOLD_PINNED: i64 = 20; + /// REQ-ANALYZE-05: a subsystem whose tier-bearing members declare ≥2 distinct /// Wardline tiers (a trust-boundary smell — the cluster straddles tiers). const TIER_MIXING_RULE_ID: &str = "CLA-FACT-TIER-SUBSYSTEM-MIXING"; @@ -79,6 +109,9 @@ const TIER_UNANIMOUS_RULE_ID: &str = "CLA-FACT-SUBSYSTEM-TIER-UNANIMOUS"; const POST_RUN_FINDING_RULES: &[&str] = &[ ENTITY_DELETED_RULE_ID, GUIDANCE_ORPHAN_RULE_ID, + GUIDANCE_EXPIRED_RULE_ID, + GUIDANCE_CHURN_STALE_RULE_ID, + GUIDANCE_STALE_RULE_ID, TIER_MIXING_RULE_ID, TIER_UNANIMOUS_RULE_ID, ]; @@ -101,7 +134,7 @@ const SYNTAX_ERROR_RULE_ID: &str = "CLA-PY-SYNTAX-ERROR"; /// last-write-wins via an atomic temp-file rename; a failed write is logged and /// dropped (progress is advisory, never run-fatal). 
struct ProgressReporter { - inner: Option, + inner: Option>, } struct ProgressInner { @@ -115,12 +148,14 @@ struct ProgressInner { impl ProgressReporter { fn new(progress_file: Option, run_id: String) -> Self { Self { - inner: progress_file.map(|path| ProgressInner { - path, - run_id, - pid: std::process::id(), - total_files: AtomicU64::new(0), - processed_files: AtomicU64::new(0), + inner: progress_file.map(|path| { + Arc::new(ProgressInner { + path, + run_id, + pid: std::process::id(), + total_files: AtomicU64::new(0), + processed_files: AtomicU64::new(0), + }) }), } } @@ -150,7 +185,7 @@ impl ProgressReporter { "total_files": inner.total_files.load(Ordering::Relaxed), "heartbeat_at": iso8601_now(), }); - self.write_atomic(&snapshot); + Self::write_atomic_inner(inner, &snapshot); } /// Snapshot at the start of a file (so `current_file` reflects in-flight @@ -159,6 +194,48 @@ impl ProgressReporter { self.phase("analyzing", Some(plugin_id), Some(file)); } + fn file_heartbeat_guard( + &self, + plugin_id: String, + file: String, + ) -> Option { + self.file_heartbeat_guard_with_interval(plugin_id, file, PROGRESS_HEARTBEAT_INTERVAL) + } + + fn file_heartbeat_guard_with_interval( + &self, + plugin_id: String, + file: String, + interval: std::time::Duration, + ) -> Option { + let inner = Arc::clone(self.inner.as_ref()?); + let (stop_tx, stop_rx) = std::sync::mpsc::channel(); + let handle = std::thread::spawn(move || { + loop { + match stop_rx.recv_timeout(interval) { + Ok(()) | Err(std::sync::mpsc::RecvTimeoutError::Disconnected) => break, + Err(std::sync::mpsc::RecvTimeoutError::Timeout) => { + let snapshot = serde_json::json!({ + "run_id": inner.run_id, + "pid": inner.pid, + "phase": "analyzing", + "current_plugin": plugin_id, + "current_file": file, + "processed_files": inner.processed_files.load(Ordering::Relaxed), + "total_files": inner.total_files.load(Ordering::Relaxed), + "heartbeat_at": iso8601_now(), + }); + ProgressReporter::write_atomic_inner(&inner, 
&snapshot); + } + } + } + }); + Some(ProgressHeartbeatGuard { + stop_tx: Some(stop_tx), + handle: Some(handle), + }) + } + /// Increment the processed-file counter after a file finishes. fn file_completed(&self) { if let Some(inner) = &self.inner { @@ -183,15 +260,12 @@ impl ProgressReporter { "total_files": inner.total_files.load(Ordering::Relaxed), "heartbeat_at": iso8601_now(), }); - self.write_atomic(&snapshot); + Self::write_atomic_inner(inner, &snapshot); inner.processed_files.fetch_add(1, Ordering::Relaxed); } } - fn write_atomic(&self, snapshot: &serde_json::Value) { - let Some(inner) = &self.inner else { - return; - }; + fn write_atomic_inner(inner: &ProgressInner, snapshot: &serde_json::Value) { let body = snapshot.to_string(); let tmp = inner.path.with_extension("json.tmp"); if let Err(err) = fs::write(&tmp, &body).and_then(|()| fs::rename(&tmp, &inner.path)) { @@ -204,6 +278,22 @@ impl ProgressReporter { } } +struct ProgressHeartbeatGuard { + stop_tx: Option>, + handle: Option>, +} + +impl Drop for ProgressHeartbeatGuard { + fn drop(&mut self) { + if let Some(stop_tx) = self.stop_tx.take() { + let _ = stop_tx.send(()); + } + if let Some(handle) = self.handle.take() { + let _ = handle.join(); + } + } +} + // ── Public entry point ──────────────────────────────────────────────────────── #[derive(Debug, Clone, Default)] @@ -284,6 +374,15 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti clarion_storage::schema::apply_migrations(&mut conn) .map_err(|e| anyhow::anyhow!("{e}")) .context("apply pending migrations")?; + let repaired = clarion_storage::mark_stale_running_runs_failed(&conn) + .map_err(|e| anyhow::anyhow!("{e}")) + .context("mark stale running analyze runs failed")?; + if repaired > 0 { + tracing::warn!( + repaired, + "marked stale running analyze runs failed before starting new analyze" + ); + } } let analyze_config = AnalyzeConfig::load(&project_root, options.config_path.as_deref())?; @@ -455,7 +554,20 @@ 
pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti } // ── Walk the source tree (once, union of all extensions) ───────────────── - let source_files = collect_source_files(&project_root, &wanted_extensions); + let source_walk = collect_source_files(&project_root, &wanted_extensions); + let source_walk_skipped_entries = + u64::try_from(source_walk.skipped_errors.len()).unwrap_or(u64::MAX); + let source_walk_error_samples = source_walk + .skipped_errors + .iter() + .take(SOURCE_WALK_ERROR_SAMPLE_LIMIT) + .cloned() + .collect::>(); + let source_walk_errors_omitted = source_walk + .skipped_errors + .len() + .saturating_sub(source_walk_error_samples.len()); + let source_files = source_walk.files; tracing::info!(file_count = source_files.len(), "source tree walk complete"); progress.set_total(source_files.len() as u64); progress.phase("analyzing", None, None); @@ -574,6 +686,17 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti // synthetic project entity minted just before persistence. 
let mut failure_findings: Vec = Vec::new(); let project_anchor = project_anchor_id(&project_root); + if source_walk_skipped_entries > 0 { + failure_findings.push(source_walk_finding_record( + &project_root, + source_walk_skipped_entries, + &source_walk_error_samples, + source_walk_errors_omitted, + &project_anchor, + &run_id, + &started_at, + )); + } let file_timeout = plugin_file_timeout(); let briefing_blocks = secret_scan_outcome.briefing_blocks_shared(); let scanned_files = secret_scan_outcome.scanned_files_shared(); @@ -612,7 +735,7 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti let (plugin_files, skipped_files): (Vec, Vec) = plugin_files.into_iter().partition(|path| { secret_finding_files.contains(&crate::secret_scan::canonical_or_original(path)) - || file_needs_reanalysis(path, &prior_file_hashes) + || file_needs_reanalysis(&project_root, path, &prior_file_hashes) }); for path in &skipped_files { skipped_files_total += 1; @@ -644,8 +767,9 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti "processing plugin" ); - // Run the blocking plugin work on the tokio threadpool. - // Pattern A: collect all entities into memory, return to async side. + // Run the blocking plugin work on the tokio threadpool. Completed file + // output flows through a bounded channel so writer backpressure applies + // during extraction rather than after the whole plugin has returned. 
let manifest = plugin.manifest.clone(); let project_root_clone = project_root.clone(); let pid_clone = plugin_id.clone(); @@ -655,6 +779,125 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti let scanned_files_clone = Arc::clone(&scanned_files); let progress_clone = Arc::clone(&progress); + let (batch_tx, mut batch_rx) = + tokio::sync::mpsc::channel(PLUGIN_FILE_BATCH_CHANNEL_CAPACITY); + let join_handle = tokio::task::spawn_blocking(move || { + run_plugin_blocking( + manifest, + &project_root_clone, + &pid_clone, + &exec_clone, + &files_clone, + &briefing_blocks_clone, + &scanned_files_clone, + &progress_clone, + file_timeout, + &batch_tx, + ) + }); + + let mut insert_err: Option = None; + let mut plugin_entity_count: u64 = 0; + let mut plugin_edge_count: u64 = 0; + let mut seen_plugin_entity_ids: BTreeSet = BTreeSet::new(); + let mut pending_plugin_edges: Vec = Vec::new(); + while let Some(message) = batch_rx.recv().await { + if insert_err.is_some() { + continue; + } + + match message { + PluginBatchMessage::File(mut batch) => { + unresolved_call_sites_total += batch.stats.unresolved_call_sites_total; + reference_sites_total += batch.stats.reference_sites_total; + references_resolved_total += batch.stats.references_resolved_total; + references_skipped_external_total += + batch.stats.references_skipped_external_total; + references_skipped_cap_total += batch.stats.references_skipped_cap_total; + imports_skipped_external_total += batch.stats.imports_skipped_external_total; + unresolved_reference_sites_total += + batch.stats.unresolved_reference_sites_total; + pyright_latency.record_many(batch.stats.pyright_query_latency_ms.clone()); + pyright_index_parse_latency + .record_many(batch.stats.pyright_index_parse_latency_ms.clone()); + extractor_parse_latency + .record_many(batch.stats.extractor_parse_latency_ms.clone()); + + secret_scan_outcome.remember_finding_anchors(&batch.entities); + let batch_entity_ids: Vec = + 
batch.entities.iter().map(|(id, _)| id.clone()).collect(); + let batch_edges = std::mem::take(&mut batch.edges); + match persist_plugin_file_batch( + &writer, + batch, + &run_id, + &started_at, + head_commit.as_deref(), + ) + .await + { + Ok(effects) => { + plugin_entity_count += effects.entity_count; + seen_plugin_entity_ids.extend(batch_entity_ids); + pending_plugin_edges.extend(batch_edges); + let ready_edges = drain_ready_plugin_edges( + &mut pending_plugin_edges, + &seen_plugin_entity_ids, + ); + match persist_plugin_edges(&writer, ready_edges).await { + Ok(edge_count) => { + plugin_edge_count += edge_count; + } + Err(e) => { + insert_err = Some(e); + } + } + prior_index_entries.extend(effects.prior_index_entries); + sei_descriptors.extend(effects.sei_descriptors); + failure_findings.extend(effects.failure_findings); + } + Err(e) => { + insert_err = Some(e); + } + } + } + PluginBatchMessage::DeferredImportEdges { + edges, + imports_skipped_external, + } => { + imports_skipped_external_total += imports_skipped_external; + pending_plugin_edges.extend(edges); + let ready_edges = drain_ready_plugin_edges( + &mut pending_plugin_edges, + &seen_plugin_entity_ids, + ); + match persist_plugin_edges(&writer, ready_edges).await { + Ok(edge_count) => { + plugin_edge_count += edge_count; + if !pending_plugin_edges.is_empty() { + match persist_plugin_edges( + &writer, + std::mem::take(&mut pending_plugin_edges), + ) + .await + { + Ok(edge_count) => { + plugin_edge_count += edge_count; + } + Err(e) => { + insert_err = Some(e); + } + } + } + } + Err(e) => { + insert_err = Some(e); + } + } + } + } + } + // A JoinError here means the blocking task panicked (OOM, stack // overflow, internal unwrap, abort — anything that unwinds past the // top of `run_plugin_blocking`). Earlier revisions `?`-propagated @@ -663,23 +906,20 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti // permanently. 
Treat the panic as a crash reason: it flows into the // existing crash-recording path below, ticks the crash-loop breaker, // and resolves the run via SoftFailed → CommitRun(Failed) with exit 1. - let spawn_result: Result = handle_plugin_task_join_result( - tokio::task::spawn_blocking(move || { - run_plugin_blocking( - manifest, - &project_root_clone, - &pid_clone, - &exec_clone, - &files_clone, - &briefing_blocks_clone, - &scanned_files_clone, - &progress_clone, - file_timeout, - ) - }) - .await, - &plugin_id, - ); + let spawn_result: Result = + handle_plugin_task_join_result(join_handle.await, &plugin_id); + + if let Some(e) = insert_err { + tracing::error!( + plugin_id = %plugin_id, + error = %e, + "writer-actor rejected streamed insert; failing run" + ); + run_outcome = RunOutcome::HardFailed { + reason: format!("{e:#}"), + }; + break 'plugins; + } match spawn_result { Err(plugin_error) => { @@ -728,25 +968,7 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti // Fall through to the next iteration — nothing else to do // for a crashed plugin, and there's no code after the match. 
} - Ok(BatchResult { - entities, - edges, - unresolved_call_sites, - stats, - findings, - signatures, - }) => { - unresolved_call_sites_total += stats.unresolved_call_sites_total; - reference_sites_total += stats.reference_sites_total; - references_resolved_total += stats.references_resolved_total; - references_skipped_external_total += stats.references_skipped_external_total; - references_skipped_cap_total += stats.references_skipped_cap_total; - imports_skipped_external_total += stats.imports_skipped_external_total; - unresolved_reference_sites_total += stats.unresolved_reference_sites_total; - pyright_latency.record_many(stats.pyright_query_latency_ms); - pyright_index_parse_latency.record_many(stats.pyright_index_parse_latency_ms); - extractor_parse_latency.record_many(stats.extractor_parse_latency_ms); - + Ok(BatchResult { findings }) => { // Log findings individually (operator-facing stderr) and persist // them (REQ-ANALYZE-06) so an ontology check, malformed-JSON drop, // or path-jail violation is visible in the store, not just logs. @@ -761,122 +983,12 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti )); } - // Persist entities + edges via writer-actor (async side). - // - // A writer-actor error here (per-kind contract violation, - // unique-key constraint, disk full) must NOT short-circuit - // `run()` via `?` — that would bypass the CommitRun/FailRun - // block below and leave `runs.status = 'running'` permanently. - // Convert to a terminal `RunOutcome::HardFailed` so FailRun - // marks the run. Entities are inserted before edges so the - // edge FK references resolve at insert time (B.3 §5). - let entity_count = entities.len() as u64; - let edge_count = edges.len() as u64; - secret_scan_outcome.remember_finding_anchors(&entities); - let mut insert_err: Option = None; - for (id_str, record) in entities { - // Capture the prior-index row and the SEI descriptor BEFORE - // `record` is moved into the command. 
`signature` (WS1) is the - // plugin-declared matcher input, now carried into both the - // prior-index snapshot and the SEI descriptor list. - let signature = signatures.get(&id_str).cloned(); - let prior_entry = - record - .content_hash - .clone() - .map(|body_hash| PriorIndexEntry { - locator: record.id.clone(), - body_hash, - signature: signature.clone(), - }); - // Every accepted entity gets a descriptor (even ones with no - // body hash — they still carry/mint an SEI on the - // locator-unchanged path; only the move case needs a body). - let descriptor = NewEntityDescriptor { - locator: record.id.clone(), - body_hash: record.content_hash.clone(), - signature, - }; - // REQ-ANALYZE-06: capture a parse-failure finding from the - // degraded entity BEFORE `record` is moved into the command. - // Anchors to this same entity (inserted just below), so the - // finding's FK resolves. - if let Some(finding) = syntax_error_finding(&record, &run_id, &started_at) { - failure_findings.push(finding); - } - let res = writer - .send_wait(|ack| WriterCmd::InsertEntity { - entity: Box::new(record), - ack, - }) - .await - .map_err(|e| anyhow::anyhow!("{e}")) - .with_context(|| format!("InsertEntity for {id_str}")); - if let Err(e) = res { - insert_err = Some(e); - break; - } - // Recorded only after a successful insert so neither the - // snapshot nor the SEI pass claims an entity the durable - // graph lacks. 
- if let Some(prior_entry) = prior_entry { - prior_index_entries.push(prior_entry); - } - sei_descriptors.push(descriptor); - } - if insert_err.is_none() { - for pending in unresolved_call_sites { - let caller_id = pending.caller_entity_id.clone(); - let res = writer - .send_wait(|ack| WriterCmd::ReplaceUnresolvedCallSitesForCaller { - caller_entity_id: pending.caller_entity_id, - caller_content_hash: pending.caller_content_hash, - sites: pending.sites, - ack, - }) - .await - .map_err(|e| anyhow::anyhow!("{e}")) - .with_context(|| { - format!("ReplaceUnresolvedCallSitesForCaller for {caller_id}") - }); - if let Err(e) = res { - insert_err = Some(e); - break; - } - } - } - if insert_err.is_none() { - for (descr, record) in edges { - let res = writer - .send_wait(|ack| WriterCmd::InsertEdge { - edge: Box::new(record), - ack, - }) - .await - .map_err(|e| anyhow::anyhow!("{e}")) - .with_context(|| format!("InsertEdge {descr}")); - if let Err(e) = res { - insert_err = Some(e); - break; - } - } - } - if let Some(e) = insert_err { - tracing::error!( - plugin_id = %plugin_id, - error = %e, - "writer-actor rejected insert; failing run" - ); - run_outcome = RunOutcome::HardFailed { - reason: format!("{e:#}"), - }; - break 'plugins; - } - total_entity_count += entity_count; - total_edge_count += edge_count; + total_entity_count += plugin_entity_count; + total_edge_count += plugin_edge_count; tracing::info!( plugin_id = %plugin_id, - entity_count, edge_count, + entity_count = plugin_entity_count, + edge_count = plugin_edge_count, "plugin complete", ); } @@ -885,7 +997,13 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti if !matches!(run_outcome, RunOutcome::HardFailed { .. 
}) && let Err(e) = secret_scan_outcome - .persist_findings(&writer, &run_id, &project_root, &started_at) + .persist_findings( + &writer, + &run_id, + &project_root, + &started_at, + head_commit.as_deref(), + ) .await { tracing::error!(run_id = %run_id, error = %e, "secret finding persistence failed"); @@ -906,7 +1024,9 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti .iter() .any(|f| f.entity_id == project_anchor); if needs_project_anchor - && let Err(e) = ensure_project_anchor(&writer, &project_root, &started_at).await + && let Err(e) = + ensure_project_anchor(&writer, &project_root, &started_at, head_commit.as_deref()) + .await { tracing::error!(run_id = %run_id, error = %e, "project finding-anchor insert failed"); run_outcome = RunOutcome::HardFailed { @@ -967,7 +1087,15 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti let phase3_output = if matches!(run_outcome, RunOutcome::HardFailed { .. }) { Phase3Output::not_run() } else { - match run_phase3_clustering(&writer, &db_path, &run_id, &analyze_config).await { + match run_phase3_clustering( + &writer, + &db_path, + &run_id, + &analyze_config, + head_commit.as_deref(), + ) + .await + { Ok(output) => { total_entity_count += output.subsystems_inserted; total_edge_count += output.in_subsystem_edges_inserted; @@ -1067,6 +1195,9 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti "references_skipped_external_total": references_skipped_external_total, "references_skipped_cap_total": references_skipped_cap_total, "imports_skipped_external_total": imports_skipped_external_total, + "source_walk_skipped_entries": source_walk_skipped_entries, + "source_walk_error_samples": source_walk_error_samples, + "source_walk_errors_omitted": source_walk_errors_omitted, "skipped_files": skipped_files_total, "unresolved_reference_sites_total": unresolved_reference_sites_total, "pyright_query_latency_p95_ms": pyright_query_latency_p95_ms, 
@@ -1180,6 +1311,87 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti "tier-subsystem findings skipped (run already committed successfully)" ), } + // REQ-GUIDANCE-04: when `wardline.yaml` is present, keep the + // generated guidance sheets in sync before evaluating guidance + // staleness. Operator edits are preserved as + // `wardline_derived_overridden`, so the following staleness pass can + // surface manifest drift instead of overwriting human review. + match crate::wardline_guidance::sync_wardline_guidance(&db_path, &project_root) { + Ok(stats) if stats.generated > 0 || stats.overridden > 0 => tracing::info!( + run_id = %run_id, + wardline_guidance_generated = stats.generated, + wardline_guidance_overridden = stats.overridden, + "Wardline-derived guidance synced" + ), + Ok(_) => {} + Err(e) => tracing::warn!( + run_id = %run_id, + error = %e, + "Wardline-derived guidance skipped (run already committed successfully)" + ), + } + let mcp_config = load_mcp_config(&project_root, options.config_path.as_deref()); + match crate::serve::build_embedding_provider(&mcp_config.semantic_search, |name| { + std::env::var(name).ok() + }) { + Ok(Some(provider)) => match populate_semantic_embeddings( + &project_root, + &db_path, + &mcp_config.semantic_search, + provider, + ) + .await + { + Ok(stats) if stats.embedded > 0 || stats.skipped_fresh > 0 => tracing::info!( + run_id = %run_id, + model_id = %stats.model_id, + considered = stats.considered, + skipped_fresh = stats.skipped_fresh, + embedded = stats.embedded, + tokens_input = stats.tokens_input, + "semantic embedding population complete" + ), + Ok(_) => {} + Err(e) => tracing::warn!( + run_id = %run_id, + error = %e, + "semantic embedding population skipped (run already committed successfully)" + ), + }, + Ok(None) => {} + Err(e) => tracing::warn!( + run_id = %run_id, + error = %e, + "semantic embedding provider unavailable (run already committed successfully)" + ), + } + // 
REQ-GUIDANCE-05 (WS6 T4a): guidance-staleness findings (EXPIRED + + // CHURN-STALE). Runs on EVERY analyze, deliberately OUTSIDE the SEI + // `if no_sei { … } else { … }` block above and independent of any + // deletion: these surface a sheet's own state, not an identity event, + // so `--no-sei` must NOT suppress them. Best-effort + enrich-only like + // the tier pass: a failure logs and never un-commits the graph. + match emit_guidance_staleness_findings( + &writer, + &db_path, + &project_root, + &run_id, + &iso8601_now(), + ) + .await + { + Ok(emitted) if emitted > 0 => tracing::info!( + run_id = %run_id, + guidance_staleness_findings = emitted, + "guidance-staleness findings emitted" + ), + Ok(_) => {} + Err(e) => tracing::warn!( + run_id = %run_id, + error = %e, + "guidance-staleness findings skipped (run already committed successfully)" + ), + } // Phase 8c (clarion-ef8f64d5fd): the deletion + tier findings above // are persisted via `PersistPostRunFinding` *after* the Phase-8 // emission already ran, so without this they reach the store but @@ -1238,6 +1450,9 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti "references_skipped_external_total": references_skipped_external_total, "references_skipped_cap_total": references_skipped_cap_total, "imports_skipped_external_total": imports_skipped_external_total, + "source_walk_skipped_entries": source_walk_skipped_entries, + "source_walk_error_samples": source_walk_error_samples, + "source_walk_errors_omitted": source_walk_errors_omitted, "skipped_files": skipped_files_total, "unresolved_reference_sites_total": unresolved_reference_sites_total, "pyright_query_latency_p95_ms": pyright_query_latency_p95_ms, @@ -1570,9 +1785,11 @@ async fn run_sei_mint_pass( /// /// For each deleted entity: emit one `CLA-FACT-ENTITY-DELETED` (anchored to the /// entity's own row — `entities` is never pruned, so the FK resolves) and -/// invalidate its cached summaries. 
Then, for every guidance sheet whose explicit -/// `guides` edge targets a deleted entity, emit one `CLA-FACT-GUIDANCE-ORPHAN` -/// (anchored to the guidance sheet, the deleted target carried as a related id). +/// invalidate its cached summaries. Then, for every guidance sheet stranded on a +/// deleted entity — via an explicit `guides` edge OR a `match_rules` +/// `{"type":"entity","id":X}` entry (detailed-design.md §5) — emit one +/// `CLA-FACT-GUIDANCE-ORPHAN` (anchored to the sheet, deleted target as a related +/// id). A sheet that strands the same target via both paths emits one finding. /// /// Returns `Ok(0)` for an empty deleted set without opening a connection. async fn emit_deletion_findings( @@ -1611,17 +1828,19 @@ async fn emit_deletion_findings( .with_context(|| format!("InvalidateSummaryCacheForEntity {entity_id}"))?; } - // Guidance sheets that explicitly `guides` a now-deleted entity are orphaned. - // Read the (sheet, target) pairs once, filter to deleted targets, sort for - // determinism. The `guides` edge survives the target's vanishing because - // `entities` is never pruned (the ON DELETE CASCADE never fires). - let orphaned_guidance = { + // Guidance sheets stranded on a now-deleted entity are orphaned via EITHER an + // explicit `guides` edge OR a `match_rules` `{"type":"entity","id":X}` entry + // pointing at a deleted target (detailed-design.md §5). Collect both into one + // de-duped, sorted `(sheet, target)` set so a sheet that orphans the same + // target via both paths emits exactly ONE finding. Both survive the target's + // vanishing because `entities` is never pruned. 
+ let orphaned_guidance: std::collections::BTreeSet<(String, String)> = { let conn = Connection::open(db_path).context("open read connection for guidance-orphan scan")?; - let mut stmt = conn + + let mut pairs: std::collections::BTreeSet<(String, String)> = conn .prepare("SELECT from_id, to_id FROM edges WHERE kind = 'guides'") - .context("prepare guides-edge scan")?; - let mut pairs: Vec<(String, String)> = stmt + .context("prepare guides-edge scan")? .query_map([], |row| { Ok((row.get::<_, String>(0)?, row.get::<_, String>(1)?)) }) @@ -1631,7 +1850,31 @@ async fn emit_deletion_findings( .into_iter() .filter(|(_, to_id)| deleted_set.contains(to_id.as_str())) .collect(); - pairs.sort(); + + // Scan every guidance sheet's `match_rules` for `{type:entity, id:X}` + // entries whose X is in the deleted set. Reuse the shared rule shape + // (`clarion_storage::rule_match` reads `{"type":"entity","id":…}`), not a + // hand-rolled key. + for sheet in clarion_storage::list_guidance_sheets(&conn) + .map_err(|e| anyhow::anyhow!("{e}")) + .context("list guidance sheets for match-rule orphan scan")? 
+ { + let Some(rules) = sheet + .properties + .get("match_rules") + .and_then(serde_json::Value::as_array) + else { + continue; + }; + for rule in rules { + if rule.get("type").and_then(serde_json::Value::as_str) == Some("entity") + && let Some(target) = rule.get("id").and_then(serde_json::Value::as_str) + && deleted_set.contains(target) + { + pairs.insert((sheet.id.clone(), target.to_owned())); + } + } + } pairs }; @@ -1695,7 +1938,7 @@ fn guidance_orphan_finding( kind: "fact".to_owned(), severity: "WARN".to_owned(), confidence: Some(1.0), - confidence_basis: Some("guidance `guides`-edge target deleted".to_owned()), + confidence_basis: Some("guidance sheet target deleted".to_owned()), entity_id: guidance_id.to_owned(), related_entities_json: serde_json::json!([deleted_entity_id]).to_string(), message: format!( @@ -1714,6 +1957,294 @@ fn guidance_orphan_finding( } } +/// REQ-GUIDANCE-05 (WS6 T4a): persist guidance-staleness findings over the +/// committed graph and return the count. Independent signals per sheet: +/// +/// - **`CLA-FACT-GUIDANCE-EXPIRED`** — the sheet's `expires` instant is lexically +/// `< now` (both are the fixed-width `YYYY-MM-DDTHH:MM:SS.sssZ` form +/// [`iso8601_now`] emits, so a byte compare is a valid instant compare). Absent +/// or malformed `expires` ⇒ skip. +/// - **`CLA-FACT-GUIDANCE-CHURN-STALE`** — the aggregate `git_churn_count` over the +/// sheet's matched entities meets the staleness threshold (asymmetric: 20 for +/// `pinned` sheets, 50 otherwise). +/// - **`CLA-FACT-GUIDANCE-STALE`** — a Wardline-derived override still carries +/// the old `wardline.yaml` manifest hash after the manifest changed. +/// +/// Runs post-`CommitRun`, unconditionally (NOT gated on the SEI pass or on +/// deletions) — see the call site. Deterministic: sheets in +/// [`clarion_storage::list_guidance_sheets`] order; matched ids sorted. 
+/// +/// Churn proxy note: the design wants "churn since `authored_at`/`reviewed_at`", +/// but there is no churn-history to compute a true delta and `git_churn_count` is +/// not populated by analyze in v1.0 (so this is honest-empty in production). We +/// implement the computable proxy — the aggregate current `git_churn_count` over +/// matched entities vs the threshold. A true since-authored delta awaits the +/// churn-history pipeline (clarion-997c93ec4e); `authored_at`/`reviewed_at` are +/// deliberately unused here because no real delta is computable. +enum PendingGuidanceStaleness { + Expired(String), + WardlineStale { + sheet_id: String, + stored_manifest_hash: String, + current_manifest_hash: String, + }, + ChurnStale { + sheet_id: String, + agg: i64, + matched: Vec, + }, +} + +fn plan_guidance_staleness_findings( + db_path: &Path, + project_root: &Path, + now: &str, +) -> anyhow::Result> { + let current_wardline_hash = crate::wardline_guidance::current_manifest_hash(project_root)?; + let conn = Connection::open(db_path) + .context("open read connection for guidance-staleness findings")?; + let canonical_root = project_root + .canonicalize() + .unwrap_or_else(|_| project_root.to_path_buf()); + + let sheets = clarion_storage::list_guidance_sheets(&conn) + .map_err(|e| anyhow::anyhow!("{e}")) + .context("list guidance sheets for staleness scan")?; + + // Entities carrying a populated churn count (the only ones that can move an + // aggregate). Empty in production today (see fn doc). + let churned: Vec<(String, i64)> = conn + .prepare( + "SELECT id, git_churn_count FROM entities \ + WHERE git_churn_count IS NOT NULL ORDER BY id", + ) + .context("prepare churned-entity scan")? + .query_map([], |row| { + Ok((row.get::<_, String>(0)?, row.get::<_, i64>(1)?)) + }) + .context("query churned entities")? 
+ .collect::>>() + .context("collect churned entities")?; + + let mut plan = Vec::new(); + for sheet in &sheets { + // EXPIRED: lexical (instant) compare against `now`. + if let Some(expires) = sheet + .properties + .get("expires") + .and_then(serde_json::Value::as_str) + && expires < now + { + plan.push(PendingGuidanceStaleness::Expired(sheet.id.clone())); + } + + if let Some(current_hash) = current_wardline_hash.as_deref() + && crate::wardline_guidance::is_wardline_derived(&sheet.properties) + && let Some(stored_hash) = sheet + .properties + .get("wardline_manifest_hash") + .and_then(serde_json::Value::as_str) + && stored_hash != current_hash + { + plan.push(PendingGuidanceStaleness::WardlineStale { + sheet_id: sheet.id.clone(), + stored_manifest_hash: stored_hash.to_owned(), + current_manifest_hash: current_hash.to_owned(), + }); + } + + // CHURN-STALE: aggregate churn over matched entities vs asymmetric + // threshold. Reuse the shared matcher; only churned entities can matter. + let pinned = sheet + .properties + .get("pinned") + .and_then(serde_json::Value::as_bool) + .unwrap_or(false); + let threshold = if pinned { + CHURN_STALE_THRESHOLD_PINNED + } else { + CHURN_STALE_THRESHOLD + }; + + let mut agg: i64 = 0; + let mut matched: Vec = Vec::new(); + for (entity_id, churn) in &churned { + if clarion_storage::guidance_sheet_matches_entity( + &conn, + sheet, + entity_id, + &canonical_root, + ) + .map_err(|e| anyhow::anyhow!("{e}")) + .with_context(|| format!("match {entity_id} against {}", sheet.id))? 
+ { + agg = agg.saturating_add(*churn); + matched.push(entity_id.clone()); + } + } + if agg >= threshold { + matched.sort(); + plan.push(PendingGuidanceStaleness::ChurnStale { + sheet_id: sheet.id.clone(), + agg, + matched, + }); + } + } + Ok(plan) +} + +async fn emit_guidance_staleness_findings( + writer: &Writer, + db_path: &Path, + project_root: &Path, + run_id: &str, + now: &str, +) -> anyhow::Result { + // Build the (sheet, [matched churn pairs]) plan in one read pass, then emit. + // Drive the churn scan off the populated churn set only — `WHERE + // git_churn_count IS NOT NULL` — so the work is O(sheets × churned), and so + // production (no churn populated) yields an empty candidate set and CHURN-STALE + // never fires, with no special-casing. + let plan = plan_guidance_staleness_findings(db_path, project_root, now)?; + let mut count: u64 = 0; + for pending in &plan { + let finding = match pending { + PendingGuidanceStaleness::Expired(sheet_id) => { + guidance_expired_finding(sheet_id, run_id, now) + } + PendingGuidanceStaleness::WardlineStale { + sheet_id, + stored_manifest_hash, + current_manifest_hash, + } => guidance_stale_finding( + sheet_id, + stored_manifest_hash, + current_manifest_hash, + run_id, + now, + ), + PendingGuidanceStaleness::ChurnStale { + sheet_id, + agg, + matched, + } => guidance_churn_stale_finding(sheet_id, *agg, matched, run_id, now), + }; + let finding_id = finding.id.clone(); + writer + .send_wait(|ack| WriterCmd::PersistPostRunFinding { + finding: Box::new(finding), + ack, + }) + .await + .map_err(|e| anyhow::anyhow!("{e}")) + .with_context(|| format!("PersistPostRunFinding {finding_id}"))?; + count += 1; + } + Ok(count) +} + +/// Build a `CLA-FACT-GUIDANCE-EXPIRED` finding anchored to the expired sheet. +/// Run-scoped, deterministic id; INFO, confidence 1.0. 
+fn guidance_expired_finding(guidance_id: &str, run_id: &str, now: &str) -> FindingRecord { + FindingRecord { + id: format!("core:finding:{run_id}:guidance-expired:{guidance_id}"), + tool: "clarion".to_owned(), + tool_version: env!("CARGO_PKG_VERSION").to_owned(), + run_id: run_id.to_owned(), + rule_id: GUIDANCE_EXPIRED_RULE_ID.to_owned(), + kind: "fact".to_owned(), + severity: "INFO".to_owned(), + confidence: Some(1.0), + confidence_basis: Some("guidance sheet past its `expires`".to_owned()), + entity_id: guidance_id.to_owned(), + related_entities_json: "[]".to_owned(), + message: format!("Guidance sheet {guidance_id} is past its `expires` instant"), + evidence_json: serde_json::json!({ "guidance_id": guidance_id }).to_string(), + properties_json: "{}".to_owned(), + supports_json: "[]".to_owned(), + supported_by_json: "[]".to_owned(), + created_at: now.to_owned(), + updated_at: now.to_owned(), + } +} + +fn guidance_stale_finding( + guidance_id: &str, + stored_manifest_hash: &str, + current_manifest_hash: &str, + run_id: &str, + now: &str, +) -> FindingRecord { + FindingRecord { + id: format!("core:finding:{run_id}:guidance-stale:{guidance_id}"), + tool: "clarion".to_owned(), + tool_version: env!("CARGO_PKG_VERSION").to_owned(), + run_id: run_id.to_owned(), + rule_id: GUIDANCE_STALE_RULE_ID.to_owned(), + kind: "fact".to_owned(), + severity: "WARN".to_owned(), + confidence: Some(1.0), + confidence_basis: Some("Wardline manifest hash drift".to_owned()), + entity_id: guidance_id.to_owned(), + related_entities_json: "[]".to_owned(), + message: format!( + "Wardline-derived guidance sheet {guidance_id} is stale relative to wardline.yaml" + ), + evidence_json: serde_json::json!({ + "guidance_id": guidance_id, + "stored_manifest_hash": stored_manifest_hash, + "current_manifest_hash": current_manifest_hash, + }) + .to_string(), + properties_json: "{}".to_owned(), + supports_json: "[]".to_owned(), + supported_by_json: "[]".to_owned(), + created_at: now.to_owned(), + 
updated_at: now.to_owned(), + } +} + +/// Build a `CLA-FACT-GUIDANCE-CHURN-STALE` finding anchored to the sheet, carrying +/// the matched entities (sorted) as related ids and the aggregate churn + +/// threshold as evidence. Run-scoped, deterministic id; WARN, confidence 0.7 +/// (heuristic). +fn guidance_churn_stale_finding( + guidance_id: &str, + aggregate_churn: i64, + matched: &[String], + run_id: &str, + now: &str, +) -> FindingRecord { + FindingRecord { + id: format!("core:finding:{run_id}:guidance-churn-stale:{guidance_id}"), + tool: "clarion".to_owned(), + tool_version: env!("CARGO_PKG_VERSION").to_owned(), + run_id: run_id.to_owned(), + rule_id: GUIDANCE_CHURN_STALE_RULE_ID.to_owned(), + kind: "fact".to_owned(), + severity: "WARN".to_owned(), + confidence: Some(0.7), + confidence_basis: Some("heuristic".to_owned()), + entity_id: guidance_id.to_owned(), + related_entities_json: serde_json::to_string(matched).unwrap_or_else(|_| "[]".to_owned()), + message: format!( + "Guidance sheet {guidance_id} covers high-churn code (aggregate git_churn_count = {aggregate_churn})" + ), + evidence_json: serde_json::json!({ + "guidance_id": guidance_id, + "aggregate_git_churn_count": aggregate_churn, + "matched_entities": matched, + }) + .to_string(), + properties_json: "{}".to_owned(), + supports_json: "[]".to_owned(), + supported_by_json: "[]".to_owned(), + created_at: now.to_owned(), + updated_at: now.to_owned(), + } +} + /// Extract a subsystem-member's Wardline tier from its opaque `wardline_json` /// blob: the best-effort top-level `tier` field, stringified. 
Kept byte-identical /// to the MCP `find_by_wardline` read path (`facet_matches`) so the analyze-side @@ -1929,6 +2460,7 @@ async fn run_phase3_clustering( db_path: &Path, run_id: &str, analyze_config: &AnalyzeConfig, + head_commit: Option<&str>, ) -> Result { let started = std::time::Instant::now(); let config = &analyze_config.analysis.clustering; @@ -2061,30 +2593,33 @@ async fn run_phase3_clustering( "weight_by": config.weight_by.as_str(), }) .to_string(); + let mut entity = EntityRecord { + id: subsystem_id.clone(), + plugin_id: "core".to_owned(), + kind: "subsystem".to_owned(), + name: subsystem_name, + short_name: subsystem_short_name, + parent_id: None, + source_file_id: None, + source_file_path: None, + source_byte_start: None, + source_byte_end: None, + source_line_start: None, + source_line_end: None, + properties_json, + tags: Vec::new(), + content_hash: None, + summary_json: None, + wardline_json: None, + first_seen_commit: None, + last_seen_commit: None, + created_at: now.clone(), + updated_at: now, + }; + stamp_entity_git_provenance(&mut entity, head_commit); writer .send_wait(|ack| WriterCmd::InsertEntity { - entity: Box::new(EntityRecord { - id: subsystem_id.clone(), - plugin_id: "core".to_owned(), - kind: "subsystem".to_owned(), - name: subsystem_name, - short_name: subsystem_short_name, - parent_id: None, - source_file_id: None, - source_file_path: None, - source_byte_start: None, - source_byte_end: None, - source_line_start: None, - source_line_end: None, - properties_json, - content_hash: None, - summary_json: None, - wardline_json: None, - first_seen_commit: None, - last_seen_commit: None, - created_at: now.clone(), - updated_at: now, - }), + entity: Box::new(entity), ack, }) .await @@ -2314,6 +2849,8 @@ fn syntax_error_finding(record: &EntityRecord, run_id: &str, now: &str) -> Optio /// breaker subcode (`FINDING_DISABLED_CRASH_LOOP`): this fires per plugin crash, /// the breaker subcode fires once when the breaker trips. 
const INFRA_CRASH_RULE_ID: &str = "CLA-INFRA-PLUGIN-CRASH"; +const SOURCE_WALK_SKIPPED_RULE_ID: &str = "CLA-INFRA-SOURCE-WALK-SKIPPED"; +const SOURCE_WALK_ERROR_SAMPLE_LIMIT: usize = 10; /// Anchor entity id for project/plugin-level findings that are not file-scoped /// (plugin crash, OOM, protocol/ontology violations). `findings.entity_id` is @@ -2333,6 +2870,7 @@ async fn ensure_project_anchor( writer: &Writer, project_root: &Path, started_at: &str, + head_commit: Option<&str>, ) -> Result { let id = project_anchor_id(project_root); let name = project_root @@ -2341,7 +2879,7 @@ async fn ensure_project_anchor( .unwrap_or("root") .to_owned(); let properties = serde_json::json!({ "finding_anchor": true }).to_string(); - let record = EntityRecord { + let mut record = EntityRecord { id: id.clone(), plugin_id: "core".to_owned(), kind: "project".to_owned(), @@ -2355,6 +2893,7 @@ async fn ensure_project_anchor( source_line_start: None, source_line_end: None, properties_json: properties, + tags: Vec::new(), content_hash: None, summary_json: None, wardline_json: None, @@ -2363,6 +2902,7 @@ async fn ensure_project_anchor( created_at: started_at.to_owned(), updated_at: started_at.to_owned(), }; + stamp_entity_git_provenance(&mut record, head_commit); writer .send_wait(|ack| WriterCmd::InsertEntity { entity: Box::new(record), @@ -2485,11 +3025,54 @@ fn crash_finding_record( } } +fn source_walk_finding_record( + project_root: &Path, + skipped_entries: u64, + error_samples: &[String], + errors_omitted: usize, + anchor_id: &str, + run_id: &str, + now: &str, +) -> FindingRecord { + let discriminator = + blake3::hash(format!("{}\u{0}{skipped_entries}", project_root.display()).as_bytes()) + .to_hex(); + FindingRecord { + id: format!("core:finding:{run_id}:source-walk:{discriminator}"), + tool: "clarion".to_owned(), + tool_version: env!("CARGO_PKG_VERSION").to_owned(), + run_id: run_id.to_owned(), + rule_id: SOURCE_WALK_SKIPPED_RULE_ID.to_owned(), + kind: "defect".to_owned(), + 
severity: "WARN".to_owned(), + confidence: Some(1.0), + confidence_basis: Some("source tree walk".to_owned()), + entity_id: anchor_id.to_owned(), + related_entities_json: "[]".to_owned(), + message: format!( + "source tree walk skipped {skipped_entries} unreadable or invalid entr{}; analysis is incomplete for those paths", + if skipped_entries == 1 { "y" } else { "ies" } + ), + evidence_json: serde_json::json!({ + "project_root": project_root.display().to_string(), + "skipped_entries": skipped_entries, + "error_samples": error_samples, + "errors_omitted": errors_omitted, + }) + .to_string(), + properties_json: "{}".to_owned(), + supports_json: "[]".to_owned(), + supported_by_json: "[]".to_owned(), + created_at: now.to_owned(), + updated_at: now.to_owned(), + } +} + /// Load the MCP-side config (Filigree integration) from the same `clarion.yaml` /// `clarion serve` reads. A missing or unparseable file falls back to the /// default (Filigree disabled), so a config problem never fails the run — it /// just means no emission. 
-fn load_mcp_config(project_root: &Path, config_path: Option<&Path>) -> McpConfig { +pub(crate) fn load_mcp_config(project_root: &Path, config_path: Option<&Path>) -> McpConfig { let path = config_path.map_or_else(|| project_root.join("clarion.yaml"), Path::to_path_buf); if !path.exists() { return McpConfig::default(); @@ -2504,6 +3087,185 @@ fn load_mcp_config(project_root: &Path, config_path: Option<&Path>) -> McpConfig }) } +#[derive(Debug, Clone, PartialEq, Eq)] +struct SemanticEmbeddingStats { + considered: u64, + skipped_fresh: u64, + embedded: u64, + tokens_input: u64, + model_id: String, +} + +#[derive(Debug)] +struct SemanticEmbeddingCandidate { + entity_id: String, + content_hash: String, + text: String, +} + +async fn populate_semantic_embeddings( + project_root: &Path, + db_path: &Path, + config: &SemanticSearchConfig, + provider: Arc, +) -> Result { + let model_id = provider.model_id().to_owned(); + let mut stats = SemanticEmbeddingStats { + considered: 0, + skipped_fresh: 0, + embedded: 0, + tokens_input: 0, + model_id: model_id.clone(), + }; + if !config.enabled { + return Ok(stats); + } + + let conn = Connection::open(db_path) + .with_context(|| format!("open Clarion database {}", db_path.display()))?; + let store = EmbeddingStore::open_in_clarion_dir(project_root) + .map_err(|err| anyhow::anyhow!("{err}")) + .context("open semantic embedding sidecar")?; + let pending = semantic_embedding_candidates(&conn, &store, &model_id, &mut stats)?; + if pending.is_empty() { + return Ok(stats); + } + + let token_estimates: Vec = pending + .iter() + .map(|candidate| { + u32::try_from(provider.estimate_tokens(std::slice::from_ref(&candidate.text))) + .unwrap_or(u32::MAX) + }) + .collect(); + stats.tokens_input = token_estimates + .iter() + .map(|tokens| u64::from(*tokens)) + .sum(); + if stats.tokens_input > config.session_token_ceiling { + bail!( + "semantic embedding token estimate {} exceeds semantic_search.session_token_ceiling {}", + stats.tokens_input, + 
config.session_token_ceiling + ); + } + + let now = iso8601_now(); + for (batch_index, batch) in pending.chunks(SEMANTIC_EMBEDDING_BATCH_SIZE).enumerate() { + let texts: Vec = batch + .iter() + .map(|candidate| candidate.text.clone()) + .collect(); + let vectors = provider + .embed(&texts) + .await + .with_context(|| format!("embed {} semantic candidate(s)", texts.len()))?; + if vectors.len() != batch.len() { + bail!( + "embedding provider returned {} vectors for {} semantic candidate(s)", + vectors.len(), + batch.len() + ); + } + for (local_index, (candidate, vector)) in batch.iter().zip(vectors.iter()).enumerate() { + if vector.len() != provider.dimensions() { + bail!( + "embedding provider returned {} dims for {}; expected {}", + vector.len(), + candidate.entity_id, + provider.dimensions() + ); + } + let token_index = batch_index * SEMANTIC_EMBEDDING_BATCH_SIZE + local_index; + store + .upsert( + &EmbeddingKey { + entity_id: candidate.entity_id.clone(), + content_hash: candidate.content_hash.clone(), + model_id: model_id.clone(), + }, + vector, + 0.0, + token_estimates[token_index], + &now, + ) + .map_err(|err| anyhow::anyhow!("{err}")) + .with_context(|| { + format!("persist semantic embedding for {}", candidate.entity_id) + })?; + stats.embedded += 1; + } + } + + Ok(stats) +} + +fn semantic_embedding_candidates( + conn: &Connection, + store: &EmbeddingStore, + model_id: &str, + stats: &mut SemanticEmbeddingStats, +) -> Result> { + let mut stmt = conn + .prepare( + "SELECT id, name, short_name, properties, content_hash \ + FROM entities \ + WHERE content_hash IS NOT NULL \ + AND briefing_blocked IS NULL \ + ORDER BY id", + ) + .context("query semantic embedding candidates")?; + let rows = stmt + .query_map([], |row| { + Ok(( + row.get::<_, String>(0)?, + row.get::<_, String>(1)?, + row.get::<_, String>(2)?, + row.get::<_, String>(3)?, + row.get::<_, String>(4)?, + )) + }) + .context("read semantic embedding candidates")?; + + let mut pending = Vec::new(); + for 
row in rows { + let (entity_id, name, short_name, properties_json, content_hash) = + row.context("read semantic embedding candidate")?; + stats.considered += 1; + let fresh = store + .get_vector(&entity_id, &content_hash, model_id) + .map_err(|err| anyhow::anyhow!("{err}")) + .with_context(|| format!("check semantic embedding freshness for {entity_id}"))?; + if fresh.is_some() { + stats.skipped_fresh += 1; + continue; + } + pending.push(SemanticEmbeddingCandidate { + entity_id, + content_hash, + text: semantic_embedding_text(&short_name, &name, &properties_json), + }); + } + Ok(pending) +} + +fn semantic_embedding_text(short_name: &str, name: &str, properties_json: &str) -> String { + if let Ok(properties) = serde_json::from_str::(properties_json) + && let Some(docstring) = properties + .get("docstring") + .and_then(serde_json::Value::as_str) + .map(str::trim) + .filter(|docstring| !docstring.is_empty()) + { + return format!("{short_name}\n{docstring}"); + } + if name == short_name { + short_name.to_owned() + } else { + format!("{short_name}\n{name}") + } +} + /// Phase 8 (WP9-B, REQ-FINDING-03): POST this run's persisted findings to /// Filigree's native `POST /api/v1/scan-results` intake. /// @@ -2632,7 +3394,7 @@ async fn emit_findings_to_filigree( /// readable. Best-effort: a build/transport failure becomes an /// `CLA-INFRA-FILIGREE-UNREACHABLE` stats blob via [`unreachable_stats`]. async fn post_findings_batch( - filigree_cfg: &clarion_mcp::config::FiligreeConfig, + filigree_cfg: &FiligreeConfig, project_root: &Path, run_id: &str, batch: PreparedBatch, @@ -2989,24 +3751,43 @@ fn handle_plugin_task_join_result( /// Returned from the blocking plugin task on success. struct BatchResult { - /// `(entity_id_string, record)` pairs for every accepted entity. + /// Findings accumulated by the host during the session. 
+ findings: Vec, +} + +enum PluginBatchMessage { + File(PluginFileBatch), + DeferredImportEdges { + edges: Vec<(String, EdgeRecord)>, + imports_skipped_external: u64, + }, +} + +struct PluginFileBatch { + /// `(entity_id_string, record)` pairs accepted from one analyzed file. entities: Vec<(String, EntityRecord)>, - /// `(descriptor, record)` pairs for every accepted edge — descriptor is - /// `"(kind from_id -> to_id)"` for diagnostic messages on insert failure. + /// Non-import edges accepted from one analyzed file. Import edges are + /// deferred because local-vs-external classification needs the plugin's + /// complete module set. edges: Vec<(String, EdgeRecord)>, /// Per-caller unresolved site replacements derived from authoritative - /// plugin stats for this batch. + /// plugin stats for this file. unresolved_call_sites: Vec, - /// Per-file observability stats reported by the plugin and folded by the CLI. + /// Observability stats reported by the plugin for this file. stats: BatchStats, - /// Findings accumulated by the host during the session. - findings: Vec, /// `locator -> canonical SEI signature JSON` for entities the plugin /// declared a signature for (WS1 / ADR-038). The SEI mint pass reads it as /// the move-case matcher input and persists it to `entities.signature`. 
signatures: BTreeMap, } +struct PersistedPluginBatch { + entity_count: u64, + prior_index_entries: Vec, + sei_descriptors: Vec, + failure_findings: Vec, +} + #[derive(Debug)] struct PluginRunError { reason: String, @@ -3026,6 +3807,124 @@ impl PluginRunError { } } +async fn persist_plugin_file_batch( + writer: &Writer, + batch: PluginFileBatch, + run_id: &str, + started_at: &str, + head_commit: Option<&str>, +) -> Result { + let entity_count = batch.entities.len() as u64; + let mut prior_index_entries = Vec::new(); + let mut sei_descriptors = Vec::new(); + let mut failure_findings = Vec::new(); + + for (id_str, mut record) in batch.entities { + // Capture the prior-index row and the SEI descriptor BEFORE `record` + // is moved into the command. `signature` (WS1) is the + // plugin-declared matcher input, now carried into both the + // prior-index snapshot and the SEI descriptor list. + let signature = batch.signatures.get(&id_str).cloned(); + let prior_entry = record + .content_hash + .clone() + .map(|body_hash| PriorIndexEntry { + locator: record.id.clone(), + body_hash, + signature: signature.clone(), + }); + // Every accepted entity gets a descriptor (even ones with no body + // hash — they still carry/mint an SEI on the locator-unchanged path; + // only the move case needs a body). + let descriptor = NewEntityDescriptor { + locator: record.id.clone(), + body_hash: record.content_hash.clone(), + signature, + }; + // REQ-ANALYZE-06: capture a parse-failure finding from the degraded + // entity BEFORE `record` is moved into the command. Anchors to this + // same entity (inserted just below), so the finding's FK resolves. 
+ if let Some(finding) = syntax_error_finding(&record, run_id, started_at) { + failure_findings.push(finding); + } + stamp_entity_git_provenance(&mut record, head_commit); + writer + .send_wait(|ack| WriterCmd::InsertEntity { + entity: Box::new(record), + ack, + }) + .await + .map_err(|e| anyhow::anyhow!("{e}")) + .with_context(|| format!("InsertEntity for {id_str}"))?; + // Recorded only after a successful insert so neither the snapshot nor + // the SEI pass claims an entity the durable graph lacks. + if let Some(prior_entry) = prior_entry { + prior_index_entries.push(prior_entry); + } + sei_descriptors.push(descriptor); + } + + for pending in batch.unresolved_call_sites { + let caller_id = pending.caller_entity_id.clone(); + writer + .send_wait(|ack| WriterCmd::ReplaceUnresolvedCallSitesForCaller { + caller_entity_id: pending.caller_entity_id, + caller_content_hash: pending.caller_content_hash, + sites: pending.sites, + ack, + }) + .await + .map_err(|e| anyhow::anyhow!("{e}")) + .with_context(|| format!("ReplaceUnresolvedCallSitesForCaller for {caller_id}"))?; + } + + Ok(PersistedPluginBatch { + entity_count, + prior_index_entries, + sei_descriptors, + failure_findings, + }) +} + +fn stamp_entity_git_provenance(record: &mut EntityRecord, head_commit: Option<&str>) { + if let Some(commit) = head_commit { + record.first_seen_commit = Some(commit.to_owned()); + record.last_seen_commit = Some(commit.to_owned()); + } +} + +async fn persist_plugin_edges(writer: &Writer, edges: Vec<(String, EdgeRecord)>) -> Result { + let edge_count = edges.len() as u64; + for (descr, record) in edges { + writer + .send_wait(|ack| WriterCmd::InsertEdge { + edge: Box::new(record), + ack, + }) + .await + .map_err(|e| anyhow::anyhow!("{e}")) + .with_context(|| format!("InsertEdge {descr}"))?; + } + Ok(edge_count) +} + +fn drain_ready_plugin_edges( + pending_edges: &mut Vec, + seen_entity_ids: &BTreeSet, +) -> Vec { + let mut ready = Vec::new(); + let mut waiting = Vec::new(); + for (descr, 
edge) in pending_edges.drain(..) { + if seen_entity_ids.contains(&edge.from_id) && seen_entity_ids.contains(&edge.to_id) { + ready.push((descr, edge)); + } else { + waiting.push((descr, edge)); + } + } + *pending_edges = waiting; + ready +} + #[derive(Debug, Default)] struct BatchStats { unresolved_call_sites_total: u64, @@ -3047,16 +3946,6 @@ struct PendingUnresolvedCallSites { sites: Vec, } -type Collected = ( - Vec<(String, EntityRecord)>, - Vec<(String, EdgeRecord)>, - Vec, - BatchStats, - // locator -> canonical SEI signature JSON (WS1). Only entities the plugin - // declared a signature for appear; absent ⇒ null signature. - BTreeMap, -); - /// Per-file analysis-timeout watchdog (REQ-ANALYZE-06, `CLA-PY-TIMEOUT`). /// /// `analyze_file` blocks on a synchronous read of the plugin's stdout, which has @@ -3159,6 +4048,7 @@ fn run_plugin_blocking( scanned_source_files: &Arc>, progress: &ProgressReporter, file_timeout: std::time::Duration, + batch_tx: &tokio::sync::mpsc::Sender, ) -> Result { use clarion_core::PluginHost; @@ -3188,47 +4078,43 @@ fn run_plugin_blocking( plugin_id.to_owned(), ); - let work_result: Result = (|| { - let mut collected_entities: Vec<(String, EntityRecord)> = Vec::new(); - let mut collected_edges: Vec<(String, EdgeRecord)> = Vec::new(); - let mut collected_unresolved_call_sites: Vec = Vec::new(); - let mut collected_stats = BatchStats::default(); - let mut collected_signatures: BTreeMap = BTreeMap::new(); + let work_result: Result<(), String> = (|| { + let mut module_entity_ids: BTreeSet = BTreeSet::new(); + let mut deferred_import_edges: Vec<(String, EdgeRecord)> = Vec::new(); for file in files { - progress.file_started(plugin_id, &file.to_string_lossy()); + let file_display = file.to_string_lossy().into_owned(); + progress.file_started(plugin_id, &file_display); + let heartbeat_guard = progress.file_heartbeat_guard(plugin_id.to_owned(), file_display); watchdog.arm(file_timeout); let analyze_outcome = host.analyze_file(file); 
watchdog.disarm(); + drop(heartbeat_guard); let AnalyzeFileOutcome { entities, edges, stats, } = analyze_outcome.map_err(|e| classify_host_error(plugin_id, e))?; progress.file_completed(); - collected_stats.unresolved_call_sites_total += stats.unresolved_call_sites_total; - collected_stats.reference_sites_total += stats.reference_sites_total; - collected_stats.references_resolved_total += stats.references_resolved_total; - collected_stats.references_skipped_external_total += - stats.references_skipped_external_total; - collected_stats.references_skipped_cap_total += stats.references_skipped_cap_total; - collected_stats.unresolved_reference_sites_total += - stats.unresolved_reference_sites_total; - collected_stats - .pyright_query_latency_ms - .extend(stats.pyright_query_latency_ms.iter().copied()); - collected_stats - .pyright_index_parse_latency_ms - .extend(stats.pyright_index_parse_latency_ms.iter().copied()); + let mut file_stats = BatchStats { + unresolved_call_sites_total: stats.unresolved_call_sites_total, + reference_sites_total: stats.reference_sites_total, + references_resolved_total: stats.references_resolved_total, + references_skipped_external_total: stats.references_skipped_external_total, + references_skipped_cap_total: stats.references_skipped_cap_total, + imports_skipped_external_total: 0, + unresolved_reference_sites_total: stats.unresolved_reference_sites_total, + pyright_query_latency_ms: stats.pyright_query_latency_ms.clone(), + pyright_index_parse_latency_ms: stats.pyright_index_parse_latency_ms.clone(), + extractor_parse_latency_ms: Vec::new(), + }; if stats.extractor_parse_latency_ms > 0 { - collected_stats + file_stats .extractor_parse_latency_ms .push(stats.extractor_parse_latency_ms); } - let source_file_id = entities - .iter() - .find(|entity| entity.kind == "module") - .map(|entity| entity.id.to_string()); let mut file_entities: Vec<(String, EntityRecord)> = Vec::new(); + let mut file_edges: Vec<(String, EdgeRecord)> = Vec::new(); + let 
mut file_signatures: BTreeMap = BTreeMap::new(); let (file_entity_id, file_record) = core_file_entity_record( project_root, file, @@ -3237,19 +4123,34 @@ fn run_plugin_blocking( scanned_source_files, ) .map_err(|e| format!("core file entity for {}: {e:#}", file.display()))?; - file_entities.push((file_entity_id.clone(), file_record.clone())); - collected_entities.push((file_entity_id, file_record)); + file_entities.push((file_entity_id.clone(), file_record)); for entity in &entities { let id_str = entity.id.to_string(); // Capture the plugin-declared SEI signature (ADR-038 REQ-C-01), // canonicalised for stable string-equality comparison. The core // never interprets the JSON — it only re-serialises the value. if let Some(sig) = &entity.raw.signature { - collected_signatures.insert(id_str.clone(), canonical_signature(sig)); + file_signatures.insert(id_str.clone(), canonical_signature(sig)); + } + let mut record = map_entity_to_record( + project_root, + entity, + plugin_id, + Some(file_entity_id.clone()), + ); + if entity.kind == "module" { + module_entity_ids.insert(id_str.clone()); + record.parent_id = Some(file_entity_id.clone()); + file_edges.push(( + format!( + "(contains {from} -> {to})", + from = file_entity_id, + to = entity.id + ), + core_file_contains_edge(&file_entity_id, entity.id.as_str()), + )); } - let record = map_entity_to_record(entity, plugin_id, source_file_id.clone()); file_entities.push((id_str.clone(), record.clone())); - collected_entities.push((id_str, record)); } let unresolved_for_file = map_unresolved_call_sites_for_file(&stats, &file_entities, &iso8601_now()) @@ -3258,7 +4159,6 @@ fn run_plugin_blocking( "plugin {plugin_id} emitted invalid unresolved call-site stats: {e:#}" ) })?; - collected_unresolved_call_sites.extend(unresolved_for_file); for edge in edges { let descr = format!( "({kind} {from} -> {to})", @@ -3266,19 +4166,32 @@ fn run_plugin_blocking( from = edge.from_id, to = edge.to_id, ); - let record = map_edge_to_record(edge); 
- collected_edges.push((descr, record)); + let record = map_edge_to_record(edge, Some(file_entity_id.clone())); + file_edges.push((descr, record)); } + let (immediate_edges, import_edges) = split_deferred_import_edges(file_edges); + deferred_import_edges.extend(import_edges); + batch_tx + .blocking_send(PluginBatchMessage::File(PluginFileBatch { + entities: file_entities, + edges: immediate_edges, + unresolved_call_sites: unresolved_for_file, + stats: file_stats, + signatures: file_signatures, + })) + .map_err(|_| "plugin batch receiver closed".to_owned())?; } - collected_stats.imports_skipped_external_total += - filter_external_import_edges(&collected_entities, &mut collected_edges); - Ok(( - collected_entities, - collected_edges, - collected_unresolved_call_sites, - collected_stats, - collected_signatures, - )) + let imports_skipped_external = filter_external_import_edges_by_module_ids( + &module_entity_ids, + &mut deferred_import_edges, + ); + batch_tx + .blocking_send(PluginBatchMessage::DeferredImportEdges { + edges: deferred_import_edges, + imports_skipped_external, + }) + .map_err(|_| "plugin batch receiver closed".to_owned())?; + Ok(()) })(); // Stop and join the watchdog before reaping so it no longer holds the child @@ -3348,14 +4261,7 @@ fn run_plugin_blocking( reap_and_classify_exit(&mut child, plugin_id, &mut findings); match work_result { - Ok((entities, edges, unresolved_call_sites, stats, signatures)) => Ok(BatchResult { - entities, - edges, - unresolved_call_sites, - stats, - findings, - signatures, - }), + Ok(()) => Ok(BatchResult { findings }), Err(reason) => Err(PluginRunError::with_findings(reason, findings)), } } @@ -3507,6 +4413,7 @@ fn classify_host_error(plugin_id: &str, e: HostError) -> String { } } +#[cfg(test)] fn filter_external_import_edges( entities: &[(String, EntityRecord)], edges: &mut Vec<(String, EdgeRecord)>, @@ -3516,13 +4423,28 @@ fn filter_external_import_edges( .filter(|(_, record)| record.kind == "module") .map(|(id, _)| 
id.as_str()) .collect(); + filter_external_import_edges_by_module_refs(&module_entity_ids, edges) +} + +fn filter_external_import_edges_by_module_ids( + module_entity_ids: &BTreeSet, + edges: &mut Vec<(String, EdgeRecord)>, +) -> u64 { + let module_entity_ids: BTreeSet<&str> = module_entity_ids.iter().map(String::as_str).collect(); + filter_external_import_edges_by_module_refs(&module_entity_ids, edges) +} + +fn filter_external_import_edges_by_module_refs( + module_entity_ids: &BTreeSet<&str>, + edges: &mut Vec<(String, EdgeRecord)>, +) -> u64 { let before = edges.len(); edges.retain_mut(|(_, edge)| { if edge.kind != "imports" { return true; } if let Some(local_submodule) = - absolute_from_import_submodule_target(edge, &module_entity_ids) + absolute_from_import_submodule_target(edge, module_entity_ids) { edge.to_id = local_submodule; return true; @@ -3532,6 +4454,14 @@ fn filter_external_import_edges( u64::try_from(before - edges.len()).unwrap_or(u64::MAX) } +fn split_deferred_import_edges( + edges: Vec, +) -> (Vec, Vec) { + edges + .into_iter() + .partition(|(_, edge)| edge.kind != "imports") +} + fn absolute_from_import_submodule_target( edge: &EdgeRecord, module_entity_ids: &BTreeSet<&str>, @@ -3601,7 +4531,7 @@ fn core_file_entity_record( .and_then(|name| name.to_str()) .unwrap_or(&qualified_name) .to_owned(); - let content_hash = whole_file_hash(Path::new(&source_file_path)) + let content_hash = whole_file_hash(&canonical_root, Path::new(&source_file_path)) .with_context(|| format!("read source file {source_file_path}"))?; let mut properties = serde_json::Map::new(); properties.insert( @@ -3633,6 +4563,7 @@ fn core_file_entity_record( source_line_start: None, source_line_end: None, properties_json, + tags: Vec::new(), content_hash: Some(content_hash), summary_json: None, wardline_json: None, @@ -3675,6 +4606,7 @@ fn project_relative_posix(path: &Path) -> Result { /// Map an `AcceptedEntity` to an `EntityRecord` for the writer-actor. 
fn map_entity_to_record( + project_root: &Path, entity: &AcceptedEntity, plugin_id: &str, source_file_id: Option, @@ -3706,7 +4638,8 @@ fn map_entity_to_record( source_line_start: source_line_range.map(|range| range.start_line), source_line_end: source_line_range.map(|range| range.end_line), properties_json, - content_hash: content_hash_for_entity(entity, source_line_range), + tags: normalised_entity_tags(&entity.raw.tags), + content_hash: content_hash_for_entity(project_root, entity, source_line_range), summary_json: None, wardline_json: None, first_seen_commit: None, @@ -3716,6 +4649,16 @@ fn map_entity_to_record( } } +fn normalised_entity_tags(tags: &[String]) -> Vec { + tags.iter() + .map(|tag| tag.trim()) + .filter(|tag| !tag.is_empty()) + .map(str::to_owned) + .collect::>() + .into_iter() + .collect() +} + #[derive(Debug, Clone, Copy)] struct SourceLineRange { start_line: i64, @@ -3742,8 +4685,11 @@ fn source_line_range(entity: &AcceptedEntity) -> Option { /// incremental-skip check. They MUST agree byte-for-byte or the skip silently /// never matches; one helper guarantees that. `None` when the file cannot be /// read — callers fail toward re-analysis. -fn whole_file_hash(path: &Path) -> Option { - let bytes = fs::read(path).ok()?; +fn whole_file_hash(project_root: &Path, path: &Path) -> Option { + use std::io::Read; + let mut file = clarion_core::plugin::jail::safe_open(project_root, path).ok()?; + let mut bytes = Vec::new(); + file.read_to_end(&mut bytes).ok()?; Some(blake3::hash(&bytes).to_hex().to_string()) } @@ -3764,29 +4710,40 @@ fn canonical_path_key(path: &Path) -> Option { /// fail-toward-work direction — on any uncertainty: the path cannot be /// canonicalised, the prior run recorded no whole-file hash for it (a new file), /// or the file is unhashable now. Skips only on a confident byte-identical match. 
-fn file_needs_reanalysis(path: &Path, prior_file_hashes: &HashMap) -> bool { +fn file_needs_reanalysis( + project_root: &Path, + path: &Path, + prior_file_hashes: &HashMap, +) -> bool { let Some(key) = canonical_path_key(path) else { return true; }; let Some(prior) = prior_file_hashes.get(&key) else { return true; }; - match whole_file_hash(path) { + match whole_file_hash(project_root, path) { Some(current) => &current != prior, None => true, } } fn content_hash_for_entity( + project_root: &Path, entity: &AcceptedEntity, source_line_range: Option, ) -> Option { + use std::io::Read; + if entity.kind == "module" { - return whole_file_hash(Path::new(&entity.source_file_path)); + return whole_file_hash(project_root, Path::new(&entity.source_file_path)); } let range = source_line_range?; - let source = fs::read_to_string(&entity.source_file_path).ok()?; + let mut file = + clarion_core::plugin::jail::safe_open(project_root, Path::new(&entity.source_file_path)) + .ok()?; + let mut source = String::new(); + file.read_to_string(&mut source).ok()?; let lines: Vec<&str> = source.lines().collect(); let start = usize::try_from(range.start_line - 1).ok()?; let mut end = usize::try_from(range.end_line).ok()?; @@ -3808,7 +4765,20 @@ fn canonical_signature(value: &serde_json::Value) -> String { } /// Map an `AcceptedEdge` to an `EdgeRecord` for the writer-actor (B.3). 
-fn map_edge_to_record(edge: AcceptedEdge) -> EdgeRecord { +fn core_file_contains_edge(file_entity_id: &str, child_entity_id: &str) -> EdgeRecord { + EdgeRecord { + kind: "contains".to_owned(), + from_id: file_entity_id.to_owned(), + to_id: child_entity_id.to_owned(), + confidence: EdgeConfidence::Resolved, + properties_json: None, + source_file_id: Some(file_entity_id.to_owned()), + source_byte_start: None, + source_byte_end: None, + } +} + +fn map_edge_to_record(edge: AcceptedEdge, source_file_id: Option) -> EdgeRecord { let properties_json = edge .raw .properties @@ -3820,7 +4790,7 @@ fn map_edge_to_record(edge: AcceptedEdge) -> EdgeRecord { to_id: edge.to_id, confidence: edge.confidence, properties_json, - source_file_id: edge.source_file_id, + source_file_id, source_byte_start: edge.raw.source_byte_start, source_byte_end: edge.raw.source_byte_end, } @@ -3954,6 +4924,12 @@ const SKIP_DIRS: &[&str] = &[ /// Collect all source files under `root` whose extension is in `wanted`. /// +#[derive(Debug, Default)] +struct SourceWalkResult { + files: Vec, + skipped_errors: Vec, +} + /// Uses the `ignore` crate so `.gitignore` / `.ignore` / global gitignore /// policy filters the source set before plugin dispatch. Symlinks are skipped /// (path-jail concerns for Sprint 1). @@ -3964,9 +4940,9 @@ const SKIP_DIRS: &[&str] = &[ /// the operator can see that the file list is incomplete — silently /// dropping those entries would mask the same "incomplete analysis" /// class that the WP1 `read_applied_versions` `.ok()` pattern did. 
-fn collect_source_files(root: &Path, wanted_extensions: &BTreeSet) -> Vec { +fn collect_source_files(root: &Path, wanted_extensions: &BTreeSet) -> SourceWalkResult { let mut out = Vec::new(); - let mut skipped: u64 = 0; + let mut skipped_errors = Vec::new(); let mut builder = WalkBuilder::new(root); builder .follow_links(false) @@ -3997,16 +4973,18 @@ fn collect_source_files(root: &Path, wanted_extensions: &BTreeSet) -> Ve } } Err(err) => { + let message = err.to_string(); tracing::warn!( - error = %err, + error = %message, "source walk: skipping unreadable or ignored-path-error entry", ); - skipped += 1; + skipped_errors.push(message); } } } - if skipped > 0 { + if !skipped_errors.is_empty() { + let skipped = skipped_errors.len(); tracing::warn!( skipped = skipped, root = %root.display(), @@ -4015,7 +4993,10 @@ fn collect_source_files(root: &Path, wanted_extensions: &BTreeSet) -> Ve suffix = if skipped == 1 { "y" } else { "ies" }, ); } - out + SourceWalkResult { + files: out, + skipped_errors, + } } fn is_skipped_dir(entry: &DirEntry) -> bool { @@ -4135,6 +5116,33 @@ mod tests { assert!(snapshot["current_plugin"].is_null()); } + #[test] + fn progress_reporter_refreshes_heartbeat_for_in_flight_file() { + let dir = tempfile::tempdir().expect("tempdir"); + let path = dir.path().join("runs").join("run-1.progress.json"); + fs::create_dir_all(path.parent().unwrap()).unwrap(); + let reporter = ProgressReporter::new(Some(path.clone()), "run-1".to_owned()); + + reporter.file_started("python", "src/slow.py"); + let before: serde_json::Value = + serde_json::from_str(&fs::read_to_string(&path).expect("progress file")).unwrap(); + std::thread::sleep(std::time::Duration::from_millis(5)); + let guard = reporter.file_heartbeat_guard_with_interval( + "python".to_owned(), + "src/slow.py".to_owned(), + std::time::Duration::from_millis(10), + ); + std::thread::sleep(std::time::Duration::from_millis(35)); + drop(guard); + + let after: serde_json::Value = + 
serde_json::from_str(&fs::read_to_string(&path).expect("progress file")).unwrap(); + assert_eq!(after["phase"], "analyzing"); + assert_eq!(after["current_plugin"], "python"); + assert_eq!(after["current_file"], "src/slow.py"); + assert_ne!(before["heartbeat_at"], after["heartbeat_at"]); + } + #[test] fn subsystem_entity_id_rejects_invalid_hash_segment() { let err = subsystem_entity_id("bad:hash").expect_err("colon must be rejected"); @@ -4159,7 +5167,11 @@ mod tests { .expect("ignored dir source"); let wanted = BTreeSet::from(["py".to_owned()]); - let mut files = collect_source_files(root, &wanted); + let SourceWalkResult { + mut files, + skipped_errors, + } = collect_source_files(root, &wanted); + assert!(skipped_errors.is_empty()); files.sort(); let relative = files .into_iter() @@ -4174,6 +5186,43 @@ mod tests { assert_eq!(relative, vec!["kept.py"]); } + #[test] + fn source_walk_returns_errors_instead_of_only_logging_them() { + let tempdir = tempfile::tempdir().expect("tempdir"); + let missing_root = tempdir.path().join("missing"); + let wanted = BTreeSet::from(["py".to_owned()]); + + let result = collect_source_files(&missing_root, &wanted); + + assert!(result.files.is_empty()); + assert!( + !result.skipped_errors.is_empty(), + "missing root must be carried as a skipped walk error" + ); + } + + #[test] + fn source_walk_finding_record_is_project_anchored_with_samples() { + let rec = source_walk_finding_record( + Path::new("/tmp/project"), + 2, + &["permission denied".to_owned()], + 1, + "core:project:project", + "run-1", + "2026-06-04T00:00:00.000Z", + ); + + assert_eq!(rec.rule_id, SOURCE_WALK_SKIPPED_RULE_ID); + assert_eq!(rec.severity, "WARN"); + assert_eq!(rec.entity_id, "core:project:project"); + let evidence: serde_json::Value = + serde_json::from_str(&rec.evidence_json).expect("evidence json"); + assert_eq!(evidence["skipped_entries"], 2); + assert_eq!(evidence["error_samples"][0], "permission denied"); + assert_eq!(evidence["errors_omitted"], 1); + } + 
#[test] fn filter_import_edges_prefers_absolute_from_import_submodule_when_local() { let entities = vec![ @@ -4306,6 +5355,7 @@ mod tests { source_line_start: None, source_line_end: None, properties_json: "{}".to_owned(), + tags: Vec::new(), content_hash: None, summary_json: None, wardline_json: None, @@ -4332,6 +5382,7 @@ mod tests { source_line_start: None, source_line_end: None, properties_json: properties_json.to_owned(), + tags: Vec::new(), content_hash: None, summary_json: None, wardline_json: None, @@ -4558,12 +5609,7 @@ mod tests { #[test] fn handle_task_passes_through_ok_ok() { let br = BatchResult { - entities: Vec::new(), - edges: Vec::new(), - unresolved_call_sites: Vec::new(), - stats: BatchStats::default(), findings: Vec::new(), - signatures: BTreeMap::new(), }; let out = handle_plugin_task_join_result(Ok(Ok(br)), "python"); assert!(out.is_ok()); @@ -4672,19 +5718,30 @@ mod tests { signature: Some( serde_json::json!({"v": 1, "params": ["x: int"], "return_ann": "bool"}), ), + tags: vec![ + "entry-point".to_owned(), + "entry-point".to_owned(), + " ".to_owned(), + ], extra: serde_json::Map::new(), }, }; - let record = map_entity_to_record(&entity, "python", Some("python:module:demo".to_owned())); + let record = map_entity_to_record( + tempdir.path(), + &entity, + "python", + Some("core:file:demo.py".to_owned()), + ); assert_eq!( record.source_file_path.as_deref(), Some(source_path.to_str().unwrap()) ); - assert_eq!(record.source_file_id.as_deref(), Some("python:module:demo")); + assert_eq!(record.source_file_id.as_deref(), Some("core:file:demo.py")); assert_eq!(record.source_line_start, Some(1)); assert_eq!(record.source_line_end, Some(2)); + assert_eq!(record.tags, vec!["entry-point".to_owned()]); let expected_hash = blake3::hash("def hello():\n return 'hé'".as_bytes()) .to_hex() .to_string(); @@ -4700,13 +5757,14 @@ mod tests { name: "demo.caller".to_owned(), short_name: "caller".to_owned(), parent_id: Some("python:module:demo".to_owned()), - 
source_file_id: Some("python:module:demo".to_owned()), + source_file_id: Some("core:file:demo.py".to_owned()), source_file_path: Some("demo.py".to_owned()), source_byte_start: None, source_byte_end: None, source_line_start: Some(1), source_line_end: Some(3), properties_json: "{}".to_owned(), + tags: Vec::new(), content_hash: Some("hash-python:function:demo.caller".to_owned()), summary_json: None, wardline_json: None, @@ -4751,7 +5809,7 @@ mod tests { assert_eq!(mapped[0].sites.len(), 1); assert_eq!( mapped[0].sites[0].source_file_id.as_deref(), - Some("python:module:demo") + Some("core:file:demo.py") ); assert_eq!(mapped[0].sites[0].callee_expr, "dynamic_target"); assert_eq!( @@ -4769,13 +5827,14 @@ mod tests { name: "demo.caller".to_owned(), short_name: "caller".to_owned(), parent_id: Some("python:module:demo".to_owned()), - source_file_id: Some("python:module:demo".to_owned()), + source_file_id: Some("core:file:demo.py".to_owned()), source_file_path: Some("demo.py".to_owned()), source_byte_start: None, source_byte_end: None, source_line_start: Some(1), source_line_end: Some(3), properties_json: "{}".to_owned(), + tags: Vec::new(), content_hash: Some("hash-python:function:demo.caller".to_owned()), summary_json: None, wardline_json: None, @@ -4795,4 +5854,126 @@ mod tests { assert_eq!(mapped[0].caller_entity_id, "python:function:demo.caller"); assert!(mapped[0].sites.is_empty()); } + + #[tokio::test] + async fn semantic_embedding_population_skips_fresh_sidecar_rows() { + use std::sync::Arc; + + use clarion_core::{EmbeddingProvider, EmbeddingRecording, RecordingEmbeddingProvider}; + use clarion_federation::config::SemanticSearchConfig; + use clarion_storage::{EmbeddingKey, EmbeddingStore, pragma, schema}; + + let project = tempfile::tempdir().unwrap(); + std::fs::create_dir(project.path().join(".clarion")).unwrap(); + let db_path = project.path().join(".clarion/clarion.db"); + let mut conn = rusqlite::Connection::open(&db_path).unwrap(); + 
pragma::apply_write_pragmas(&conn).unwrap(); + schema::apply_migrations(&mut conn).unwrap(); + conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, content_hash, created_at, updated_at) \ + VALUES \ + ('python:function:demo.fresh', 'python', 'function', 'demo.fresh', 'fresh', \ + '{\"docstring\":\"already embedded\"}', 'hash-fresh', 't', 't')", + [], + ) + .unwrap(); + drop(conn); + + let store = EmbeddingStore::open_in_clarion_dir(project.path()).unwrap(); + store + .upsert( + &EmbeddingKey { + entity_id: "python:function:demo.fresh".to_owned(), + content_hash: "hash-fresh".to_owned(), + model_id: "test-model".to_owned(), + }, + &[1.0, 0.0], + 0.0, + 2, + "t", + ) + .unwrap(); + drop(store); + + let provider = Arc::new(RecordingEmbeddingProvider::from_recordings( + "test-model", + 2, + Vec::::new(), + )); + let stats = populate_semantic_embeddings( + project.path(), + &db_path, + &SemanticSearchConfig { + enabled: true, + model_id: "test-model".to_owned(), + dimensions: 2, + ..SemanticSearchConfig::default() + }, + provider.clone() as Arc, + ) + .await + .unwrap(); + + assert_eq!(stats.considered, 1); + assert_eq!(stats.skipped_fresh, 1); + assert_eq!(stats.embedded, 0); + assert!( + provider.invocations().is_empty(), + "fresh sidecar rows must not be re-embedded" + ); + } + + #[tokio::test] + async fn semantic_embedding_population_skips_briefing_blocked_entities() { + use std::sync::Arc; + + use clarion_core::{EmbeddingProvider, EmbeddingRecording, RecordingEmbeddingProvider}; + use clarion_federation::config::SemanticSearchConfig; + use clarion_storage::{pragma, schema}; + + let project = tempfile::tempdir().unwrap(); + std::fs::create_dir(project.path().join(".clarion")).unwrap(); + let db_path = project.path().join(".clarion/clarion.db"); + let mut conn = rusqlite::Connection::open(&db_path).unwrap(); + pragma::apply_write_pragmas(&conn).unwrap(); + schema::apply_migrations(&mut conn).unwrap(); + conn.execute( + "INSERT 
INTO entities \ + (id, plugin_id, kind, name, short_name, properties, content_hash, created_at, updated_at) \ + VALUES \ + ('python:function:demo.secret', 'python', 'function', 'demo.secret', 'secret', \ + '{\"docstring\":\"SECRET_TOKEN=abc123\", \"briefing_blocked\":\"secret_present\"}', \ + 'hash-secret', 't', 't')", + [], + ) + .unwrap(); + drop(conn); + + let provider = Arc::new(RecordingEmbeddingProvider::from_recordings( + "test-model", + 2, + Vec::::new(), + )); + let stats = populate_semantic_embeddings( + project.path(), + &db_path, + &SemanticSearchConfig { + enabled: true, + model_id: "test-model".to_owned(), + dimensions: 2, + ..SemanticSearchConfig::default() + }, + provider.clone() as Arc, + ) + .await + .unwrap(); + + assert_eq!(stats.considered, 0); + assert_eq!(stats.embedded, 0); + assert!( + provider.invocations().is_empty(), + "briefing-blocked docstrings must not be sent to the embedding provider" + ); + } } diff --git a/crates/clarion-cli/src/cli.rs b/crates/clarion-cli/src/cli.rs index d36095d2..d6946fb6 100644 --- a/crates/clarion-cli/src/cli.rs +++ b/crates/clarion-cli/src/cli.rs @@ -144,6 +144,13 @@ pub enum Command { command: DbCommand, }, + /// Author guidance sheets — institutional knowledge attached to entities + /// that the MCP read path composes into briefings (REQ-GUIDANCE-03). + Guidance { + #[command(subcommand)] + command: GuidanceCommand, + }, + /// Verify (and optionally repair) the installed agent-orientation surfaces: /// the `clarion-workflow` skill pack, the `SessionStart` hook, and the /// `.mcp.json` MCP registration. Prints a per-surface report plus the index @@ -159,6 +166,12 @@ pub enum Command { #[arg(long)] fix: bool, }, + + /// Import external findings in SARIF format and post them to Filigree. 
+ Sarif { + #[command(subcommand)] + command: SarifCommand, + }, } #[derive(Subcommand)] @@ -181,6 +194,149 @@ pub enum DbCommand { }, } +#[derive(Subcommand)] +pub enum GuidanceCommand { + /// Create a new guidance sheet (`kind: guidance`, provenance: manual). + /// + /// `--match` syntax is `:` (split on the first colon): + /// `path:`, `tag:`, `kind:`, `subsystem:`, + /// `entity:`. Content comes from `--content`, else stdin (when + /// piped) or `$EDITOR`/`$VISUAL`. + Create { + /// Project directory containing .clarion/clarion.db (default: current). + #[arg(long, default_value = ".")] + path: PathBuf, + + /// A match rule (`:`); repeatable. + #[arg(long = "match", value_name = "RULE")] + r#match: Vec, + + /// Scope level: project | subsystem | package | module | class | function. + #[arg(long, value_name = "LEVEL")] + scope_level: String, + + /// Guidance text (markdown). Omit to author via stdin or $EDITOR. + #[arg(long)] + content: Option, + + /// Slug for the entity id's third segment (`core:guidance:`). + /// Defaults to a slug derived from the first match rule. + #[arg(long)] + name: Option, + + /// Mark the sheet pinned (preserved under token-budget pressure). + #[arg(long)] + pinned: bool, + + /// Optional expiry. Accepts an ISO-8601 instant (e.g. + /// `2026-12-31T23:59:59Z`), an offset form (converted to UTC), or a bare + /// date (e.g. `2026-12-31`, taken as start-of-day UTC). Stored + /// normalized to UTC so the read path's lexical expiry compare is + /// correct; unparseable input is rejected. + #[arg(long, value_name = "WHEN")] + expires: Option, + }, + + /// Edit a sheet's content in `$EDITOR`/`$VISUAL` (other properties, including + /// `authored_at` and provenance, are preserved). + Edit { + /// Project directory containing .clarion/clarion.db (default: current). + #[arg(long, default_value = ".")] + path: PathBuf, + /// The guidance sheet id (`core:guidance:`). + id: String, + }, + + /// Print a guidance sheet (human-readable). 
+ Show { + /// Project directory containing .clarion/clarion.db (default: current). + #[arg(long, default_value = ".")] + path: PathBuf, + /// The guidance sheet id. + id: String, + }, + + /// List guidance sheets, ordered by `scope_rank` (project → function). + /// + /// `--expired` and `--stale` are independent filters that compose by + /// intersection (AND): a sheet is shown only if it passes every active + /// filter (including `--for-entity`). Without any of them, behaves as the + /// plain list. + List { + /// Project directory containing .clarion/clarion.db (default: current). + #[arg(long, default_value = ".")] + path: PathBuf, + /// Only list sheets whose `match_rules` apply to this entity id. + #[arg(long, value_name = "ENTITY_ID")] + for_entity: Option, + /// Only show sheets whose `expires` instant is in the past (mirrors the + /// read path's expiry exclusion). Sheets with no `expires` are excluded. + #[arg(long)] + expired: bool, + /// Only show sheets not "touched" (the later of `reviewed_at` / + /// `authored_at`) within `--days`. This is the review-cadence/age signal + /// (system-design §7.741), NOT the churn-based staleness finding. + #[arg(long)] + stale: bool, + /// Staleness window in days for `--stale` (default: 90). Ignored without + /// `--stale`. + #[arg(long, value_name = "N", default_value_t = 90)] + days: u32, + }, + + /// Delete a guidance sheet. + Delete { + /// Project directory containing .clarion/clarion.db (default: current). + #[arg(long, default_value = ".")] + path: PathBuf, + /// The guidance sheet id. + id: String, + }, + + /// Promote a reviewed Filigree guidance-proposal observation into a local + /// guidance sheet. The observation must have been produced by MCP + /// `propose_guidance`; arbitrary observations are rejected. + Promote { + /// Project directory containing .clarion/clarion.db (default: current). 
+ #[arg(long, default_value = ".")] + path: PathBuf, + /// Path to clarion.yaml (default: project-root/clarion.yaml if present). + #[arg(long)] + config: Option, + /// The Filigree observation id to promote. + observation_id: String, + }, + + /// Export every guidance sheet to a directory as one deterministic, + /// diff-friendly JSON file per sheet, for committing to a shared repo + /// (REQ-GUIDANCE-06). Output is byte-stable across runs on identical DB + /// state. The target directory is created if absent. + Export { + /// Project directory containing .clarion/clarion.db (default: current). + #[arg(long, default_value = ".")] + path: PathBuf, + /// Directory to write the exported sheet files into. Export does NOT + /// prune: a sheet deleted locally keeps its file here, and a teammate's + /// additive `import` would resurrect it. To mirror, clear the directory + /// before exporting. + #[arg(long)] + to: PathBuf, + }, + + /// Import guidance sheets from a directory of exported JSON files + /// (REQ-GUIDANCE-06). Additive: each sheet is upserted by id, preserving ids + /// exactly; existing local sheets not present in the directory are left + /// untouched (never a destructive mirror). A malformed `*.json` aborts the + /// import naming the offending file (a dropped sheet is silent data loss). + Import { + /// Project directory containing .clarion/clarion.db (default: current). + #[arg(long, default_value = ".")] + path: PathBuf, + /// Directory of exported sheet files to import. + dir: PathBuf, + }, +} + #[derive(Subcommand)] pub enum HookCommand { /// Print a project snapshot and re-sync the skill pack on drift. @@ -190,3 +346,21 @@ pub enum HookCommand { path: PathBuf, }, } + +#[derive(Subcommand)] +pub enum SarifCommand { + /// Translate SARIF findings and post them to Filigree. + Import { + /// The SARIF file path to import. + file: PathBuf, + + /// Scan source name to tag the findings (e.g. wardline, semgrep, codeql). 
+ /// If omitted, defaults to the driver name from the SARIF file. + #[arg(long)] + scan_source: Option, + + /// Project directory containing .clarion/clarion.db (default: current). + #[arg(long, default_value = ".")] + path: PathBuf, + }, +} diff --git a/crates/clarion-cli/src/guidance.rs b/crates/clarion-cli/src/guidance.rs new file mode 100644 index 00000000..dec40653 --- /dev/null +++ b/crates/clarion-cli/src/guidance.rs @@ -0,0 +1,1042 @@ +//! `clarion guidance` authoring subcommands (WS6 / REQ-GUIDANCE-03). +//! +//! Operator-facing CLI to create, edit, show, list, and delete guidance sheets +//! — the institutional-knowledge entities (`kind = 'guidance'`) the MCP read +//! path composes into briefings. All SQL lives in `clarion-storage` +//! (`clarion_storage::guidance`); this module owns only argument parsing, the +//! `$EDITOR` round-trip, and presentation. +//! +//! ## `--match` syntax +//! +//! Each `--match` value is `:` (split on the **first** colon only, +//! because subsystem/entity values themselves contain colons): +//! - `path:` → `{"type":"path","pattern":""}` +//! - `tag:` → `{"type":"tag","value":""}` +//! - `kind:` → `{"type":"kind","value":""}` +//! - `subsystem:` → `{"type":"subsystem","id":""}` +//! - `entity:` → `{"type":"entity","id":""}` +//! +//! e.g. `--match path:src/auth/** --match subsystem:core:subsystem:abcd +//! --match entity:python:function:foo.bar`. The emitted objects are exactly the +//! shape the read path's `rule_match` consumes. 
+ +use std::io::{Read, Write}; +use std::path::Path; + +use anyhow::{Context, Result, anyhow, bail}; +use rusqlite::{Connection, OpenFlags}; +use serde_json::{Value, json}; + +use clarion_federation::filigree::{FiligreeHttpClient, FiligreeLookup}; +use clarion_storage::{ + GuidanceProposal, GuidanceSheet, GuidanceSheetInput, PortableSheet, delete_guidance_sheet, + get_guidance_sheet, guidance_sheet_is_expired, guidance_sheet_is_stale, + guidance_sheet_matches_entity, import_portable_sheet, insert_guidance_sheet, + invalidate_summaries_for_sheet, list_guidance_sheets, upsert_guidance_sheet, +}; + +use crate::cli::GuidanceCommand; + +/// Map a `clarion_storage::StorageError` (which is `Send` but not `Sync`, so it +/// does not satisfy `anyhow`'s `From` bound) into an `anyhow::Error` via its +/// Display — matching the convention in `analyze.rs`. +trait StorageResultExt { + fn into_anyhow(self) -> Result; +} + +impl StorageResultExt for clarion_storage::Result { + fn into_anyhow(self) -> Result { + self.map_err(|e| anyhow!("{e}")) + } +} + +/// The canonical scope-level vocabulary (ADR-024). Ordered project→function so +/// the message lists them in rank order. +const SCOPE_LEVELS: &[&str] = &[ + "project", + "subsystem", + "package", + "module", + "class", + "function", +]; + +const PROVENANCE_MANUAL: &str = "manual"; + +/// Dispatch a `clarion guidance `. +/// +/// # Errors +/// +/// Surfaces parse errors (bad `--match` / `--scope-level`), I/O errors +/// (`$EDITOR`, stdin), and storage errors. Not-found on `show`/`edit`/`delete` +/// is a clean non-panicking error. 
+pub fn run(command: GuidanceCommand) -> Result<()> { + match command { + GuidanceCommand::Create { + path, + r#match, + scope_level, + content, + name, + pinned, + expires, + } => create( + &path, + CreateArgs { + raw_match: &r#match, + scope_level: &scope_level, + content, + name: name.as_deref(), + pinned, + expires: expires.as_deref(), + }, + ), + GuidanceCommand::Edit { path, id } => edit(&path, &id), + GuidanceCommand::Show { path, id } => show(&path, &id), + GuidanceCommand::List { + path, + for_entity, + expired, + stale, + days, + } => list( + &path, + ListFilters { + for_entity: for_entity.as_deref(), + expired, + stale, + days, + }, + ), + GuidanceCommand::Delete { path, id } => delete(&path, &id), + GuidanceCommand::Promote { + path, + config, + observation_id, + } => promote(&path, config.as_deref(), &observation_id), + GuidanceCommand::Export { path, to } => export(&path, &to), + GuidanceCommand::Import { path, dir } => import(&path, &dir), + } +} + +// ── Match-rule parsing (TDD target #1) ──────────────────────────────────────── + +/// Parse one `--match` value into its `match_rules` JSON object. Splits on the +/// first colon only; the value half is opaque (subsystem/entity ids contain +/// colons). +/// +/// # Errors +/// +/// Errors on a missing colon, an empty value, or an unknown rule type. +fn parse_match_rule(raw: &str) -> Result { + let (rule_type, value) = raw + .split_once(':') + .ok_or_else(|| anyhow!("--match '{raw}': expected ':' (e.g. 
path:src/**)"))?; + if value.is_empty() { + bail!("--match '{raw}': empty value after '{rule_type}:'"); + } + let rule = match rule_type { + "path" => json!({ "type": "path", "pattern": value }), + "tag" => json!({ "type": "tag", "value": value }), + "kind" => json!({ "type": "kind", "value": value }), + "subsystem" => json!({ "type": "subsystem", "id": value }), + "entity" => json!({ "type": "entity", "id": value }), + other => bail!( + "--match '{raw}': unknown rule type '{other}' \ + (expected one of: path, tag, kind, subsystem, entity)" + ), + }; + Ok(rule) +} + +/// Parse all `--match` values into the `match_rules` array. +fn parse_match_rules(raw: &[String]) -> Result> { + raw.iter().map(|r| parse_match_rule(r)).collect() +} + +fn validate_scope_level(level: &str) -> Result<()> { + if SCOPE_LEVELS.contains(&level) { + Ok(()) + } else { + bail!( + "--scope-level '{level}' is not valid (expected one of: {})", + SCOPE_LEVELS.join(", ") + ) + } +} + +/// Derive a canonical slug for the entity id's third segment from `--name` (or, +/// when absent, the first match rule). The slug must satisfy the canonical-name +/// grammar; we keep alphanumerics, dot, hyphen, underscore and replace any other +/// run with a single hyphen. +fn slugify(input: &str) -> String { + let mut out = String::with_capacity(input.len()); + let mut last_dash = false; + for ch in input.chars() { + if ch.is_ascii_alphanumeric() || matches!(ch, '.' | '-' | '_') { + out.push(ch); + last_dash = false; + } else if !last_dash { + out.push('-'); + last_dash = true; + } + } + let trimmed = out.trim_matches('-').to_owned(); + if trimmed.is_empty() { + // Fall back to a timestamp-ish token so the id is always well-formed. 
+ format!( + "sheet-{}", + std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .map_or(0, |d| d.as_secs()) + ) + } else { + trimmed + } +} + +// ── Subcommand handlers ─────────────────────────────────────────────────────── + +/// Inputs for `create`, grouped so the handler takes one struct instead of a +/// long positional argument list. +struct CreateArgs<'a> { + raw_match: &'a [String], + scope_level: &'a str, + content: Option<String>, + name: Option<&'a str>, + pinned: bool, + expires: Option<&'a str>, +} + +fn create(project_root: &Path, args: CreateArgs<'_>) -> Result<()> { + validate_scope_level(args.scope_level)?; + let match_rules = parse_match_rules(args.raw_match)?; + + // Content: explicit flag, else stdin / $EDITOR. + let content = match args.content { + Some(text) => text, + None => read_content_interactively("")?, + }; + if content.trim().is_empty() { + bail!("guidance content is empty; pass --content or provide text in the editor"); + } + + let slug_source = args + .name + .unwrap_or_else(|| args.raw_match.first().map_or("guidance", String::as_str)); + let slug = slugify(slug_source); + let id = format!("core:guidance:{slug}"); + let short_name = slug.rsplit('.').next().unwrap_or(&slug).to_owned(); + + let conn = open_db(project_root)?; + + // Normalise `--expires` *before* the write so the stored instant is + // byte-format-identical to the read path's `now` (the expiry compare is + // lexical, so a raw date-only or offset string would mis-order). Reject + // unparseable input up front, mirroring `validate_scope_level`. 
+ let expires = args + .expires + .map(|raw| normalize_expires(&conn, raw)) + .transpose()?; + + let now = now_iso8601(&conn)?; + let mut properties = json!({ + "content": content, + "scope_level": args.scope_level, + "match_rules": match_rules, + "pinned": args.pinned, + "provenance": PROVENANCE_MANUAL, + "authored_at": now, + }); + if let Some(expires) = expires + && let Some(obj) = properties.as_object_mut() + { + obj.insert("expires".to_owned(), json!(expires)); + } + + insert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: &id, + name: &slug, + short_name: &short_name, + properties: &properties, + }, + ) + .into_anyhow() + .context("write guidance sheet")?; + + // ADR-007 churn-eager invalidation: a new sheet adds guidance to the + // entities its match_rules cover, so their cached summaries must be dropped + // or the guidance stays inert until each entity's code changes. Re-fetch the + // just-written sheet (cleaner than hand-rolling a `GuidanceSheet`) and + // invalidate the entities it matches. + // + // Non-atomic: the sheet write is already committed above; an error here can + // leave a committed sheet alongside a stale cache row. Self-healing — a + // re-run, or the next cache-key rotation when the entity's code changes, + // clears it. Over-invalidation is safe; under-invalidation is the only bug. + let invalidated = invalidate_matched_summaries(project_root, &conn, &id)?; + + println!("Created guidance sheet {id}"); + report_invalidation(invalidated); + Ok(()) +} + +/// Invalidate cached summaries for every entity the sheet `id` matches, using +/// the canonicalized project root the storage matcher needs for `path:` rules. +/// Re-fetches the sheet by id so callers don't hand-build a `GuidanceSheet`. +/// A missing sheet (e.g. raced away) is a clean 0. +fn invalidate_matched_summaries(project_root: &Path, conn: &Connection, id: &str) -> Result<usize> { + let Some(sheet) = get_guidance_sheet(conn, id).into_anyhow()? 
else { + return Ok(0); + }; + invalidate_summaries_for_sheet(conn, &sheet, project_root).into_anyhow() +} + +/// Print a short operator note when summaries were invalidated. Silent on 0 so +/// the common no-match case stays quiet. +fn report_invalidation(count: usize) { + if count > 0 { + let plural = if count == 1 { "summary" } else { "summaries" }; + println!("Invalidated {count} cached {plural}"); + } +} + +fn edit(project_root: &Path, id: &str) -> Result<()> { + let conn = open_db(project_root)?; + let sheet = get_guidance_sheet(&conn, id) + .into_anyhow()? + .ok_or_else(|| anyhow!("guidance sheet {id} not found"))?; + + let current = sheet + .properties + .get("content") + .and_then(Value::as_str) + .unwrap_or(""); + let new_content = edit_in_editor(current)?; + if new_content.trim().is_empty() { + bail!("guidance content is empty after edit; aborting (sheet unchanged)"); + } + if new_content == current { + println!("No changes to guidance sheet {id}"); + return Ok(()); + } + + // Read-modify-write: preserve every existing property (authored_at, + // provenance, pinned, expires, scope_level, match_rules, …) and replace + // only `content`. Edit must NOT regenerate authored_at (the staleness + // baseline) or flip provenance. + let mut properties = sheet.properties.clone(); + if let Some(obj) = properties.as_object_mut() { + obj.insert("content".to_owned(), json!(new_content)); + } else { + bail!("guidance sheet {id} has malformed properties; cannot edit"); + } + + upsert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: &sheet.id, + name: &sheet.name, + short_name: &sheet.short_name, + properties: &properties, + }, + ) + .into_anyhow() + .context("write edited guidance sheet")?; + + // ADR-007 churn-eager invalidation: the edit changed `content`, so the + // composed guidance for every matched entity changed and their cached + // summaries are stale. Invalidate the union of entities matched before and + // after the edit. 
`edit` only mutates `content` (match_rules are preserved), + // so before == after today — but compute the union defensively so a future + // rule-editing path stays correct without a second visit here. The earlier + // `sheet` snapshot carries the pre-edit rules; `id` re-fetches the post-edit + // sheet. + // + // Non-atomic: the edited sheet is already committed above; an error here can + // leave it alongside a stale cache row. Self-healing on re-run / next + // cache-key rotation (same posture as `create`). + let invalidated = invalidate_matched_summaries_union(project_root, &conn, &sheet, id)?; + + println!("Updated guidance sheet {id}"); + report_invalidation(invalidated); + Ok(()) +} + +/// Invalidate the union of entities matched by `before` (a pre-edit snapshot) +/// and by the sheet currently stored under `id`. The returned count is the true +/// number of rows removed across the union, with no double-count: pass 1 +/// (`before`) deletes its matched rows, which removes those entities from pass 2's +/// driving `SELECT DISTINCT entity_id FROM summary_cache`, so pass 2 never +/// re-tests an already-cleared entity — only after-only entities remain for it +/// to delete. +fn invalidate_matched_summaries_union( + project_root: &Path, + conn: &Connection, + before: &GuidanceSheet, + id: &str, +) -> Result<usize> { + let mut removed = invalidate_summaries_for_sheet(conn, before, project_root).into_anyhow()?; + removed += invalidate_matched_summaries(project_root, conn, id)?; + Ok(removed) +} + +fn show(project_root: &Path, id: &str) -> Result<()> { + let conn = open_db(project_root)?; + let sheet = get_guidance_sheet(&conn, id) + .into_anyhow()? + .ok_or_else(|| anyhow!("guidance sheet {id} not found"))?; + print!("{}", render_sheet(&sheet)); + Ok(()) +} + +/// Filters for `clarion guidance list`. All active filters compose by +/// **intersection** (AND): a sheet is shown only if it passes every one. 
The +/// two date filters (`expired`, `stale`) are independent — combining them shows +/// sheets that are expired AND stale, which is the intuitive "show me the worst +/// of the worst" operator-triage reading and falls out of the simplest code. +#[derive(Clone, Copy)] +struct ListFilters<'a> { + for_entity: Option<&'a str>, + expired: bool, + stale: bool, + /// Staleness window in days (used only when `stale` is set). + days: u32, +} + +fn list(project_root: &Path, filters: ListFilters<'_>) -> Result<()> { + let conn = open_db(project_root)?; + let sheets = list_guidance_sheets(&conn).into_anyhow()?; + + let canonical_root = project_root + .canonicalize() + .unwrap_or_else(|_| project_root.to_path_buf()); + + // Compute the comparison instants once, from the connection's own clock, in + // the exact `YYYY-MM-DDTHH:MM:SS.mmmZ` shape stored timestamps use — so the + // storage predicates' lexical compares are valid instant compares. `now` + // drives `--expired`; `stale_before` (now − N days) drives `--stale`. NB: + // `--stale` here is the *age/review-cadence* signal (system-design §7.741), + // distinct from the churn-based `CLA-FACT-GUIDANCE-CHURN-STALE` finding. + let now = now_iso8601(&conn)?; + let stale_before = if filters.stale { + Some(now_minus_days(&conn, filters.days)?) + } else { + None + }; + + let mut shown = 0usize; + for sheet in &sheets { + if let Some(entity_id) = filters.for_entity + && !guidance_sheet_matches_entity(&conn, sheet, entity_id, &canonical_root) + .into_anyhow()? + { + continue; + } + if filters.expired && !guidance_sheet_is_expired(sheet, &now) { + continue; + } + if let Some(cutoff) = stale_before.as_deref() + && !guidance_sheet_is_stale(sheet, cutoff) + { + continue; + } + println!("{}", render_sheet_line(sheet)); + shown += 1; + } + if shown == 0 { + println!("{}", empty_list_message(&filters)); + } + Ok(()) +} + +/// The "nothing matched" line, naming the active filters so an operator knows +/// why the list is empty. 
+fn empty_list_message(filters: &ListFilters<'_>) -> String { + let mut qualifiers: Vec<String> = Vec::new(); + if let Some(entity_id) = filters.for_entity { + qualifiers.push(format!("match {entity_id}")); + } + if filters.expired { + qualifiers.push("are expired".to_owned()); + } + if filters.stale { + qualifiers.push(format!("are stale (> {} days)", filters.days)); + } + if qualifiers.is_empty() { + "(no guidance sheets)".to_owned() + } else { + format!("(no guidance sheets {})", qualifiers.join(" and ")) + } +} + +/// Mint the `now − days` staleness cutoff via the connection's own clock, in the +/// same fixed-width ISO-8601 shape as [`now_iso8601`] so the storage staleness +/// predicate's lexical compare is a valid instant compare. `days` is a `u32`, so +/// the inline modifier string can never carry injection. +fn now_minus_days(conn: &Connection, days: u32) -> Result<String> { + let ts: String = conn + .query_row( + "SELECT strftime('%Y-%m-%dT%H:%M:%fZ','now',?1)", + [format!("-{days} days")], + |row| row.get(0), + ) + .context("mint staleness cutoff timestamp")?; + Ok(ts) +} + +fn delete(project_root: &Path, id: &str) -> Result<()> { + let conn = open_db(project_root)?; + + // Snapshot the sheet (and thus its match_rules) BEFORE deletion so we can + // still compute which entities it covered. Not-found is a clean error. + let sheet = get_guidance_sheet(&conn, id) + .into_anyhow()? + .ok_or_else(|| anyhow!("guidance sheet {id} not found"))?; + + // ADR-007 churn-eager invalidation: removing the sheet removes guidance from + // the entities it covered, so their cached summaries are stale and must be + // dropped (the next query re-summarizes without the now-deleted guidance). + // + // ORDER MATTERS: invalidate BEFORE deleting the sheet row. 
The sheet's + // `guides` edges are `from_id REFERENCES entities(id) ON DELETE CASCADE`, and + // `open_db` enables `foreign_keys`, so deleting the sheet first would CASCADE + // those edges away before `invalidate_summaries_for_sheet` reads them — and a + // guides-only sheet would invalidate nothing. Invalidating first is safe: + // rule/edge matching is unaffected by the sheet's presence (it never touches + // the matched entities' own rows), and over-invalidation is harmless. + let invalidated = invalidate_summaries_for_sheet(&conn, &sheet, project_root).into_anyhow()?; + + if !delete_guidance_sheet(&conn, id).into_anyhow()? { + bail!("guidance sheet {id} not found") + } + + println!("Deleted guidance sheet {id}"); + report_invalidation(invalidated); + Ok(()) +} + +fn promote(project_root: &Path, config_path: Option<&Path>, observation_id: &str) -> Result<()> { + let canonical_root = project_root + .canonicalize() + .unwrap_or_else(|_| project_root.to_path_buf()); + let mcp_config = crate::analyze::load_mcp_config(&canonical_root, config_path); + let client = FiligreeHttpClient::from_config_with_project_root( + &mcp_config.integrations.filigree, + |name| std::env::var(name).ok(), + Some(&canonical_root), + ) + .context("build Filigree client")? + .ok_or_else(|| anyhow!("Filigree integration is disabled in clarion.yaml"))?; + + let observation = client + .observation_by_id(observation_id) + .with_context(|| format!("read Filigree observation {observation_id}"))? 
+ .ok_or_else(|| anyhow!("Filigree observation {observation_id} not found"))?; + let proposal = GuidanceProposal::from_observation_detail(&observation.detail) + .map_err(|e| anyhow!("{e}")) + .with_context(|| { + format!("Filigree observation {observation_id} is not a Clarion guidance proposal") + })?; + + let conn = open_db(&canonical_root)?; + let now = now_iso8601(&conn)?; + let promoted = proposal + .to_promoted_sheet(&now) + .map_err(|e| anyhow!("{e}")) + .context("build promoted guidance sheet")?; + + let before = get_guidance_sheet(&conn, &promoted.id).into_anyhow()?; + upsert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: &promoted.id, + name: &promoted.name, + short_name: &promoted.short_name, + properties: &promoted.properties, + }, + ) + .into_anyhow() + .with_context(|| format!("write promoted guidance sheet {}", promoted.id))?; + + let invalidated = match before { + Some(before) => { + invalidate_matched_summaries_union(&canonical_root, &conn, &before, &promoted.id)? + } + None => invalidate_matched_summaries(&canonical_root, &conn, &promoted.id)?, + }; + + let dismissed = client + .dismiss_observation(observation_id, "promoted to Clarion guidance sheet") + .is_ok(); + println!("Promoted observation {observation_id} to {}", promoted.id); + report_invalidation(invalidated); + if !dismissed { + println!( + "Filigree observation {observation_id} was promoted locally but could not be dismissed" + ); + } + Ok(()) +} + +// ── Export / import (TDD target: round-trip) ────────────────────────────────── + +/// Export every guidance sheet to `to_dir`, one deterministic JSON file per +/// sheet. The output is engineered to be committed to a shared git repo: +/// byte-identical across runs (sorted keys, no embedded export timestamp/path) +/// and diff-friendly (one file per sheet, one field per line). 
Sheets are +/// iterated in stable id order so any incidental logging is run-stable; the +/// per-file bytes — the thing that gets committed — carry no ordering. +fn export(project_root: &Path, to_dir: &Path) -> Result<()> { + let conn = open_db(project_root)?; + let mut sheets = list_guidance_sheets(&conn).into_anyhow()?; + // `list_guidance_sheets` orders by scope_rank/authored_at/id (the read-path + // composition sort). Re-sort by id alone for a stable, content-independent + // export order — id is unique, so this is a total order with no tie-break on + // a mutable field. + sheets.sort_by(|a, b| a.id.cmp(&b.id)); + + std::fs::create_dir_all(to_dir) + .with_context(|| format!("create export directory {}", to_dir.display()))?; + + for sheet in &sheets { + let portable = PortableSheet::from_sheet(sheet); + let json = portable.to_canonical_json().into_anyhow()?; + let file = to_dir.join(portable.file_name()); + std::fs::write(&file, json.as_bytes()) + .with_context(|| format!("write {}", file.display()))?; + } + + println!( + "Exported {} guidance sheet(s) to {}", + sheets.len(), + to_dir.display() + ); + Ok(()) +} + +/// Import guidance sheets from `from_dir`. Reads every `*.json` file, parses each +/// into a [`PortableSheet`], and upserts it (additive — existing local sheets not +/// in the directory are untouched; this is a merge, never a destructive mirror). +/// Ids are preserved exactly. A malformed `*.json` aborts the whole import with +/// an error naming the file — a silently-dropped sheet is data loss, so we fail +/// loud rather than skip. Non-`.json` files (a README, a `.gitignore`) are +/// ignored: the sheet contract is `*.json`, so filtering to it is not "skipping a +/// sheet". Re-importing the same directory is idempotent on content. 
+fn import(project_root: &Path, from_dir: &Path) -> Result<()> { + let conn = open_db(project_root)?; + + // Collect + sort the file list so import order (and thus any per-sheet log + // line / cache-invalidation sequencing) is deterministic across runs. + let mut files: Vec<std::path::PathBuf> = Vec::new(); + let entries = std::fs::read_dir(from_dir) + .with_context(|| format!("read import directory {}", from_dir.display()))?; + for entry in entries { + let entry = entry.with_context(|| format!("read entry in {}", from_dir.display()))?; + let path = entry.path(); + if path.extension().and_then(|e| e.to_str()) == Some("json") { + files.push(path); + } + } + files.sort(); + + let mut imported = 0usize; + let mut invalidated = 0usize; + for file in &files { + let bytes = + std::fs::read_to_string(file).with_context(|| format!("read {}", file.display()))?; + let portable = PortableSheet::from_canonical_json(&file.display().to_string(), &bytes) + .into_anyhow()?; + + // Snapshot the pre-import sheet (if any) BEFORE the upsert, so that when an + // import UPDATES an existing sheet whose `match_rules` changed, we can + // invalidate the OLD matches too — not just the post-import matches. A + // fresh import (no prior sheet) has no old set. + let before = get_guidance_sheet(&conn, &portable.id).into_anyhow()?; + + import_portable_sheet(&conn, &portable) + .into_anyhow() + .with_context(|| format!("import {}", file.display()))?; + // ADR-007 churn-eager invalidation: an imported sheet adds/changes + // guidance for the entities it covers, so their cached summaries must be + // dropped — same union-of-before+after posture as `edit`. + invalidated += match before { + Some(before) => { + invalidate_matched_summaries_union(project_root, &conn, &before, &portable.id)? 
+ } + None => invalidate_matched_summaries(project_root, &conn, &portable.id)?, + }; + imported += 1; + } + + println!( + "Imported {imported} guidance sheet(s) from {}", + from_dir.display() + ); + report_invalidation(invalidated); + Ok(()) +} + +// ── Presentation ────────────────────────────────────────────────────────────── + +fn render_sheet_line(sheet: &GuidanceSheet) -> String { + let level = sheet.scope_level.as_deref().unwrap_or("?"); + let pinned = sheet + .properties + .get("pinned") + .and_then(Value::as_bool) + .unwrap_or(false); + let pin = if pinned { " [pinned]" } else { "" }; + let rules = sheet + .properties + .get("match_rules") + .and_then(Value::as_array) + .map_or(0, Vec::len); + format!("{} ({level}, {rules} rule(s)){pin}", sheet.id) +} + +fn render_sheet(sheet: &GuidanceSheet) -> String { + use std::fmt::Write as _; + let mut out = String::new(); + let _ = writeln!(out, "id: {}", sheet.id); + let _ = writeln!( + out, + "scope_level: {}", + sheet.scope_level.as_deref().unwrap_or("?") + ); + let field = |key: &str| -> Option<String> { + sheet.properties.get(key).and_then(|v| match v { + Value::String(s) => Some(s.clone()), + Value::Bool(b) => Some(b.to_string()), + _ => None, + }) + }; + if let Some(p) = field("provenance") { + let _ = writeln!(out, "provenance: {p}"); + } + if let Some(p) = field("pinned") { + let _ = writeln!(out, "pinned: {p}"); + } + if let Some(a) = field("authored_at") { + let _ = writeln!(out, "authored_at: {a}"); + } + if let Some(e) = field("expires") { + let _ = writeln!(out, "expires: {e}"); + } + if let Some(rules) = sheet + .properties + .get("match_rules") + .and_then(Value::as_array) + { + out.push_str("match_rules:\n"); + for rule in rules { + let _ = writeln!(out, " - {rule}"); + } + } + out.push_str("content:\n"); + let content = sheet + .properties + .get("content") + .and_then(Value::as_str) + .unwrap_or(""); + for line in content.lines() { + let _ = writeln!(out, " {line}"); + } + out +} + +// ── I/O helpers 
─────────────────────────────────────────────────────────────── + +/// Open a read-write connection to `.clarion/clarion.db` with a generous busy +/// timeout so a concurrently-running `serve` writer does not cause an immediate +/// lock error. +fn open_db(project_root: &Path) -> Result<Connection> { + let db_path = project_root.join(".clarion").join("clarion.db"); + if !db_path.exists() { + bail!( + "Clarion database not found at {}; run `clarion analyze` first", + db_path.display() + ); + } + let conn = Connection::open_with_flags( + &db_path, + OpenFlags::SQLITE_OPEN_READ_WRITE | OpenFlags::SQLITE_OPEN_URI, + ) + .with_context(|| format!("open database {}", db_path.display()))?; + conn.busy_timeout(std::time::Duration::from_secs(5)) + .context("set busy_timeout")?; + conn.pragma_update(None, "foreign_keys", "ON") + .context("enable foreign_keys")?; + Ok(conn) +} + +/// Read guidance content from stdin if it is piped, otherwise launch `$EDITOR`. +fn read_content_interactively(seed: &str) -> Result<String> { + use std::io::IsTerminal; + if !std::io::stdin().is_terminal() { + let mut buf = String::new(); + std::io::stdin() + .read_to_string(&mut buf) + .context("read guidance content from stdin")?; + return Ok(buf); + } + edit_in_editor(seed) +} + +/// Launch `$EDITOR` (or `$VISUAL`) on a temp file seeded with `seed` and return +/// the saved contents. 
+fn edit_in_editor(seed: &str) -> Result<String> { + let editor = std::env::var("VISUAL") + .or_else(|_| std::env::var("EDITOR")) + .map_err(|_| anyhow!("neither $VISUAL nor $EDITOR is set; set one or pass --content"))?; + + let dir = std::env::temp_dir(); + let file = dir.join(format!("clarion-guidance-{}.md", std::process::id())); + { + let mut f = std::fs::File::create(&file) + .with_context(|| format!("create temp edit file {}", file.display()))?; + f.write_all(seed.as_bytes()) + .context("seed temp edit file")?; + } + + let status = run_editor(&editor, &file)?; + let result = if status { + std::fs::read_to_string(&file).context("read back edited content") + } else { + Err(anyhow!("editor '{editor}' exited with a non-zero status")) + }; + let _ = std::fs::remove_file(&file); + result +} + +/// Run the editor command (which may carry arguments, e.g. `code --wait`) on +/// `file`. Returns whether it exited successfully. +fn run_editor(editor: &str, file: &Path) -> Result<bool> { + let mut parts = editor.split_whitespace(); + let program = parts + .next() + .ok_or_else(|| anyhow!("$EDITOR/$VISUAL is empty"))?; + let args: Vec<&str> = parts.collect(); + let status = std::process::Command::new(program) + .args(&args) + .arg(file) + .status() + .with_context(|| format!("launch editor '{editor}'"))?; + Ok(status.success()) +} + +/// Mint the sheet's `authored_at` using the open connection's own clock, in the +/// exact `strftime('%Y-%m-%dT%H:%M:%fZ','now')` shape the storage layer stamps +/// `created_at` / `updated_at` with — so `authored_at` sorts lexically +/// alongside stored timestamps with zero formatting drift. It is a distinct +/// property: `created_at`/`updated_at` move on every write, `authored_at` is set +/// once and preserved across `edit` (the staleness baseline T5 reads). 
+fn now_iso8601(conn: &Connection) -> Result<String> { + let ts: String = conn + .query_row("SELECT strftime('%Y-%m-%dT%H:%M:%fZ','now')", [], |row| { + row.get(0) + }) + .context("mint authored_at timestamp")?; + Ok(ts) +} + +/// Normalise an `--expires` value to a UTC instant in the exact +/// `YYYY-MM-DDTHH:MM:SS.mmmZ` shape the read path compares against. The expiry +/// check (`crates/clarion-mcp/src/catalogue/inspection.rs`) is a *lexical* +/// `expires < now` compare, so the stored string must be byte-format-identical +/// to `now`: same UTC zone (`Z`), same 3-digit subsecond, same length. We run +/// the input through the connection's own `strftime`, which: +/// - accepts a full instant (`2026-12-31T23:59:59.999Z`), a date+time, an +/// offset form (`…+02:00`, converted to UTC), or a bare date; +/// - normalises a **date-only** value to **start-of-day UTC** +/// (`2026-06-03` → `2026-06-03T00:00:00.000Z`); and +/// - returns `NULL` for anything it cannot parse, which we reject. +/// +/// # Errors +/// +/// Returns an error if `raw` is not a parseable date/time. +fn normalize_expires(conn: &Connection, raw: &str) -> Result<String> { + let normalized: Option<String> = conn + .query_row("SELECT strftime('%Y-%m-%dT%H:%M:%fZ', ?1)", [raw], |row| { + row.get(0) + }) + .context("normalize --expires timestamp")?; + normalized.ok_or_else(|| { + anyhow!( + "--expires '{raw}' is not a valid date/time; use an ISO-8601 instant \ + (e.g. 2026-12-31T23:59:59Z) or a date (e.g. 
2026-12-31, taken as \ + start-of-day UTC)" + ) + }) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn parse_path_rule() { + assert_eq!( + parse_match_rule("path:src/auth/**").unwrap(), + json!({"type": "path", "pattern": "src/auth/**"}) + ); + } + + #[test] + fn parse_tag_rule() { + assert_eq!( + parse_match_rule("tag:auth").unwrap(), + json!({"type": "tag", "value": "auth"}) + ); + } + + #[test] + fn parse_kind_rule() { + assert_eq!( + parse_match_rule("kind:function").unwrap(), + json!({"type": "kind", "value": "function"}) + ); + } + + #[test] + fn parse_subsystem_rule_keeps_colons_in_value() { + // The value half is opaque and itself contains colons — split once only. + assert_eq!( + parse_match_rule("subsystem:core:subsystem:abcd").unwrap(), + json!({"type": "subsystem", "id": "core:subsystem:abcd"}) + ); + } + + #[test] + fn parse_entity_rule_keeps_colons_in_value() { + assert_eq!( + parse_match_rule("entity:python:function:foo.bar").unwrap(), + json!({"type": "entity", "id": "python:function:foo.bar"}) + ); + } + + #[test] + fn parse_path_glob_with_no_extra_colons() { + assert_eq!( + parse_match_rule("path:**/refresh.py").unwrap(), + json!({"type": "path", "pattern": "**/refresh.py"}) + ); + } + + #[test] + fn parse_rejects_missing_colon() { + let err = parse_match_rule("pathsrc").unwrap_err().to_string(); + assert!(err.contains("expected ':'"), "{err}"); + } + + #[test] + fn parse_rejects_empty_value() { + let err = parse_match_rule("tag:").unwrap_err().to_string(); + assert!(err.contains("empty value"), "{err}"); + } + + #[test] + fn parse_rejects_unknown_type() { + let err = parse_match_rule("colour:blue").unwrap_err().to_string(); + assert!(err.contains("unknown rule type 'colour'"), "{err}"); + } + + #[test] + fn parse_many_collects_all() { + let raw = vec![ + "path:src/**".to_owned(), + "tag:auth".to_owned(), + "entity:python:function:x.y".to_owned(), + ]; + let rules = parse_match_rules(&raw).unwrap(); + assert_eq!(rules.len(), 3); + 
assert_eq!( + rules[2], + json!({"type": "entity", "id": "python:function:x.y"}) + ); + } + + #[test] + fn parse_many_propagates_first_error() { + let raw = vec!["path:ok".to_owned(), "bad".to_owned()]; + assert!(parse_match_rules(&raw).is_err()); + } + + #[test] + fn scope_level_validation() { + assert!(validate_scope_level("module").is_ok()); + assert!(validate_scope_level("project").is_ok()); + assert!(validate_scope_level("subsystem").is_ok()); + assert!(validate_scope_level("nonsense").is_err()); + } + + #[test] + fn slugify_cleans_unsafe_chars() { + assert_eq!(slugify("auth tokens"), "auth-tokens"); + assert_eq!(slugify("pkg.mod.fn"), "pkg.mod.fn"); + assert_eq!(slugify("path:src/**"), "path-src"); + assert_eq!(slugify("a__b-c.d"), "a__b-c.d"); + } + + #[test] + fn now_iso8601_is_well_formed() { + let conn = Connection::open_in_memory().unwrap(); + let ts = now_iso8601(&conn).unwrap(); + // YYYY-MM-DDTHH:MM:SS.mmmZ — 24 chars, sorts lexically. + assert_eq!(ts.len(), 24, "{ts}"); + assert!(ts.ends_with('Z')); + assert_eq!(&ts[4..5], "-"); + assert_eq!(&ts[10..11], "T"); + } + + #[test] + fn normalize_expires_produces_now_compatible_format() { + let conn = Connection::open_in_memory().unwrap(); + // A full instant round-trips byte-identically. + assert_eq!( + normalize_expires(&conn, "2026-12-31T23:59:59.999Z").unwrap(), + "2026-12-31T23:59:59.999Z" + ); + // A bare date normalizes to start-of-day UTC, NOT a bare prefix that + // would sort below same-day instants and expire immediately. + assert_eq!( + normalize_expires(&conn, "2026-12-31").unwrap(), + "2026-12-31T00:00:00.000Z" + ); + // An offset form is converted to UTC `Z`. + assert_eq!( + normalize_expires(&conn, "2026-06-03T12:00:00+02:00").unwrap(), + "2026-06-03T10:00:00.000Z" + ); + // Every normalized value matches the `now` shape (24 chars, ends in Z). 
+ for raw in ["2026-12-31", "2026-12-31T23:59:59Z", "2026-06-03 12:00:00"] { + let out = normalize_expires(&conn, raw).unwrap(); + assert_eq!(out.len(), 24, "{raw} -> {out}"); + assert!(out.ends_with('Z'), "{raw} -> {out}"); + } + } + + #[test] + fn normalize_expires_rejects_garbage() { + let conn = Connection::open_in_memory().unwrap(); + assert!(normalize_expires(&conn, "tomorrow").is_err()); + assert!(normalize_expires(&conn, "not-a-date").is_err()); + assert!(normalize_expires(&conn, "").is_err()); + } + + #[test] + fn normalize_expires_future_is_not_lexically_expired() { + // Proxy the read path's `expires < now` lexical compare: a future + // normalized expiry must sort *after* the current instant, so the read + // path will NOT treat the sheet as expired. + let conn = Connection::open_in_memory().unwrap(); + let now = now_iso8601(&conn).unwrap(); + let future = normalize_expires(&conn, "2999-01-01T00:00:00Z").unwrap(); + assert!( + future > now, + "future expiry {future} must sort after now {now}" + ); + } +} diff --git a/crates/clarion-cli/src/http_read.rs b/crates/clarion-cli/src/http_read.rs index 6698408b..55db09c1 100644 --- a/crates/clarion-cli/src/http_read.rs +++ b/crates/clarion-cli/src/http_read.rs @@ -13,7 +13,7 @@ use axum::response::{IntoResponse, Response}; use axum::routing::{get, post}; use axum::{Json, Router}; use clarion_core::HttpErrorCode as ErrorCode; -use clarion_mcp::config::HttpReadConfig; +use clarion_federation::config::HttpReadConfig; use clarion_storage::ReaderPool; use serde::Serialize; use tokio::sync::oneshot; @@ -145,8 +145,9 @@ pub(crate) struct AppState { /// is always unauthenticated so siblings can probe pre-auth. pub(crate) auth_token: Option>, /// Resolved Loom component identity HMAC secret. When present, protected - /// routes require `X-Loom-Component: clarion:`. + /// routes require `X-Loom-Component: clarion:` plus freshness headers. 
pub(crate) identity_secret: Option>, + pub(crate) hmac_replay_cache: auth::SharedHmacReplayCache, /// Present only when `serve.http.wardline_taint_write` is true (ADR-036). /// `None` ⇒ the write API is disabled and returns 403 `WRITE_DISABLED`. pub(crate) taint_writer: Option>, @@ -388,6 +389,7 @@ fn run_http_read_server( instance_id, auth_token, identity_secret, + hmac_replay_cache: auth::new_hmac_replay_cache(), taint_writer, }; let serve_future = axum::serve(listener, router(state)) @@ -662,11 +664,21 @@ fn json_error(status: StatusCode, code: ErrorCode, message: &str) -> Response { #[cfg(test)] mod tests { - use std::sync::mpsc; + use std::sync::{Mutex, MutexGuard, mpsc}; use super::*; use axum::http::{HeaderMap, HeaderValue}; + static HTTP_RUNTIME_TEST_LOCK: Mutex<()> = Mutex::new(()); + + fn http_runtime_test_guard() -> MutexGuard<'static, ()> { + let guard = HTTP_RUNTIME_TEST_LOCK + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner); + HTTP_THREAD_PANIC_TRIGGER.store(false, std::sync::atomic::Ordering::SeqCst); + guard + } + // REQ-F-02 (ADR-038 §4): `resolve(locator)` must reject an SEI-shaped input // by the RESERVED PREFIX, not a colon count — an SEI carries the same two // colons a `{plugin}:{kind}:{qualname}` locator does. @@ -727,11 +739,10 @@ mod tests { /// "without authentication". #[test] fn spawn_emits_loopback_no_token_trust_warning() { - use clarion_mcp::config::HttpReadConfig; + use clarion_federation::config::HttpReadConfig; use clarion_storage::ReaderPool; use std::io; use std::net::{SocketAddr, TcpListener}; - use std::sync::Mutex; use tracing_subscriber::fmt::MakeWriter; #[derive(Clone)] @@ -759,6 +770,8 @@ mod tests { } } + let _guard = http_runtime_test_guard(); + let buffer = Arc::new(Mutex::new(Vec::::new())); let writer = CaptureWriter { buffer: buffer.clone(), @@ -837,10 +850,12 @@ mod tests { /// spawn→drop→join sequence end to end. 
#[test] fn spawn_with_taint_writer_shuts_down_cleanly() { - use clarion_mcp::config::HttpReadConfig; + use clarion_federation::config::HttpReadConfig; use clarion_storage::ReaderPool; use std::net::{SocketAddr, TcpListener}; + let _guard = http_runtime_test_guard(); + let probe = TcpListener::bind(("127.0.0.1", 0)).expect("probe bind"); let bind: SocketAddr = probe.local_addr().expect("probe local addr"); drop(probe); @@ -888,10 +903,12 @@ mod tests { /// absorb the panic (i.e. anything outside per-request middleware). #[test] fn check_running_surfaces_supervisor_signal_after_runtime_panic() { - use clarion_mcp::config::HttpReadConfig; + use clarion_federation::config::HttpReadConfig; use clarion_storage::ReaderPool; use std::net::{SocketAddr, TcpListener}; + let _guard = http_runtime_test_guard(); + // Hold-and-drop: bind to ephemeral 0 to discover a free port, then // drop so the HTTP server can re-bind it. The micro-race is fine // here — if the port is stolen we surface a different error. @@ -915,9 +932,6 @@ mod tests { crate::instance::parse_instance_id_for_test("00000000-0000-4000-8000-000000000001") .expect("parse synthetic instance id"); - // Defensive: clear any stale trigger from a prior test. - HTTP_THREAD_PANIC_TRIGGER.store(false, std::sync::atomic::Ordering::SeqCst); - let mut server = spawn( tempdir.path().to_path_buf(), db_path.clone(), diff --git a/crates/clarion-cli/src/http_read/auth.rs b/crates/clarion-cli/src/http_read/auth.rs index 688d0c0b..a3ce0f49 100644 --- a/crates/clarion-cli/src/http_read/auth.rs +++ b/crates/clarion-cli/src/http_read/auth.rs @@ -2,12 +2,18 @@ //! //! Split out of `http_read.rs` (mechanical relocation; behaviour unchanged). 
+use std::collections::HashMap; +use std::sync::{Arc, Mutex}; + use axum::body::{Body, to_bytes}; use axum::extract::State; use axum::http::{Request, StatusCode}; use axum::response::Response; use clarion_core::HttpErrorCode as ErrorCode; +use hmac::{Hmac, Mac}; use sha2::{Digest, Sha256}; +use subtle::ConstantTimeEq; +use time::OffsetDateTime; use tower::BoxError; use tower::load_shed; use tower::timeout; @@ -15,6 +21,50 @@ use tower::timeout; use super::errors::format_dyn_error_chain; use super::{AppState, HTTP_BODY_LIMIT_BYTES, WARDLINE_BODY_LIMIT_BYTES, json_error}; +type HmacSha256 = Hmac; +pub(crate) type SharedHmacReplayCache = Arc>; + +/// Wire-pinned HMAC freshness window. +/// +/// Basis: local sibling HTTP calls should complete in milliseconds; five +/// minutes tolerates moderate clock skew without making a captured request +/// useful for long. Override: none, this is part of the federation auth wire +/// contract. Retune: successor ADR if sibling deployments demonstrate a wider +/// skew requirement. +const HMAC_FRESHNESS_WINDOW_SECONDS: i64 = 300; +const HMAC_NONCE_MAX_LEN: usize = 128; + +#[derive(Debug, Default)] +pub(crate) struct HmacReplayCache { + seen: HashMap, +} + +pub(crate) fn new_hmac_replay_cache() -> SharedHmacReplayCache { + Arc::new(Mutex::new(HmacReplayCache::default())) +} + +impl HmacReplayCache { + fn check_and_record( + &mut self, + nonce: &str, + request_timestamp: i64, + now_timestamp: i64, + ) -> bool { + let oldest_allowed = now_timestamp.saturating_sub(HMAC_FRESHNESS_WINDOW_SECONDS); + self.seen.retain(|_, seen_at| *seen_at >= oldest_allowed); + if request_timestamp < oldest_allowed + || request_timestamp > now_timestamp.saturating_add(HMAC_FRESHNESS_WINDOW_SECONDS) + { + return false; + } + if self.seen.contains_key(nonce) { + return false; + } + self.seen.insert(nonce.to_owned(), request_timestamp); + true + } +} + /// Enforce configured identity on protected routes. 
Prefer the Loom HMAC /// identity when `identity_token_env` is configured; otherwise preserve the /// legacy bearer-token path for existing deployments. @@ -45,7 +95,8 @@ pub(crate) async fn require_http_identity_with_limit( next: axum::middleware::Next, ) -> Response { if let Some(secret) = state.identity_secret.as_ref() { - return require_hmac_identity(secret, body_limit, request, next).await; + return require_hmac_identity(secret, &state.hmac_replay_cache, body_limit, request, next) + .await; } let Some(expected) = state.auth_token.as_ref() else { return next.run(request).await; @@ -70,6 +121,7 @@ pub(crate) async fn require_http_identity_with_limit( pub(crate) async fn require_hmac_identity( secret: &str, + replay_cache: &SharedHmacReplayCache, body_limit: usize, request: Request, next: axum::middleware::Next, @@ -90,6 +142,24 @@ pub(crate) async fn require_hmac_identity( let Some(presented) = presented else { return unauthenticated_response(); }; + let timestamp = parts + .headers + .get("x-loom-timestamp") + .and_then(|value| value.to_str().ok()) + .and_then(|value| value.trim().parse::().ok()); + let Some(timestamp) = timestamp else { + return unauthenticated_response(); + }; + let nonce = parts + .headers + .get("x-loom-nonce") + .and_then(|value| value.to_str().ok()) + .map(str::trim) + .filter(|nonce| !nonce.is_empty() && nonce.len() <= HMAC_NONCE_MAX_LEN) + .map(str::to_owned); + let Some(nonce) = nonce else { + return unauthenticated_response(); + }; let Ok(body_bytes) = to_bytes(body, body_limit).await else { // CI-02 fix: a body read failure here is not a path-validation // problem. 
The outer `RequestBodyLimitLayer` already rejects @@ -103,10 +173,24 @@ pub(crate) async fn require_hmac_identity( "request body could not be read", ); }; - let expected = component_hmac_hex(secret.as_bytes(), &method, &path_and_query, &body_bytes); + let expected = component_hmac_hex( + secret.as_bytes(), + &method, + &path_and_query, + &body_bytes, + timestamp, + &nonce, + ); if !constant_time_eq(presented.as_bytes(), expected.as_bytes()) { return unauthenticated_response(); } + let now = OffsetDateTime::now_utc().unix_timestamp(); + let fresh_and_unseen = replay_cache + .lock() + .is_ok_and(|mut cache| cache.check_and_record(&nonce, timestamp, now)); + if !fresh_and_unseen { + return unauthenticated_response(); + } next.run(Request::from_parts(parts, Body::from(body_bytes))) .await } @@ -124,44 +208,36 @@ pub(crate) fn component_hmac_hex( method: &str, path_and_query: &str, body: &[u8], + timestamp: i64, + nonce: &str, ) -> String { hmac_sha256_hex( secret, - canonical_hmac_message(method, path_and_query, body).as_bytes(), + canonical_hmac_message(method, path_and_query, body, timestamp, nonce).as_bytes(), ) } -pub(crate) fn canonical_hmac_message(method: &str, path_and_query: &str, body: &[u8]) -> String { +pub(crate) fn canonical_hmac_message( + method: &str, + path_and_query: &str, + body: &[u8], + timestamp: i64, + nonce: &str, +) -> String { format!( - "{}\n{}\n{}", + "{}\n{}\n{}\n{}\n{}", method, path_and_query, - hex_lower(&Sha256::digest(body)) + hex_lower(&Sha256::digest(body)), + timestamp, + nonce ) } pub(crate) fn hmac_sha256_hex(secret: &[u8], message: &[u8]) -> String { - const BLOCK_SIZE: usize = 64; - let mut key = [0_u8; BLOCK_SIZE]; - if secret.len() > BLOCK_SIZE { - key[..32].copy_from_slice(&Sha256::digest(secret)); - } else { - key[..secret.len()].copy_from_slice(secret); - } - let mut ipad = [0x36_u8; BLOCK_SIZE]; - let mut opad = [0x5c_u8; BLOCK_SIZE]; - for index in 0..BLOCK_SIZE { - ipad[index] ^= key[index]; - opad[index] ^= key[index]; 
- } - let mut inner = Sha256::new(); - inner.update(ipad); - inner.update(message); - let inner = inner.finalize(); - let mut outer = Sha256::new(); - outer.update(opad); - outer.update(inner); - hex_lower(&outer.finalize()) + let mut mac = HmacSha256::new_from_slice(secret).expect("HMAC accepts keys of any size"); + mac.update(message); + hex_lower(&mac.finalize().into_bytes()) } pub(crate) fn hex_lower(bytes: &[u8]) -> String { @@ -175,14 +251,7 @@ pub(crate) fn hex_lower(bytes: &[u8]) -> String { } pub(crate) fn constant_time_eq(a: &[u8], b: &[u8]) -> bool { - if a.len() != b.len() { - return false; - } - let mut diff = 0_u8; - for (left, right) in a.iter().zip(b.iter()) { - diff |= left ^ right; - } - diff == 0 + a.len() == b.len() && bool::from(a.ct_eq(b)) } pub(crate) async fn handle_middleware_error(err: BoxError) -> Response { @@ -225,6 +294,43 @@ mod tests { use super::*; + #[test] + fn hmac_sha256_matches_known_vector() { + let digest = hmac_sha256_hex(b"key", b"The quick brown fox jumps over the lazy dog"); + assert_eq!( + digest, + "f7bc83f430538424b13298e6aa6fb143ef4d59a14946175997479dbc2d1a3cd8" + ); + } + + #[test] + fn hmac_replay_cache_rejects_reused_and_stale_nonces() { + let mut cache = HmacReplayCache::default(); + let now = 1_900_000_000; + + assert!(cache.check_and_record("nonce-1", now, now)); + assert!( + !cache.check_and_record("nonce-1", now, now), + "same nonce inside the freshness window must be rejected" + ); + assert!( + !cache.check_and_record("nonce-old", now - HMAC_FRESHNESS_WINDOW_SECONDS - 1, now,), + "stale timestamps must be rejected" + ); + assert!( + !cache.check_and_record("nonce-future", now + HMAC_FRESHNESS_WINDOW_SECONDS + 1, now,), + "far-future timestamps must be rejected" + ); + assert!( + cache.check_and_record( + "nonce-1", + now + HMAC_FRESHNESS_WINDOW_SECONDS + 1, + now + HMAC_FRESHNESS_WINDOW_SECONDS + 1, + ), + "expired nonce entries should be pruned" + ); + } + #[test] fn 
load_shed_converts_concurrency_backpressure_to_overload_response() { #[derive(Clone)] @@ -342,6 +448,11 @@ mod tests { .method("POST") .uri("/api/v1/files/batch") .header("X-Loom-Component", "clarion:deadbeef") + .header( + "X-Loom-Timestamp", + OffsetDateTime::now_utc().unix_timestamp().to_string(), + ) + .header("X-Loom-Nonce", "body-read-failure") .body(Body::from(oversize)) .expect("request"); @@ -352,7 +463,15 @@ mod tests { let app: Router<()> = Router::new() .route("/api/v1/files/batch", post(never_called)) .layer(axum::middleware::from_fn(|request, next| async move { - require_hmac_identity("test-secret", HTTP_BODY_LIMIT_BYTES, request, next).await + let replay_cache = new_hmac_replay_cache(); + require_hmac_identity( + "test-secret", + &replay_cache, + HTTP_BODY_LIMIT_BYTES, + request, + next, + ) + .await })); let response = app.oneshot(request).await.expect("oneshot response"); diff --git a/crates/clarion-cli/src/http_read/linkages.rs b/crates/clarion-cli/src/http_read/linkages.rs index 07db81cf..b575164d 100644 --- a/crates/clarion-cli/src/http_read/linkages.rs +++ b/crates/clarion-cli/src/http_read/linkages.rs @@ -444,6 +444,7 @@ mod tests { instance_id, auth_token: None, identity_secret: Some(Arc::new(secret.to_owned())), + hmac_replay_cache: crate::http_read::auth::new_hmac_replay_cache(), taint_writer: None, }; (state, tempdir) diff --git a/crates/clarion-cli/src/http_read/test_support.rs b/crates/clarion-cli/src/http_read/test_support.rs index 6ab816fe..c3dc6022 100644 --- a/crates/clarion-cli/src/http_read/test_support.rs +++ b/crates/clarion-cli/src/http_read/test_support.rs @@ -13,11 +13,22 @@ pub(crate) fn hmac_request( path_and_query: &str, body: &[u8], ) -> axum::http::Request { - let signature = component_hmac_hex(secret.as_bytes(), method, path_and_query, body); + let timestamp = time::OffsetDateTime::now_utc().unix_timestamp(); + let nonce = uuid::Uuid::new_v4().to_string(); + let signature = component_hmac_hex( + secret.as_bytes(), + 
method, + path_and_query, + body, + timestamp, + &nonce, + ); axum::http::Request::builder() .method(method) .uri(path_and_query) .header("X-Loom-Component", format!("clarion:{signature}")) + .header("X-Loom-Timestamp", timestamp.to_string()) + .header("X-Loom-Nonce", nonce) .header(header::CONTENT_TYPE, "application/json") .body(axum::body::Body::from(body.to_vec())) .expect("build request") diff --git a/crates/clarion-cli/src/http_read/wardline.rs b/crates/clarion-cli/src/http_read/wardline.rs index fd9c1be6..8c7919d9 100644 --- a/crates/clarion-cli/src/http_read/wardline.rs +++ b/crates/clarion-cli/src/http_read/wardline.rs @@ -659,6 +659,7 @@ mod tests { instance_id, auth_token: None, identity_secret: Some(Arc::new(secret.to_owned())), + hmac_replay_cache: crate::http_read::auth::new_hmac_replay_cache(), taint_writer: None, }; (state, tempdir) @@ -775,6 +776,7 @@ mod tests { instance_id, auth_token: None, identity_secret: Some(Arc::new(secret.to_owned())), + hmac_replay_cache: crate::http_read::auth::new_hmac_replay_cache(), taint_writer: Some(writer.sender()), }; (state, db_path, writer, tempdir) @@ -1213,6 +1215,7 @@ mod tests { instance_id, auth_token: None, identity_secret: Some(Arc::new(secret.to_owned())), + hmac_replay_cache: crate::http_read::auth::new_hmac_replay_cache(), taint_writer: None, }; (state, tempdir) @@ -1508,6 +1511,7 @@ mod tests { instance_id, auth_token: None, identity_secret: Some(Arc::new(secret.to_owned())), + hmac_replay_cache: crate::http_read::auth::new_hmac_replay_cache(), taint_writer: None, }; diff --git a/crates/clarion-cli/src/main.rs b/crates/clarion-cli/src/main.rs index d2ee5641..7e27770a 100644 --- a/crates/clarion-cli/src/main.rs +++ b/crates/clarion-cli/src/main.rs @@ -5,6 +5,7 @@ mod clustering; mod config; mod db; mod doctor; +mod guidance; mod hook; mod hooks_settings; mod http_read; @@ -12,11 +13,13 @@ mod install; mod instance; mod mcp_registration; mod run_lifecycle; +mod sarif; mod secret_scan; mod sei_git; mod 
serve; mod skill_pack; mod stats; +mod wardline_guidance; use anyhow::Result; use clap::Parser; @@ -96,6 +99,7 @@ fn main() -> Result<()> { force, } => db::backup(&path, &output, force), }, + cli::Command::Guidance { command } => guidance::run(command), cli::Command::Doctor { path, fix } => { // doctor prints its own report; map an unhealthy result to a // non-zero exit so it can gate CI / pre-commit. The Result<()> arm @@ -106,6 +110,13 @@ fn main() -> Result<()> { } Ok(()) } + cli::Command::Sarif { command } => match command { + cli::SarifCommand::Import { + file, + scan_source, + path, + } => sarif::run_import(&file, scan_source, &path), + }, } } diff --git a/crates/clarion-cli/src/sarif.rs b/crates/clarion-cli/src/sarif.rs new file mode 100644 index 00000000..eb06c665 --- /dev/null +++ b/crates/clarion-cli/src/sarif.rs @@ -0,0 +1,196 @@ +use std::fs; +use std::path::Path; + +use anyhow::{Context, Result, anyhow}; +use serde_json::{Map, Value, json}; + +use clarion_federation::filigree::FiligreeHttpClient; +use clarion_federation::scan_results::ScanResultsRequest; + +/// Translate SARIF findings from a file and post them to Filigree. +#[allow(clippy::too_many_lines, clippy::collapsible_if)] +pub fn run_import(file: &Path, scan_source_opt: Option, project_path: &Path) -> Result<()> { + let project_root = project_path + .canonicalize() + .with_context(|| format!("cannot canonicalise path {}", project_path.display()))?; + + // Load MCP config + let mcp_config = crate::analyze::load_mcp_config(&project_root, None); + + // Build Filigree HTTP client + let client = FiligreeHttpClient::from_config(&mcp_config.integrations.filigree, |name| { + std::env::var(name).ok() + }) + .context("build Filigree HTTP client")? 
+ .ok_or_else(|| anyhow!("Filigree integration is disabled in clarion.yaml"))?; + + // Read and parse SARIF file + let content = fs::read_to_string(file) + .with_context(|| format!("failed to read SARIF file: {}", file.display()))?; + let sarif: Value = serde_json::from_str(&content) + .with_context(|| format!("failed to parse SARIF JSON in file: {}", file.display()))?; + + // Extract tool name and determine scan_source + let driver_name = sarif + .get("runs") + .and_then(|r| r.as_array()) + .and_then(|r| r.first()) + .and_then(|r| r.get("tool")) + .and_then(|t| t.get("driver")) + .and_then(|d| d.get("name")) + .and_then(|n| n.as_str()) + .unwrap_or("unknown"); + + let scan_source = match scan_source_opt { + Some(src) => src, + None => { + if driver_name.eq_ignore_ascii_case("wardline") { + "wardline".to_owned() + } else { + driver_name.to_lowercase() + } + } + }; + + // Parse findings + let mut findings = Vec::new(); + if let Some(runs) = sarif.get("runs").and_then(|r| r.as_array()) { + for run in runs { + if let Some(results) = run.get("results").and_then(|r| r.as_array()) { + for res in results { + let rule_id = res + .get("ruleId") + .and_then(|r| r.as_str()) + .unwrap_or("unknown-rule") + .to_owned(); + let message = res + .get("message") + .and_then(|m| m.get("text")) + .and_then(|t| t.as_str()) + .unwrap_or("") + .to_owned(); + let level = res + .get("level") + .and_then(|l| l.as_str()) + .unwrap_or("warning"); + let severity = match level { + "error" => "high", + "warning" => "medium", + _ => "info", + }; + + // Physical location mapping + let mut path = None; + let mut line_start = None; + let mut line_end = None; + + if let Some(locations) = res.get("locations").and_then(|l| l.as_array()) { + if let Some(loc) = locations.first() { + if let Some(phys_loc) = loc.get("physicalLocation") { + if let Some(al) = phys_loc.get("artifactLocation") { + if let Some(uri) = al.get("uri").and_then(|u| u.as_str()) { + let clean_uri = if let Some(stripped) = + 
uri.strip_prefix("file://") + { + stripped + } else if let Some(stripped) = uri.strip_prefix("file:///") + { + stripped + } else { + uri + }; + path = Some(clean_uri.trim_start_matches('/').to_owned()); + } + } + if let Some(region) = phys_loc.get("region") { + line_start = region.get("startLine").and_then(Value::as_i64); + line_end = region + .get("endLine") + .and_then(Value::as_i64) + .or(line_start); + } + } + } + } + + let path = match path { + Some(p) if !p.is_empty() => p, + _ => continue, // skip findings with no path + }; + + let properties = res + .get("properties") + .cloned() + .unwrap_or_else(|| Value::Object(Map::new())); + + let mut metadata = Map::new(); + metadata.insert("kind".to_owned(), json!("defect")); + if scan_source == "wardline" { + metadata.insert("wardline_properties".to_owned(), properties); + } else { + metadata.insert("sarif_properties".to_owned(), properties); + } + + let mut wire_find = Map::new(); + wire_find.insert("path".to_owned(), json!(path)); + wire_find.insert("rule_id".to_owned(), json!(rule_id)); + wire_find.insert("message".to_owned(), json!(message)); + wire_find.insert("severity".to_owned(), json!(severity)); + if let Some(ls) = line_start { + wire_find.insert("line_start".to_owned(), json!(ls)); + } + if let Some(le) = line_end { + wire_find.insert("line_end".to_owned(), json!(le)); + } + wire_find.insert("metadata".to_owned(), Value::Object(metadata)); + + findings.push(Value::Object(wire_find)); + } + } + } + } + + let total_findings = findings.len(); + tracing::info!( + file = %file.display(), + scan_source = %scan_source, + findings_count = total_findings, + "parsed SARIF findings" + ); + + let request = ScanResultsRequest { + scan_source: scan_source.clone(), + scan_run_id: None, + mark_unseen: true, + create_observations: false, + complete_scan_run: true, + findings, + }; + + // Run client POST in a separate thread to bypass nested tokio runtime panics in reqwest::blocking + let thread_client = client.clone(); + 
let worker = std::thread::spawn(move || thread_client.post_scan_results(&request)); + let response = worker + .join() + .map_err(|_| anyhow!("SARIF post thread panicked"))? + .map_err(|err| anyhow!("post findings to Filigree failed: {err}"))?; + + tracing::info!( + scan_source = %scan_source, + created = response.findings_created, + updated = response.findings_updated, + warnings = response.warnings.len(), + "successfully imported SARIF findings to Filigree" + ); + + for warning in &response.warnings { + tracing::warn!(warning = %warning, "Filigree intake warning"); + } + + println!( + "Import complete: {} created, {} updated", + response.findings_created, response.findings_updated + ); + + Ok(()) +} diff --git a/crates/clarion-cli/src/secret_scan.rs b/crates/clarion-cli/src/secret_scan.rs index 827cd99f..3703b019 100644 --- a/crates/clarion-cli/src/secret_scan.rs +++ b/crates/clarion-cli/src/secret_scan.rs @@ -194,8 +194,17 @@ impl SecretScanOutcome { run_id: &str, project_root: &Path, started_at: &str, + head_commit: Option<&str>, ) -> Result<()> { - anchors::ensure_and_emit_findings(self, writer, run_id, project_root, started_at).await + anchors::ensure_and_emit_findings( + self, + writer, + run_id, + project_root, + started_at, + head_commit, + ) + .await } } diff --git a/crates/clarion-cli/src/secret_scan/anchors.rs b/crates/clarion-cli/src/secret_scan/anchors.rs index c83fb7a1..84155e99 100644 --- a/crates/clarion-cli/src/secret_scan/anchors.rs +++ b/crates/clarion-cli/src/secret_scan/anchors.rs @@ -33,8 +33,9 @@ pub(super) async fn ensure_and_emit_findings( run_id: &str, project_root: &Path, started_at: &str, + head_commit: Option<&str>, ) -> Result<()> { - ensure_finding_anchors(outcome, writer, project_root, started_at).await?; + ensure_finding_anchors(outcome, writer, project_root, started_at, head_commit).await?; emit_findings( writer, run_id, @@ -50,6 +51,7 @@ async fn ensure_finding_anchors( writer: &Writer, project_root: &Path, started_at: &str, + 
head_commit: Option<&str>, ) -> Result<()> { // Pass 1: paths with active findings this run get anchored with whatever // briefing_blocks reason applies (or none, if an override cleared it). @@ -58,7 +60,7 @@ async fn ensure_finding_anchors( if outcome.finding_anchors.contains_key(&key) { continue; } - upsert_finding_anchor(outcome, writer, project_root, started_at, key).await?; + upsert_finding_anchor(outcome, writer, project_root, started_at, head_commit, key).await?; } // Pass 2: every sidecar path scanned this run that pass 1 did not anchor // (i.e. no current finding). The upsert refreshes properties + content_hash @@ -77,7 +79,7 @@ async fn ensure_finding_anchors( if outcome.finding_anchors.contains_key(&key) { continue; } - upsert_finding_anchor(outcome, writer, project_root, started_at, key).await?; + upsert_finding_anchor(outcome, writer, project_root, started_at, head_commit, key).await?; } Ok(()) } @@ -87,6 +89,7 @@ async fn upsert_finding_anchor( writer: &Writer, project_root: &Path, started_at: &str, + head_commit: Option<&str>, key: PathBuf, ) -> Result<()> { let id = secret_finding_anchor_id(project_root, &key); @@ -104,7 +107,7 @@ async fn upsert_finding_anchor( serde_json::Value::String(reason.as_str().to_owned()), ); } - let record = EntityRecord { + let mut record = EntityRecord { id: id.clone(), plugin_id: "core".to_owned(), kind: "file".to_owned(), @@ -118,6 +121,7 @@ async fn upsert_finding_anchor( source_line_start: None, source_line_end: None, properties_json: serde_json::Value::Object(properties).to_string(), + tags: Vec::new(), content_hash: file_content_hash(&key), summary_json: None, wardline_json: None, @@ -126,6 +130,10 @@ async fn upsert_finding_anchor( created_at: started_at.to_owned(), updated_at: started_at.to_owned(), }; + if let Some(commit) = head_commit { + record.first_seen_commit = Some(commit.to_owned()); + record.last_seen_commit = Some(commit.to_owned()); + } writer .send_wait(|ack| WriterCmd::InsertEntity { entity: 
Box::new(record), diff --git a/crates/clarion-cli/src/serve.rs b/crates/clarion-cli/src/serve.rs index bab2beef..8fcd4e80 100644 --- a/crates/clarion-cli/src/serve.rs +++ b/crates/clarion-cli/src/serve.rs @@ -11,10 +11,10 @@ use clarion_core::{ CodexCliProvider, CodexCliProviderConfig, EmbeddingProvider, EmbeddingProviderError, LlmProvider, OpenRouterProvider, OpenRouterProviderConfig, Recording, RecordingProvider, }; -use clarion_mcp::config::{ +use clarion_federation::config::{ LlmConfig, McpConfig, ProviderSelection, SemanticSearchConfig, select_provider_with_env, }; -use clarion_mcp::filigree::FiligreeHttpClient; +use clarion_federation::filigree::FiligreeHttpClient; use clarion_storage::{DEFAULT_BATCH_SIZE, DEFAULT_CHANNEL_CAPACITY, ReaderPool, Writer}; pub fn run(path: &Path, config_path: Option<&Path>) -> Result<()> { @@ -50,7 +50,7 @@ pub fn run(path: &Path, config_path: Option<&Path>) -> Result<()> { // (which goes stale, the dogfood bug) — then build the client against the // resolved URL so `issues_for` reaches the running dashboard. The same // resolution is surfaced by `project_status`. - let filigree_resolution = clarion_mcp::filigree_url::resolve_filigree_url( + let filigree_resolution = clarion_federation::filigree_url::resolve_filigree_url( &config.integrations.filigree, &project_root, ); @@ -269,7 +269,7 @@ fn supervise_stdio_with_http( /// `None` (honest degrade — the tool reports "not enabled") when semantic search /// is disabled, or when it is enabled but live access is not opted in / no API /// key is present. A genuine misconfiguration (e.g. zero dimensions) fails fast. 
-fn build_embedding_provider( +pub(crate) fn build_embedding_provider( config: &SemanticSearchConfig, read_env: impl Fn(&str) -> Option, ) -> Result>> { diff --git a/crates/clarion-cli/src/wardline_guidance.rs b/crates/clarion-cli/src/wardline_guidance.rs new file mode 100644 index 00000000..bd82f953 --- /dev/null +++ b/crates/clarion-cli/src/wardline_guidance.rs @@ -0,0 +1,776 @@ +use std::collections::BTreeMap; +use std::collections::BTreeSet; +use std::path::Path; + +use anyhow::{Context, Result}; +use rusqlite::{Connection, OpenFlags}; +use serde::Deserialize; +use serde_json::{Map, Value, json}; + +use clarion_storage::{ + GuidanceSheetInput, get_guidance_sheet, invalidate_summaries_for_sheet, slugify_guidance_name, + upsert_guidance_sheet, +}; + +const PROVENANCE_DERIVED: &str = "wardline_derived"; +const PROVENANCE_OVERRIDDEN: &str = "wardline_derived_overridden"; + +#[derive(Default)] +pub(crate) struct WardlineGuidanceStats { + pub(crate) generated: usize, + pub(crate) overridden: usize, +} + +#[derive(Debug, Default)] +struct WardlineManifest { + tier_entries: BTreeMap, + tier_definitions: BTreeMap, + module_tiers: Vec, + boundaries: BTreeMap, + annotation_groups: BTreeMap, + fingerprint: Option, + exceptions: Option, + overlay_boundaries: Vec, + artifact_hashes: WardlineArtifactHashes, +} + +#[derive(Debug, Default, Deserialize)] +#[serde(default)] +struct RawWardlineManifest { + tiers: Option, + module_tiers: Vec, + #[serde(alias = "boundary_contracts")] + boundaries: BTreeMap, + #[serde(alias = "groups")] + annotation_groups: BTreeMap, +} + +#[derive(Debug, Default, Deserialize)] +#[serde(default)] +struct WardlineGuidanceEntry { + paths: Vec, + content: Option, + scope_level: Option, + match_rules: Option>, +} + +#[derive(Debug, Clone, Default, Deserialize)] +#[serde(default)] +struct WardlineTierDefinition { + id: String, + tier: Option, + description: Option, +} + +#[derive(Debug, Clone, Default, Deserialize)] +#[serde(default)] +struct 
WardlineModuleTier { + path: String, + default_taint: String, +} + +#[derive(Debug, Clone, Default, Deserialize)] +#[serde(default)] +struct WardlineFingerprint { + fingerprints: Vec, +} + +#[derive(Debug, Clone, Default, Deserialize)] +#[serde(default)] +struct WardlineFingerprintEntry { + decorators: Vec, +} + +#[derive(Debug, Clone, Default, Deserialize)] +#[serde(default)] +struct WardlineExceptions { + exceptions: Vec, +} + +#[derive(Debug, Clone, Default, Deserialize)] +#[serde(default)] +struct WardlineOverlay { + overlay_for: Option, + boundaries: Vec, +} + +#[derive(Debug, Clone, Default, Deserialize)] +#[serde(default)] +struct WardlineOverlayBoundaryEntry { + function: String, + transition: String, + from_tier: Option, + to_tier: Option, + restored_tier: Option, + bounded_context: Option, +} + +#[derive(Debug, Clone)] +struct WardlineOverlayBoundary { + scope: String, + entry: WardlineOverlayBoundaryEntry, +} + +#[derive(Debug, Clone, Default)] +struct WardlineArtifactHashes { + root_manifest_hash: String, + fingerprint_hash: Option, + exceptions_hash: Option, + overlay_hashes: Vec<(String, String)>, +} + +struct GeneratedSheet { + id: String, + name: String, + short_name: String, + properties: Value, +} + +pub(crate) fn sync_wardline_guidance( + db_path: &Path, + project_root: &Path, +) -> Result { + let Some((manifest_hash, manifest)) = read_manifest(project_root)? 
else { + return Ok(WardlineGuidanceStats::default()); + }; + let generated = generated_sheets(&manifest, &manifest_hash); + if generated.is_empty() { + return Ok(WardlineGuidanceStats::default()); + } + + let conn = open_write_connection(db_path)?; + let now = now_iso8601(&conn)?; + let mut stats = WardlineGuidanceStats::default(); + for mut sheet in generated { + if let Some(obj) = sheet.properties.as_object_mut() { + obj.insert("authored_at".to_owned(), json!(now)); + } + + let before = + get_guidance_sheet(&conn, &sheet.id).map_err(|err| anyhow::anyhow!("{err}"))?; + let mut write_sheet = true; + if let Some(existing) = before.as_ref() { + let stored_signature = existing + .properties + .get("wardline_generated_signature") + .and_then(Value::as_str); + let actual_signature = derived_signature_from_properties(&existing.properties); + let edited = stored_signature != Some(actual_signature.as_str()); + if edited { + let mut properties = existing.properties.clone(); + if let Some(obj) = properties.as_object_mut() { + obj.insert("provenance".to_owned(), json!(PROVENANCE_OVERRIDDEN)); + } + sheet.properties = properties; + write_sheet = true; + stats.overridden += 1; + } + } + + if write_sheet { + upsert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: &sheet.id, + name: &sheet.name, + short_name: &sheet.short_name, + properties: &sheet.properties, + }, + ) + .map_err(|err| anyhow::anyhow!("{err}")) + .with_context(|| format!("write Wardline-derived guidance {}", sheet.id))?; + + if let Some(before) = before.as_ref() { + let _ = invalidate_summaries_for_sheet(&conn, before, project_root); + } + if let Some(after) = + get_guidance_sheet(&conn, &sheet.id).map_err(|err| anyhow::anyhow!("{err}"))? 
+ { + let _ = invalidate_summaries_for_sheet(&conn, &after, project_root); + } + stats.generated += 1; + } + } + Ok(stats) +} + +pub(crate) fn current_manifest_hash(project_root: &Path) -> Result> { + Ok(read_manifest(project_root)?.map(|(hash, _)| hash)) +} + +fn read_manifest(project_root: &Path) -> Result> { + let path = project_root.join("wardline.yaml"); + if !path.exists() { + return Ok(None); + } + let raw = std::fs::read_to_string(&path) + .with_context(|| format!("read Wardline manifest {}", path.display()))?; + let raw_manifest: RawWardlineManifest = serde_norway::from_str(&raw) + .with_context(|| format!("parse Wardline manifest {}", path.display()))?; + let mut manifest = WardlineManifest::from_raw(raw_manifest)?; + let root_manifest_hash = hash_bytes(raw.as_bytes()); + let mut hash_parts = vec![("wardline.yaml".to_owned(), raw.into_bytes())]; + + let fingerprint = read_optional_json::( + project_root, + &["wardline.fingerprint.json", "fingerprint.json"], + )?; + if let Some(artifact) = fingerprint { + hash_parts.push((artifact.relative_path.clone(), artifact.raw.into_bytes())); + manifest.artifact_hashes.fingerprint_hash = Some(artifact.hash); + manifest.fingerprint = Some(artifact.parsed); + } + + let exceptions = + read_optional_json::(project_root, &["wardline.exceptions.json"])?; + if let Some(artifact) = exceptions { + hash_parts.push((artifact.relative_path.clone(), artifact.raw.into_bytes())); + manifest.artifact_hashes.exceptions_hash = Some(artifact.hash); + manifest.exceptions = Some(artifact.parsed); + } + + for artifact in read_overlay_artifacts(project_root)? 
{ + hash_parts.push((artifact.relative_path.clone(), artifact.raw.into_bytes())); + manifest + .artifact_hashes + .overlay_hashes + .push((artifact.relative_path, artifact.hash)); + let scope = artifact.parsed.overlay_for.unwrap_or_default(); + manifest.overlay_boundaries.extend( + artifact + .parsed + .boundaries + .into_iter() + .filter(|entry| !entry.function.trim().is_empty()) + .map(|entry| WardlineOverlayBoundary { + scope: scope.clone(), + entry, + }), + ); + } + + manifest.artifact_hashes.root_manifest_hash = root_manifest_hash; + let hash = bundle_hash(&hash_parts); + Ok(Some((hash, manifest))) +} + +impl WardlineManifest { + fn from_raw(raw: RawWardlineManifest) -> Result { + let (tier_entries, tier_definitions) = parse_tiers(raw.tiers)?; + Ok(Self { + tier_entries, + tier_definitions, + module_tiers: raw.module_tiers, + boundaries: raw.boundaries, + annotation_groups: raw.annotation_groups, + ..Self::default() + }) + } +} + +fn parse_tiers( + tiers: Option, +) -> Result<( + BTreeMap, + BTreeMap, +)> { + let Some(tiers) = tiers else { + return Ok((BTreeMap::new(), BTreeMap::new())); + }; + if tiers.is_object() { + let entries = serde_json::from_value::>(tiers) + .context("parse Wardline guidance-style tier map")?; + return Ok((entries, BTreeMap::new())); + } + if tiers.is_array() { + let definitions = serde_json::from_value::>(tiers) + .context("parse Wardline tier definitions")? 
+ .into_iter() + .filter(|definition| !definition.id.trim().is_empty()) + .map(|definition| (definition.id.clone(), definition)) + .collect(); + return Ok((BTreeMap::new(), definitions)); + } + anyhow::bail!("Wardline tiers must be either a guidance map or a tier-definition array"); +} + +struct ParsedArtifact<T> { + relative_path: String, + raw: String, + hash: String, + parsed: T, +} + +fn read_optional_json<T>( + project_root: &Path, + candidates: &[&str], +) -> Result<Option<ParsedArtifact<T>>> +where + T: for<'de> Deserialize<'de>, +{ + for candidate in candidates { + let path = project_root.join(candidate); + if !path.exists() { + continue; + } + let raw = + std::fs::read_to_string(&path).with_context(|| format!("read {}", path.display()))?; + let parsed = + serde_json::from_str(&raw).with_context(|| format!("parse {}", path.display()))?; + return Ok(Some(ParsedArtifact { + relative_path: (*candidate).to_owned(), + hash: hash_bytes(raw.as_bytes()), + raw, + parsed, + })); + } + Ok(None) +} + +fn read_overlay_artifacts(project_root: &Path) -> Result<Vec<ParsedArtifact<WardlineOverlay>>> { + let mut paths = Vec::new(); + collect_overlay_paths(project_root, &mut paths)?; + paths.sort(); + let mut overlays = Vec::new(); + for path in paths { + let raw = + std::fs::read_to_string(&path).with_context(|| format!("read {}", path.display()))?; + let parsed: WardlineOverlay = + serde_norway::from_str(&raw).with_context(|| format!("parse {}", path.display()))?; + let relative_path = path + .strip_prefix(project_root) + .ok() + .and_then(|rel| rel.to_str()) + .unwrap_or_else(|| path.to_str().unwrap_or("wardline.overlay.yaml")) + .replace('\\', "/"); + let scope = overlay_scope(&relative_path, parsed.overlay_for.as_deref()); + overlays.push(ParsedArtifact { + relative_path, + hash: hash_bytes(raw.as_bytes()), + raw, + parsed: WardlineOverlay { + overlay_for: Some(scope), + boundaries: parsed.boundaries, + }, + }); + } + Ok(overlays) +} + +fn collect_overlay_paths(dir: &Path, paths: &mut Vec<std::path::PathBuf>) -> Result<()> { + let entries = match 
std::fs::read_dir(dir) { + Ok(entries) => entries, + Err(err) if err.kind() == std::io::ErrorKind::NotFound => return Ok(()), + Err(err) => return Err(err).with_context(|| format!("read directory {}", dir.display())), + }; + for entry in entries { + let entry = entry.with_context(|| format!("read directory entry {}", dir.display()))?; + let path = entry.path(); + let file_name = entry.file_name(); + let file_name = file_name.to_string_lossy(); + if entry.file_type()?.is_dir() { + if matches!( + file_name.as_ref(), + ".git" | ".clarion" | ".venv" | "target" | "node_modules" + ) { + continue; + } + collect_overlay_paths(&path, paths)?; + } else if file_name == "wardline.overlay.yaml" { + paths.push(path); + } + } + Ok(()) +} + +fn overlay_scope(relative_path: &str, overlay_for: Option<&str>) -> String { + if let Some(scope) = overlay_for.map(str::trim) + && !scope.is_empty() + && scope != "." + && scope != "wardline.yaml" + { + return scope.trim_matches('/').to_owned(); + } + Path::new(relative_path) + .parent() + .and_then(Path::to_str) + .unwrap_or("") + .trim_matches('/') + .replace('\\', "/") +} + +fn hash_bytes(bytes: &[u8]) -> String { + format!("blake3:{}", blake3::hash(bytes).to_hex()) +} + +fn bundle_hash(parts: &[(String, Vec<u8>)]) -> String { + let mut hasher = blake3::Hasher::new(); + for (label, bytes) in parts { + hasher.update(label.as_bytes()); + hasher.update(&[0]); + hasher.update(bytes); + hasher.update(&[0xff]); + } + format!("blake3:{}", hasher.finalize().to_hex()) +} + +fn generated_sheets(manifest: &WardlineManifest, manifest_hash: &str) -> Vec<GeneratedSheet> { + let mut sheets = Vec::new(); + let artifact_properties = artifact_properties(manifest); + if manifest.module_tiers.is_empty() { + for (key, entry) in &manifest.tier_entries { + sheets.push(generated_sheet( + "tier", + key, + entry, + manifest_hash, + &artifact_properties, + &format!( + "Wardline tier `{key}` applies here. Preserve its trust-boundary intent when summarising or changing this code." 
+ ), + )); + } + } else { + for assignment in &manifest.module_tiers { + if assignment.path.trim().is_empty() || assignment.default_taint.trim().is_empty() { + continue; + } + let key = format!("{}-{}", assignment.path, assignment.default_taint); + let tier = manifest.tier_definitions.get(&assignment.default_taint); + let tier_number = tier.and_then(|definition| definition.tier); + let description = tier.and_then(|definition| definition.description.as_deref()); + let content = module_tier_content(assignment, tier_number, description); + let entry = WardlineGuidanceEntry { + paths: vec![path_glob(&assignment.path)], + content: Some(content), + scope_level: Some("module".to_owned()), + match_rules: None, + }; + sheets.push(generated_sheet( + "tier", + &key, + &entry, + manifest_hash, + &artifact_properties, + "", + )); + } + } + for (key, entry) in &manifest.boundaries { + sheets.push(generated_sheet( + "boundary", + key, + entry, + manifest_hash, + &artifact_properties, + &format!( + "Wardline boundary contract `{key}` applies here. Call out boundary assumptions and cross-boundary effects." + ), + )); + } + for boundary in &manifest.overlay_boundaries { + sheets.push(generated_overlay_boundary_sheet( + boundary, + manifest_hash, + &artifact_properties, + )); + } + for (key, entry) in &manifest.annotation_groups { + sheets.push(generated_sheet( + "annotation_group", + key, + entry, + manifest_hash, + &artifact_properties, + &format!( + "Wardline annotation group `{key}` applies here. Preserve the group-specific review context." + ), + )); + } + for decorator in fingerprint_decorators(manifest) { + let entry = WardlineGuidanceEntry { + paths: Vec::new(), + content: Some(format!( + "Wardline annotation group `{decorator}` is present in the fingerprint baseline. Preserve its Wardline review semantics when interpreting affected code." 
+ )), + scope_level: Some("project".to_owned()), + match_rules: Some(vec![json!({"type": "wardline_group", "name": decorator})]), + }; + sheets.push(generated_sheet( + "annotation_group", + &decorator, + &entry, + manifest_hash, + &artifact_properties, + "", + )); + } + sheets.sort_by(|a, b| a.id.cmp(&b.id)); + sheets +} + +fn artifact_properties(manifest: &WardlineManifest) -> Map<String, Value> { + let mut properties = Map::new(); + properties.insert( + "wardline_root_manifest_hash".to_owned(), + json!(manifest.artifact_hashes.root_manifest_hash), + ); + if let Some(hash) = &manifest.artifact_hashes.fingerprint_hash { + properties.insert("wardline_fingerprint_hash".to_owned(), json!(hash)); + properties.insert( + "wardline_fingerprint_count".to_owned(), + json!( + manifest + .fingerprint + .as_ref() + .map_or(0, |fingerprint| fingerprint.fingerprints.len()) + ), + ); + } + if let Some(hash) = &manifest.artifact_hashes.exceptions_hash { + properties.insert("wardline_exceptions_hash".to_owned(), json!(hash)); + properties.insert( + "wardline_exception_count".to_owned(), + json!( + manifest + .exceptions + .as_ref() + .map_or(0, |exceptions| exceptions.exceptions.len()) + ), + ); + } + if !manifest.artifact_hashes.overlay_hashes.is_empty() { + let overlays: Vec<Value> = manifest + .artifact_hashes + .overlay_hashes + .iter() + .map(|(path, hash)| json!({"path": path, "hash": hash})) + .collect(); + properties.insert("wardline_overlay_hashes".to_owned(), json!(overlays)); + } + properties +} + +fn fingerprint_decorators(manifest: &WardlineManifest) -> BTreeSet<String> { + manifest + .fingerprint + .as_ref() + .into_iter() + .flat_map(|fingerprint| &fingerprint.fingerprints) + .flat_map(|entry| &entry.decorators) + .filter(|decorator| !decorator.trim().is_empty()) + .cloned() + .collect() +} + +fn module_tier_content( + assignment: &WardlineModuleTier, + tier_number: Option, + description: Option<&str>, +) -> String { + let tier = tier_number.map_or_else( + || assignment.default_taint.clone(), + 
|number| format!("Tier {number} ({})", assignment.default_taint), + ); + let description = description + .filter(|description| !description.trim().is_empty()) + .map(|description| format!(" {description}")) + .unwrap_or_default(); + format!( + "Wardline assigns `{}` to `{}` as {tier}.{description} Preserve its trust-boundary intent when summarising or changing this code.", + assignment.default_taint, assignment.path + ) +} + +fn path_glob(path: &str) -> String { + let path = path.trim().trim_matches('/'); + if path.is_empty() { + "**".to_owned() + } else if path.contains('*') || path.contains('?') { + path.to_owned() + } else { + format!("{path}/**") + } +} + +fn generated_overlay_boundary_sheet( + boundary: &WardlineOverlayBoundary, + manifest_hash: &str, + artifact_properties: &Map<String, Value>, +) -> GeneratedSheet { + let key = format!("{}-{}", boundary.scope, boundary.entry.function); + let mut details = Vec::new(); + if let Some(from_tier) = boundary.entry.from_tier { + details.push(format!("from Tier {from_tier}")); + } + if let Some(to_tier) = boundary.entry.to_tier { + details.push(format!("to Tier {to_tier}")); + } + if let Some(restored_tier) = boundary.entry.restored_tier { + details.push(format!("restoring Tier {restored_tier}")); + } + if boundary.entry.bounded_context.is_some() { + details.push("with bounded-context contracts".to_owned()); + } + let suffix = if details.is_empty() { + String::new() + } else { + format!(" ({})", details.join(", ")) + }; + let entry = WardlineGuidanceEntry { + paths: vec![path_glob(&boundary.scope)], + content: Some(format!( + "Wardline boundary `{}` in `{}` declares transition `{}`{suffix}. 
Call out boundary assumptions and cross-boundary effects.", + boundary.entry.function, boundary.scope, boundary.entry.transition + )), + scope_level: Some("subsystem".to_owned()), + match_rules: None, + }; + generated_sheet( + "boundary", + &key, + &entry, + manifest_hash, + artifact_properties, + "", + ) +} + +fn generated_sheet( + kind: &str, + key: &str, + entry: &WardlineGuidanceEntry, + manifest_hash: &str, + artifact_properties: &Map<String, Value>, + default_content: &str, +) -> GeneratedSheet { + let name = slugify_guidance_name(&format!("wardline-{}-{key}", kind.replace('_', "-"))); + let id = format!("core:guidance:{name}"); + let short_name = name.rsplit('.').next().unwrap_or(&name).to_owned(); + let match_rules = entry.match_rules.clone().unwrap_or_else(|| { + entry + .paths + .iter() + .map(|path| json!({ "type": "path", "pattern": path })) + .collect() + }); + let content = entry + .content + .clone() + .unwrap_or_else(|| default_content.to_owned()); + let scope_level = entry + .scope_level + .clone() + .unwrap_or_else(|| "module".to_owned()); + let signature = derived_signature(&content, &scope_level, &match_rules, kind, key); + let mut properties = json!({ + "content": content, + "scope_level": scope_level, + "match_rules": match_rules, + "pinned": true, + "provenance": PROVENANCE_DERIVED, + "wardline_kind": kind, + "wardline_key": key, + "wardline_manifest_hash": manifest_hash, + "wardline_generated_signature": signature, + }); + if let Some(obj) = properties.as_object_mut() { + obj.extend(artifact_properties.clone()); + } + GeneratedSheet { + id, + name, + short_name, + properties, + } +} + +pub(crate) fn is_wardline_derived(properties: &Value) -> bool { + matches!( + properties.get("provenance").and_then(Value::as_str), + Some(PROVENANCE_DERIVED | PROVENANCE_OVERRIDDEN) + ) +} + +fn derived_signature_from_properties(properties: &Value) -> String { + let content = properties + .get("content") + .and_then(Value::as_str) + .unwrap_or_default(); + let scope_level = 
properties + .get("scope_level") + .and_then(Value::as_str) + .unwrap_or_default(); + let match_rules = properties + .get("match_rules") + .and_then(Value::as_array) + .cloned() + .unwrap_or_default(); + let kind = properties + .get("wardline_kind") + .and_then(Value::as_str) + .unwrap_or_default(); + let key = properties + .get("wardline_key") + .and_then(Value::as_str) + .unwrap_or_default(); + derived_signature(content, scope_level, &match_rules, kind, key) +} + +fn derived_signature( + content: &str, + scope_level: &str, + match_rules: &[Value], + kind: &str, + key: &str, +) -> String { + let payload = json!({ + "content": content, + "scope_level": scope_level, + "match_rules": match_rules, + "pinned": true, + "wardline_kind": kind, + "wardline_key": key, + }); + let bytes = serde_json::to_vec(&payload).unwrap_or_default(); + format!("blake3:{}", blake3::hash(&bytes).to_hex()) +} + +fn open_write_connection(path: &Path) -> Result<Connection> { + let conn = Connection::open_with_flags( + path, + OpenFlags::SQLITE_OPEN_READ_WRITE | OpenFlags::SQLITE_OPEN_URI, + ) + .with_context(|| format!("open database {}", path.display()))?; + conn.busy_timeout(std::time::Duration::from_secs(5)) + .context("set busy_timeout")?; + conn.pragma_update(None, "foreign_keys", "ON") + .context("enable foreign_keys")?; + Ok(conn) +} + +fn now_iso8601(conn: &Connection) -> Result<String> { + let ts: String = conn + .query_row("SELECT strftime('%Y-%m-%dT%H:%M:%fZ','now')", [], |row| { + row.get(0) + }) + .context("mint guidance timestamp")?; + Ok(ts) +} + +#[cfg(test)] +mod tests { + use super::{overlay_scope, path_glob}; + use clarion_storage::glob_match; + + #[test] + fn root_overlay_scope_glob_matches_project_relative_paths() { + let scope = overlay_scope("wardline.overlay.yaml", None); + let pattern = path_glob(&scope); + + assert_eq!(scope, ""); + assert_eq!(pattern, "**"); + assert!(glob_match(&pattern, "src/foo.py")); + assert!(glob_match(&pattern, "foo.py")); + } +} diff --git 
a/crates/clarion-cli/tests/analyze.rs b/crates/clarion-cli/tests/analyze.rs index d0ddf025..4f1846c1 100644 --- a/crates/clarion-cli/tests/analyze.rs +++ b/crates/clarion-cli/tests/analyze.rs @@ -522,6 +522,332 @@ analysis: ) } +#[cfg(unix)] +const CATEGORISED_PLUGIN_SCRIPT: &str = r#"#!/usr/bin/python3 +import json +import sys + + +def read_frame(): + headers = {} + while True: + line = sys.stdin.buffer.readline() + if line in (b"", b"\r\n"): + break + name, value = line.decode("ascii").strip().split(":", 1) + headers[name.lower()] = value.strip() + length = int(headers["content-length"]) + return json.loads(sys.stdin.buffer.read(length)) + + +def write_frame(message): + body = json.dumps(message, separators=(",", ":")).encode("utf-8") + sys.stdout.buffer.write(b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n") + sys.stdout.buffer.write(body) + sys.stdout.buffer.flush() + + +while True: + msg = read_frame() + method = msg.get("method") + if method == "initialized": + continue + if method == "exit": + raise SystemExit(0) + ident = msg["id"] + if method == "initialize": + write_frame({ + "jsonrpc": "2.0", + "id": ident, + "result": { + "name": "clarion-plugin-categorised", + "version": "0.1.0", + "ontology_version": "0.1.0", + "capabilities": {}, + }, + }) + elif method == "analyze_file": + path = msg["params"]["file_path"] + write_frame({ + "jsonrpc": "2.0", + "id": ident, + "result": { + "entities": [ + { + "id": "catfixture:module:app", + "kind": "module", + "qualified_name": "app", + "source": { + "file_path": path, + "source_range": { + "start_line": 1, + "start_col": 0, + "end_line": 3, + "end_col": 0 + }, + }, + }, + { + "id": "catfixture:function:app.main", + "kind": "function", + "qualified_name": "app.main", + "source": { + "file_path": path, + "source_range": { + "start_line": 1, + "start_col": 0, + "end_line": 2, + "end_col": 8 + }, + }, + "parent_id": "catfixture:module:app", + "tags": ["entry-point"], + "docstring": "Launches service", 
+ }, + ], + "edges": [ + { + "kind": "contains", + "from_id": "catfixture:module:app", + "to_id": "catfixture:function:app.main", + } + ], + "stats": {}, + }, + }) + elif method == "shutdown": + write_frame({"jsonrpc": "2.0", "id": ident, "result": {}}) +"#; + +#[cfg(unix)] +const CATEGORISED_PLUGIN_MANIFEST: &str = r#" +[plugin] +name = "clarion-plugin-categorised" +plugin_id = "catfixture" +version = "0.1.0" +protocol_version = "1.0" +executable = "clarion-plugin-categorised" +language = "catfixture" +extensions = ["cat"] + +[capabilities.runtime] +expected_max_rss_mb = 128 +expected_entities_per_file = 100 +wardline_aware = false +reads_outside_project_root = false + +[ontology] +entity_kinds = ["module", "function"] +edge_kinds = ["contains"] +rule_id_prefix = "CLA-CAT-" +ontology_version = "0.1.0" +"#; + +#[cfg(unix)] +fn write_categorised_plugin(plugin_dir: &std::path::Path) { + use std::os::unix::fs::PermissionsExt; + + let plugin_script = plugin_dir.join("clarion-plugin-categorised"); + std::fs::write(&plugin_script, CATEGORISED_PLUGIN_SCRIPT) + .expect("write categorised plugin script"); + let mut perms = std::fs::metadata(&plugin_script) + .expect("stat categorised plugin") + .permissions(); + perms.set_mode(0o755); + std::fs::set_permissions(&plugin_script, perms).expect("chmod categorised plugin"); + + std::fs::write(plugin_dir.join("plugin.toml"), CATEGORISED_PLUGIN_MANIFEST) + .expect("write categorised plugin manifest"); +} + +#[cfg(unix)] +fn spawn_embedding_mock() -> (String, std::thread::JoinHandle<Vec<String>>) { + use std::io::{Read, Write}; + use std::net::TcpListener; + use std::time::{Duration, Instant}; + + fn read_http_request(stream: &mut std::net::TcpStream) -> String { + stream + .set_read_timeout(Some(Duration::from_secs(2))) + .expect("set read timeout"); + let mut buffer = Vec::new(); + let mut chunk = [0_u8; 1024]; + let mut header_end = None; + while header_end.is_none() { + let read = stream.read(&mut chunk).expect("read headers"); + if read 
== 0 { + break; + } + buffer.extend_from_slice(&chunk[..read]); + header_end = buffer + .windows(4) + .position(|w| w == b"\r\n\r\n") + .map(|i| i + 4); + } + let Some(header_end) = header_end else { + return String::from_utf8_lossy(&buffer).into_owned(); + }; + let headers = String::from_utf8_lossy(&buffer[..header_end]).to_ascii_lowercase(); + let content_length = headers + .lines() + .find_map(|line| line.strip_prefix("content-length:")) + .and_then(|value| value.trim().parse::<usize>().ok()) + .unwrap_or(0); + while buffer.len().saturating_sub(header_end) < content_length { + let read = stream.read(&mut chunk).expect("read body"); + if read == 0 { + break; + } + buffer.extend_from_slice(&chunk[..read]); + } + String::from_utf8_lossy(&buffer).into_owned() + } + + let listener = TcpListener::bind("127.0.0.1:0").expect("bind embedding mock"); + let addr = listener.local_addr().expect("mock addr"); + listener + .set_nonblocking(true) + .expect("nonblocking embedding mock"); + let handle = std::thread::spawn(move || { + let deadline = Instant::now() + Duration::from_secs(5); + let mut requests = Vec::new(); + while Instant::now() < deadline { + match listener.accept() { + Ok((mut stream, _)) => { + let request = read_http_request(&mut stream); + let body = request.split("\r\n\r\n").nth(1).unwrap_or("{}"); + let payload: serde_json::Value = + serde_json::from_str(body).expect("embedding request json"); + let count = payload["input"].as_array().map_or(0, Vec::len); + let data: Vec<serde_json::Value> = (0..count) + .map(|index| { + let first_dim = + f64::from(u32::try_from(index + 1).expect("fixture index fits")); + serde_json::json!({ + "object": "embedding", + "index": index, + "embedding": [first_dim, 1.0], + }) + }) + .collect(); + let response = serde_json::json!({ + "object": "list", + "data": data, + "model": "test-embed", + }) + .to_string(); + write!( + stream, + "HTTP/1.1 200 OK\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", + response.len(), + response + ) + 
.expect("write embedding response"); + requests.push(request); + return requests; + } + Err(err) if err.kind() == std::io::ErrorKind::WouldBlock => { + std::thread::sleep(Duration::from_millis(25)); + } + Err(err) => panic!("embedding mock accept failed: {err}"), + } + } + requests + }); + (format!("http://{addr}"), handle) +} + +#[cfg(unix)] +#[test] +fn analyze_persists_plugin_tags_and_populates_embedding_sidecar() { + let project_dir = tempfile::tempdir().unwrap(); + let plugin_dir = tempfile::tempdir().unwrap(); + write_categorised_plugin(plugin_dir.path()); + let (embedding_url, embedding_server) = spawn_embedding_mock(); + + clarion_bin() + .args(["install", "--path"]) + .arg(project_dir.path()) + .assert() + .success(); + std::fs::write( + project_dir.path().join("app.cat"), + "def main():\n pass\n", + ) + .expect("write categorised fixture source"); + let config_path = project_dir.path().join("clarion.yaml"); + std::fs::write( + &config_path, + format!( + r" +semantic_search: + enabled: true + allow_live_provider: true + endpoint_url: {embedding_url} + model_id: test-embed + dimensions: 2 + api_key_env: TEST_EMBEDDING_KEY + timeout_seconds: 2 + session_token_ceiling: 10000 +" + ), + ) + .expect("write semantic search config"); + + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + clarion_bin() + .args(["analyze", "--config"]) + .arg(&config_path) + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .env("TEST_EMBEDDING_KEY", "test-key") + .assert() + .success(); + + let requests = embedding_server.join().expect("embedding mock thread"); + assert_eq!( + requests.len(), + 1, + "analyze should call the embedding provider" + ); + assert!( + requests[0].contains("Launches service"), + "embedding text should include plugin docstring; request was {}", + requests[0] + ); + + let conn = Connection::open(project_dir.path().join(".clarion/clarion.db")).unwrap(); + let tag_count: i64 = conn + .query_row( + 
"SELECT COUNT(*) FROM entity_tags \ + WHERE entity_id = 'catfixture:function:app.main' \ + AND plugin_id = 'catfixture' \ + AND tag = 'entry-point'", + [], + |row| row.get(0), + ) + .expect("query persisted tags"); + assert_eq!(tag_count, 1, "plugin-emitted tags must be persisted"); + + let sidecar = project_dir.path().join(".clarion/embeddings.db"); + assert!(sidecar.exists(), "analyze should create embeddings sidecar"); + let sidecar_conn = Connection::open(sidecar).unwrap(); + let embedding_count: i64 = sidecar_conn + .query_row( + "SELECT COUNT(*) FROM entity_embeddings \ + WHERE entity_id = 'catfixture:function:app.main' \ + AND model_id = 'test-embed'", + [], + |row| row.get(0), + ) + .expect("query sidecar embeddings"); + assert_eq!( + embedding_count, 1, + "function embedding should be present after analyze" + ); +} + #[test] fn analyze_without_plugins_writes_skipped_run_row() { let dir = tempfile::tempdir().unwrap(); @@ -608,7 +934,11 @@ fn analyze_migrates_a_stale_db_instead_of_failing() { let uv: i64 = conn .query_row("PRAGMA user_version", [], |r| r.get(0)) .unwrap(); - assert_eq!(uv, 7, "analyze must apply the pending migration"); + assert_eq!( + uv, + i64::from(clarion_storage::schema::CURRENT_SCHEMA_VERSION), + "analyze must apply all pending migrations" + ); let has_column: i64 = conn .query_row( "SELECT COUNT(*) FROM pragma_table_info('runs') WHERE name = 'analyzed_at_commit'", @@ -1017,182 +1347,864 @@ fn analyze_emits_post_commit_deletion_finding_to_filigree() { ); } -/// clarion-ef8f64d5fd (tier half): post-`CommitRun` tier findings reach Filigree -/// in the same run too. They anchor to a synthetic subsystem entity with no -/// `source_file_path`, so the Phase-8c pass posts them against the project-root -/// fallback path (mirroring the `core:project:*` anchor) and flags them -/// `synthetic_anchor`. 
Run 1 builds the subsystems; tiers are seeded between runs -/// (analyze never writes them — the enrich-only axiom); run 2 (emit enabled) -/// computes the tier finding post-commit and Phase 8c POSTs it. +/// clarion-ef8f64d5fd (tier half): post-`CommitRun` tier findings reach Filigree +/// in the same run too. They anchor to a synthetic subsystem entity with no +/// `source_file_path`, so the Phase-8c pass posts them against the project-root +/// fallback path (mirroring the `core:project:*` anchor) and flags them +/// `synthetic_anchor`. Run 1 builds the subsystems; tiers are seeded between runs +/// (analyze never writes them — the enrich-only axiom); run 2 (emit enabled) +/// computes the tier finding post-commit and Phase 8c POSTs it. +#[cfg(unix)] +#[test] +fn analyze_emits_post_commit_tier_finding_to_filigree_at_project_anchor() { + let (project_dir, plugin_dir, config_path) = + phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a", "billing_b"]); + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + + { + let conn = Connection::open(project_dir.path().join(".clarion/clarion.db")).unwrap(); + // Two subsystems → two tier findings, both anchored to the project root. + // auth disagrees (MIXING); billing agrees (UNANIMOUS). They share + // (rule-family, path, null line) but carry subsystem-distinct messages — + // Filigree's intake is content-keyed (includes the message), so both + // persist distinctly rather than collapsing onto the shared path. 
+ seed_wardline_tier(&conn, "phase3fixture:module:auth_a", "public"); + seed_wardline_tier(&conn, "phase3fixture:module:auth_b", "internal"); + seed_wardline_tier(&conn, "phase3fixture:module:billing_a", "trusted"); + seed_wardline_tier(&conn, "phase3fixture:module:billing_b", "trusted"); + } + + let (base_url, server) = spawn_capturing_filigree_mock("CLA-FACT-SUBSYSTEM-TIER-UNANIMOUS"); + + std::fs::write(&config_path, phase3_config_with_filigree(2, &base_url)) + .expect("rewrite config with filigree emission enabled"); + run_phase3_analyze( + project_dir.path(), + std::path::Path::new(&config_path), + &plugin_path, + ); + + let requests = server.join().expect("mock server thread"); + let posted = requests + .iter() + .find(|r| r.contains("CLA-FACT-SUBSYSTEM-TIER-UNANIMOUS")) + .unwrap_or_else(|| { + panic!( + "the post-commit tier finding must reach Filigree; captured {} POST(s): {}", + requests.len(), + requests.join("\n---\n") + ) + }); + // Both subsystems' tier findings ride the one Phase-8c batch... + assert!( + posted.contains("CLA-FACT-TIER-SUBSYSTEM-MIXING") + && posted.contains("CLA-FACT-SUBSYSTEM-TIER-UNANIMOUS"), + "both tier findings reach Filigree in one batch: {posted}" + ); + // ...anchored to the project root and flagged synthetic (non-file) so a + // consumer never reads the shared path as a real location... + assert!( + posted.contains("\"synthetic_anchor\":true"), + "tier findings are flagged as synthetic anchors: {posted}" + ); + assert!( + posted.contains(&project_dir.path().display().to_string()), + "tier findings are anchored to the project root path: {posted}" + ); + // ...and carry subsystem-distinct messages (≥2 distinct `Subsystem ` + // anchors), which is what keeps them distinct under Filigree's content key. 
+ let subsystem_mentions: std::collections::BTreeSet<&str> = posted + .match_indices("core:subsystem:") + .map(|(i, _)| &posted[i..(i + "core:subsystem:".len() + 8).min(posted.len())]) + .collect(); + assert!( + subsystem_mentions.len() >= 2, + "two distinct subsystem anchors keep the findings content-distinct: {subsystem_mentions:?} in {posted}" + ); +} + +/// REQ-ANALYZE-04 verification (verbatim): run analyze, delete a file, re-run; +/// assert a `CLA-FACT-ENTITY-DELETED` finding per previously-extracted entity in +/// the deleted file — and no false positives for entities still present. +#[cfg(unix)] +#[test] +fn analyze_emits_entity_deleted_finding_when_file_removed() { + let (project_dir, plugin_dir, config_path) = + phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a", "billing_b"]); + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + + std::fs::remove_file(project_dir.path().join("billing_a.p3")).expect("delete a source file"); + run_phase3_analyze( + project_dir.path(), + std::path::Path::new(&config_path), + &plugin_path, + ); + + let conn = Connection::open(project_dir.path().join(".clarion/clarion.db")).unwrap(); + // The plugin's `module` entity carries the canonical finding shape. + let (kind, severity, status): (String, String, String) = conn + .query_row( + "SELECT kind, severity, status FROM findings \ + WHERE rule_id = 'CLA-FACT-ENTITY-DELETED' \ + AND entity_id = 'phase3fixture:module:billing_a'", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)), + ) + .expect("entity-deleted finding for the deleted module"); + assert_eq!(kind, "fact"); + assert_eq!(severity, "INFO"); + assert_eq!(status, "open"); + + // Deleting one source file orphans exactly its two previously-extracted + // entities — the core-minted `core:file:*` and the plugin `module` — and + // nothing belonging to the surviving files. 
+ let deleted: std::collections::BTreeSet<String> = conn + .prepare("SELECT entity_id FROM findings WHERE rule_id = 'CLA-FACT-ENTITY-DELETED'") + .unwrap() + .query_map([], |row| row.get::<_, String>(0)) + .unwrap() + .collect::<Result<_, _>>() + .unwrap(); + assert_eq!( + deleted, + std::collections::BTreeSet::from([ + "core:file:billing_a.p3".to_owned(), + "phase3fixture:module:billing_a".to_owned(), + ]), + "only the deleted file's entities should be flagged" + ); +} + +/// REQ-ANALYZE-04: a guidance sheet whose `guides` edge targets a deleted entity +/// produces `CLA-FACT-GUIDANCE-ORPHAN`, and the deleted entity's cached summaries +/// are invalidated. Both halves are injected between runs (the fixture plugin +/// emits neither guidance sheets nor summaries), then a file is deleted + re-run. +#[cfg(unix)] +#[test] +fn analyze_emits_guidance_orphan_and_invalidates_summary_cache_on_deletion() { + let (project_dir, plugin_dir, config_path) = + phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a", "billing_b"]); + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + let db_path = project_dir.path().join(".clarion/clarion.db"); + let target = "phase3fixture:module:billing_a"; + + // Inject a guidance sheet that `guides` the soon-to-be-deleted entity, plus a + // cached summary for it. Entities/edges are never pruned, so these survive the + // re-run; the deletion path must orphan the guidance and clear the summary. 
+ { + let conn = Connection::open(&db_path).unwrap(); + conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES ('core:guidance:g1', 'core', 'guidance', 'g1', 'g1', '{}', \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z')", + [], + ) + .unwrap(); + conn.execute( + "INSERT INTO edges (kind, from_id, to_id, confidence) \ + VALUES ('guides', 'core:guidance:g1', ?1, 'resolved')", + [target], + ) + .unwrap(); + conn.execute( + "INSERT INTO summary_cache \ + (entity_id, content_hash, prompt_template_id, model_tier, guidance_fingerprint, \ + summary_json, cost_usd, tokens_input, tokens_output, created_at, last_accessed_at, \ + caller_count, fan_out) \ + VALUES (?1, 'h', 'tmpl', 'tier', 'fp', '{}', 0.0, 0, 0, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z', 0, 0)", + [target], + ) + .unwrap(); + } + + std::fs::remove_file(project_dir.path().join("billing_a.p3")).expect("delete a source file"); + run_phase3_analyze( + project_dir.path(), + std::path::Path::new(&config_path), + &plugin_path, + ); + + let conn = Connection::open(&db_path).unwrap(); + let (rule_id, severity, anchor, related): (String, String, String, String) = conn + .query_row( + "SELECT rule_id, severity, entity_id, related_entities \ + FROM findings WHERE rule_id = 'CLA-FACT-GUIDANCE-ORPHAN'", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)), + ) + .expect("query guidance-orphan finding"); + assert_eq!(rule_id, "CLA-FACT-GUIDANCE-ORPHAN"); + assert_eq!(severity, "WARN"); + assert_eq!(anchor, "core:guidance:g1"); + let related: serde_json::Value = serde_json::from_str(&related).unwrap(); + assert_eq!(related, serde_json::json!([target])); + + let cached: i64 = conn + .query_row( + "SELECT COUNT(*) FROM summary_cache WHERE entity_id = ?1", + [target], + |row| row.get(0), + ) + .unwrap(); + assert_eq!( + cached, 0, + "deleted entity's summary cache must be invalidated" + ); +} + +/// T4a (WS6): a guidance sheet 
whose `match_rules` carries `{"type":"entity","id":X}` +/// pointing at a deleted entity also produces `CLA-FACT-GUIDANCE-ORPHAN`. When the +/// SAME deleted target is reachable via BOTH a `guides` edge and a `match_rule`, only +/// one finding is emitted for that (sheet, target) pair (idempotent run-scoped id). +#[cfg(unix)] +#[test] +fn analyze_emits_guidance_orphan_for_match_rule_entity_and_dedupes() { + let (project_dir, plugin_dir, config_path) = + phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a", "billing_b"]); + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + let db_path = project_dir.path().join(".clarion/clarion.db"); + let target = "phase3fixture:module:billing_a"; + + { + let conn = Connection::open(&db_path).unwrap(); + // g_match: orphans `target` via a match_rule {type:entity, id:target} only. + let props = serde_json::json!({ + "match_rules": [{ "type": "entity", "id": target }], + "authored_at": "2026-01-01T00:00:00.000Z", + }) + .to_string(); + conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES ('core:guidance:g_match', 'core', 'guidance', 'g_match', 'g_match', ?1, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z')", + [&props], + ) + .unwrap(); + // g_both: orphans `target` via a `guides` edge AND a match_rule → one finding. 
+ conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES ('core:guidance:g_both', 'core', 'guidance', 'g_both', 'g_both', ?1, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z')", + [&props], + ) + .unwrap(); + conn.execute( + "INSERT INTO edges (kind, from_id, to_id, confidence) \ + VALUES ('guides', 'core:guidance:g_both', ?1, 'resolved')", + [target], + ) + .unwrap(); + } + + std::fs::remove_file(project_dir.path().join("billing_a.p3")).expect("delete a source file"); + run_phase3_analyze( + project_dir.path(), + std::path::Path::new(&config_path), + &plugin_path, + ); + + let conn = Connection::open(&db_path).unwrap(); + // g_match emits exactly one orphan finding for target. + let match_count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM findings \ + WHERE rule_id = 'CLA-FACT-GUIDANCE-ORPHAN' AND entity_id = 'core:guidance:g_match'", + [], + |row| row.get(0), + ) + .unwrap(); + assert_eq!(match_count, 1, "match_rule entity orphan should emit"); + + // g_both: guides-edge + match_rule to the same target ⇒ exactly one finding. + let both_count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM findings \ + WHERE rule_id = 'CLA-FACT-GUIDANCE-ORPHAN' AND entity_id = 'core:guidance:g_both'", + [], + |row| row.get(0), + ) + .unwrap(); + assert_eq!( + both_count, 1, + "guides-edge + match_rule to same target must dedupe to one finding" + ); +} + +/// T4a (WS6): `CLA-FACT-GUIDANCE-EXPIRED` fires for a sheet whose `expires` is in +/// the past, and does NOT fire for a future `expires` or a sheet with no `expires`. +/// Runs on every analyze (independent of deletions), so this re-runs with no source +/// change. Severity INFO, confidence 1.0. 
+#[cfg(unix)] +#[test] +fn analyze_emits_guidance_expired_for_past_expiry_only() { + let (project_dir, plugin_dir, config_path) = phase3_project_for_rerun(&["auth_a", "auth_b"]); + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + let db_path = project_dir.path().join(".clarion/clarion.db"); + + { + let conn = Connection::open(&db_path).unwrap(); + let insert = |conn: &Connection, slug: &str, expires: Option<&str>| { + let mut props = serde_json::json!({ "authored_at": "2026-01-01T00:00:00.000Z" }); + if let Some(e) = expires { + props["expires"] = serde_json::Value::String(e.to_owned()); + } + conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES (?1, 'core', 'guidance', ?2, ?2, ?3, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z')", + rusqlite::params![format!("core:guidance:{slug}"), slug, props.to_string()], + ) + .unwrap(); + }; + insert(&conn, "g_past", Some("2020-01-01T00:00:00.000Z")); + insert(&conn, "g_future", Some("2999-01-01T00:00:00.000Z")); + insert(&conn, "g_none", None); + } + + run_phase3_analyze( + project_dir.path(), + std::path::Path::new(&config_path), + &plugin_path, + ); + + let conn = Connection::open(&db_path).unwrap(); + let anchors: Vec<String> = conn + .prepare("SELECT entity_id FROM findings WHERE rule_id = 'CLA-FACT-GUIDANCE-EXPIRED'") + .unwrap() + .query_map([], |row| row.get(0)) + .unwrap() + .collect::<Result<_, _>>() + .unwrap(); + assert_eq!(anchors, vec!["core:guidance:g_past".to_owned()]); + + let (severity, confidence): (String, f64) = conn + .query_row( + "SELECT severity, confidence FROM findings \ + WHERE rule_id = 'CLA-FACT-GUIDANCE-EXPIRED'", + [], + |row| Ok((row.get(0)?, row.get(1)?)), + ) + .unwrap(); + assert_eq!(severity, "INFO"); + assert!((confidence - 1.0).abs() < f64::EPSILON); +} + +/// T4a (WS6): EXPIRED fires even under `--no-sei` — the guidance-staleness pass is +/// independent of the SEI mint pass 
(deletion detection is SEI-gated; staleness is +/// not). Guards the load-bearing placement decision. +#[cfg(unix)] +#[test] +fn analyze_emits_guidance_expired_under_no_sei() { + let (project_dir, plugin_dir, config_path) = phase3_project_for_rerun(&["auth_a", "auth_b"]); + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + let db_path = project_dir.path().join(".clarion/clarion.db"); + + { + let conn = Connection::open(&db_path).unwrap(); + conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES ('core:guidance:g_past', 'core', 'guidance', 'g_past', 'g_past', ?1, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z')", + [&serde_json::json!({ + "authored_at": "2026-01-01T00:00:00.000Z", + "expires": "2020-01-01T00:00:00.000Z", + }) + .to_string()], + ) + .unwrap(); + } + + clarion_bin() + .args(["analyze", "--config"]) + .arg(std::path::Path::new(&config_path)) + .arg("--no-sei") + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .assert() + .success(); + + let conn = Connection::open(&db_path).unwrap(); + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM findings WHERE rule_id = 'CLA-FACT-GUIDANCE-EXPIRED'", + [], + |row| row.get(0), + ) + .unwrap(); + assert_eq!(count, 1, "EXPIRED must fire under --no-sei"); +} + +/// T4a (WS6): `CLA-FACT-GUIDANCE-CHURN-STALE` asymmetric threshold. A pinned sheet +/// matching entities whose aggregate `git_churn_count` is in [20, 49] fires; an +/// identical non-pinned sheet at the same churn does not. Below 20 neither fires; +/// at/above 50 both fire. With churn unpopulated (production), nothing fires. 
+#[cfg(unix)] +#[test] +fn analyze_emits_guidance_churn_stale_with_asymmetric_pinned_threshold() { + let (project_dir, plugin_dir, config_path) = phase3_project_for_rerun(&["auth_a", "auth_b"]); + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + let db_path = project_dir.path().join(".clarion/clarion.db"); + + // Seed git_churn_count on the matched module via properties JSON (the analyze + // pipeline does not populate it). A `kind:module` match_rule selects both + // auth modules; we control the aggregate by choosing the per-module value. + let seed_run = |churn_each: i64, pinned: bool, slug: &str| { + let conn = Connection::open(&db_path).unwrap(); + // Set churn on auth_a + auth_b (both kind=module) via properties merge. + for stem in ["auth_a", "auth_b"] { + conn.execute( + "UPDATE entities SET properties = json_set(properties, '$.git_churn_count', ?2) \ + WHERE id = ?1", + rusqlite::params![format!("phase3fixture:module:{stem}"), churn_each], + ) + .unwrap(); + } + let props = serde_json::json!({ + "match_rules": [{ "type": "kind", "value": "module" }], + "authored_at": "2026-01-01T00:00:00.000Z", + "pinned": pinned, + }) + .to_string(); + conn.execute( + "INSERT OR REPLACE INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES (?1, 'core', 'guidance', ?2, ?2, ?3, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z')", + rusqlite::params![format!("core:guidance:{slug}"), slug, props], + ) + .unwrap(); + drop(conn); + run_phase3_analyze( + project_dir.path(), + std::path::Path::new(&config_path), + &plugin_path, + ); + let conn = Connection::open(&db_path).unwrap(); + let fired: i64 = conn + .query_row( + "SELECT COUNT(*) FROM findings \ + WHERE rule_id = 'CLA-FACT-GUIDANCE-CHURN-STALE' AND entity_id = ?1", + rusqlite::params![format!("core:guidance:{slug}")], + |row| row.get(0), + ) + .unwrap(); + fired > 0 + }; + + // Aggregate 30 (15 each): pinned fires 
(>=20), non-pinned does not (<50). + assert!(seed_run(15, true, "g_pinned_30"), "pinned@30 should fire"); + assert!( + !seed_run(15, false, "g_plain_30"), + "non-pinned@30 should NOT fire" + ); + // Aggregate 10 (5 each): neither fires (<20). + assert!( + !seed_run(5, true, "g_pinned_10"), + "pinned@10 should NOT fire" + ); + // Aggregate 60 (30 each): both fire (>=50). + assert!(seed_run(30, true, "g_pinned_60"), "pinned@60 should fire"); + assert!( + seed_run(30, false, "g_plain_60"), + "non-pinned@60 should fire" + ); + + let conn = Connection::open(&db_path).unwrap(); + let (severity, confidence): (String, f64) = conn + .query_row( + "SELECT severity, confidence FROM findings \ + WHERE rule_id = 'CLA-FACT-GUIDANCE-CHURN-STALE' LIMIT 1", + [], + |row| Ok((row.get(0)?, row.get(1)?)), + ) + .unwrap(); + assert_eq!(severity, "WARN"); + assert!((confidence - 0.7).abs() < 1e-9); +} + +/// T4a (WS6): honest-empty churn. With `git_churn_count` unpopulated (the +/// production reality — analyze never writes it), CHURN-STALE does not fire even +/// for a sheet that matches many entities. 
+#[cfg(unix)] +#[test] +fn analyze_guidance_churn_stale_is_honest_empty_without_churn() { + let (project_dir, plugin_dir, config_path) = phase3_project_for_rerun(&["auth_a", "auth_b"]); + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + let db_path = project_dir.path().join(".clarion/clarion.db"); + + { + let conn = Connection::open(&db_path).unwrap(); + let props = serde_json::json!({ + "match_rules": [{ "type": "kind", "value": "module" }], + "authored_at": "2026-01-01T00:00:00.000Z", + "pinned": true, + }) + .to_string(); + conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES ('core:guidance:g_inert', 'core', 'guidance', 'g_inert', 'g_inert', ?1, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z')", + [&props], + ) + .unwrap(); + } + + run_phase3_analyze( + project_dir.path(), + std::path::Path::new(&config_path), + &plugin_path, + ); + + let conn = Connection::open(&db_path).unwrap(); + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM findings WHERE rule_id = 'CLA-FACT-GUIDANCE-CHURN-STALE'", + [], + |row| row.get(0), + ) + .unwrap(); + assert_eq!(count, 0, "no churn populated ⇒ CHURN-STALE inert"); +} + +#[cfg(unix)] +fn write_wardline_manifest(project_root: &std::path::Path, tier_content: &str) { + std::fs::write( + project_root.join("wardline.yaml"), + format!( + r#"version: 1 +tiers: + integral: + paths: + - auth_a.p3 + content: "{tier_content}" +boundaries: + payment_api: + paths: + - billing_a.p3 +annotation_groups: + secrets: + paths: + - auth_b.p3 +"# + ), + ) + .expect("write wardline.yaml"); +} + +#[cfg(unix)] +fn write_real_wardline_output_fixture(project_root: &std::path::Path) { + std::fs::create_dir_all(project_root.join("src/payments")) + .expect("create Wardline overlay dir"); + std::fs::write( + project_root.join("wardline.yaml"), + r#"tiers: + - id: AUDIT_TRAIL + tier: 1 + description: "Fully audited 
code" + - id: EXTERNAL_RAW + tier: 4 + description: "Unvetted external input" + +module_tiers: + - path: "src/core" + default_taint: "AUDIT_TRAIL" + - path: "src/integrations" + default_taint: "EXTERNAL_RAW" +"#, + ) + .expect("write real wardline.yaml"); + std::fs::write( + project_root.join("wardline.fingerprint.json"), + r#"{ + "python_version": "3.12", + "generated_at": "2026-03-01T00:00:00Z", + "coverage": { + "annotated": 2, + "total": 3, + "ratio": 0.66, + "tier1_annotated": 1, + "tier1_total": 1 + }, + "fingerprints": [ + { + "qualified_name": "core.auth.validate_token", + "module": "core.auth", + "decorators": ["wardline.tier"], + "annotation_hash": "sha256:aaa111", + "tier_context": 1, + "artefact_class": "policy" + }, + { + "qualified_name": "integrations.handler.process", + "module": "integrations.handler", + "decorators": ["wardline.tier", "wardline.external_boundary"], + "annotation_hash": "sha256:bbb222", + "tier_context": 4, + "boundary_transition": "shape_validation", + "artefact_class": "enforcement" + } + ] +} +"#, + ) + .expect("write real wardline.fingerprint.json"); + std::fs::write( + project_root.join("wardline.exceptions.json"), + r#"{ + "exceptions": [ + { + "id": "EXC-001", + "rule": "PY-WL-001", + "taint_state": "EXTERNAL_RAW", + "location": "src/integrations/handler.py::process", + "exceptionability": "STANDARD", + "severity_at_grant": "ERROR", + "rationale": "Legacy integration pending migration", + "reviewer": "j.smith", + "expires": "2027-12-01" + } + ] +} +"#, + ) + .expect("write real wardline.exceptions.json"); + std::fs::write( + project_root.join("src/payments/wardline.overlay.yaml"), + r#"overlay_for: "src/payments" + +boundaries: + - function: "process_payment" + transition: "construction" + from_tier: 1 + to_tier: 3 + - function: "validate_receipt" + transition: "shape_validation" + from_tier: 3 + to_tier: 2 +"#, + ) + .expect("write real wardline.overlay.yaml"); +} + #[cfg(unix)] #[test] -fn 
analyze_emits_post_commit_tier_finding_to_filigree_at_project_anchor() { +fn analyze_generates_pinned_wardline_derived_guidance() { let (project_dir, plugin_dir, config_path) = - phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a", "billing_b"]); + phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a"]); let plugin_path = std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + let db_path = project_dir.path().join(".clarion/clarion.db"); + write_wardline_manifest(project_dir.path(), "Keep integral code isolated."); - { - let conn = Connection::open(project_dir.path().join(".clarion/clarion.db")).unwrap(); - // Two subsystems → two tier findings, both anchored to the project root. - // auth disagrees (MIXING); billing agrees (UNANIMOUS). They share - // (rule-family, path, null line) but carry subsystem-distinct messages — - // Filigree's intake is content-keyed (includes the message), so both - // persist distinctly rather than collapsing onto the shared path. 
- seed_wardline_tier(&conn, "phase3fixture:module:auth_a", "public"); - seed_wardline_tier(&conn, "phase3fixture:module:auth_b", "internal"); - seed_wardline_tier(&conn, "phase3fixture:module:billing_a", "trusted"); - seed_wardline_tier(&conn, "phase3fixture:module:billing_b", "trusted"); - } - - let (base_url, server) = spawn_capturing_filigree_mock("CLA-FACT-SUBSYSTEM-TIER-UNANIMOUS"); - - std::fs::write(&config_path, phase3_config_with_filigree(2, &base_url)) - .expect("rewrite config with filigree emission enabled"); run_phase3_analyze( project_dir.path(), std::path::Path::new(&config_path), &plugin_path, ); - let requests = server.join().expect("mock server thread"); - let posted = requests - .iter() - .find(|r| r.contains("CLA-FACT-SUBSYSTEM-TIER-UNANIMOUS")) - .unwrap_or_else(|| { - panic!( - "the post-commit tier finding must reach Filigree; captured {} POST(s): {}", - requests.len(), - requests.join("\n---\n") - ) - }); - // Both subsystems' tier findings ride the one Phase-8c batch... - assert!( - posted.contains("CLA-FACT-TIER-SUBSYSTEM-MIXING") - && posted.contains("CLA-FACT-SUBSYSTEM-TIER-UNANIMOUS"), - "both tier findings reach Filigree in one batch: {posted}" - ); - // ...anchored to the project root and flagged synthetic (non-file) so a - // consumer never reads the shared path as a real location... 
- assert!( - posted.contains("\"synthetic_anchor\":true"), - "tier findings are flagged as synthetic anchors: {posted}" + let conn = Connection::open(&db_path).unwrap(); + let rows: Vec<(String, serde_json::Value)> = conn + .prepare( + "SELECT id, properties FROM entities \ + WHERE kind = 'guidance' AND id LIKE 'core:guidance:wardline-%' \ + ORDER BY id", + ) + .unwrap() + .query_map([], |row| { + let id: String = row.get(0)?; + let raw: String = row.get(1)?; + Ok((id, serde_json::from_str(&raw).unwrap())) + }) + .unwrap() + .collect::<Result<_, _>>() + .unwrap(); + let ids: Vec<String> = rows.iter().map(|(id, _)| id.clone()).collect(); + assert_eq!( + ids, + vec![ + "core:guidance:wardline-annotation-group-secrets".to_owned(), + "core:guidance:wardline-boundary-payment_api".to_owned(), + "core:guidance:wardline-tier-integral".to_owned(), + ] ); + + let tier = rows + .iter() + .find(|(id, _)| id == "core:guidance:wardline-tier-integral") + .expect("tier guidance") + .1 + .clone(); + assert_eq!(tier["provenance"], "wardline_derived"); + assert_eq!(tier["pinned"], true); + assert_eq!(tier["wardline_kind"], "tier"); + assert_eq!(tier["wardline_key"], "integral"); + assert_eq!(tier["content"], "Keep integral code isolated."); assert!( - posted.contains(&project_dir.path().display().to_string()), - "tier findings are anchored to the project root path: {posted}" + tier["wardline_manifest_hash"] + .as_str() + .unwrap() + .starts_with("blake3:") ); - // ...and carry subsystem-distinct messages (≥2 distinct `Subsystem ` - // anchors), which is what keeps them distinct under Filigree's content key. 
- let subsystem_mentions: std::collections::BTreeSet<&str> = posted - .match_indices("core:subsystem:") - .map(|(i, _)| &posted[i..(i + "core:subsystem:".len() + 8).min(posted.len())]) - .collect(); - assert!( - subsystem_mentions.len() >= 2, - "two distinct subsystem anchors keep the findings content-distinct: {subsystem_mentions:?} in {posted}" + assert_eq!( + tier["match_rules"][0], + serde_json::json!({"type":"path","pattern":"auth_a.p3"}) ); } -/// REQ-ANALYZE-04 verification (verbatim): run analyze, delete a file, re-run; -/// assert a `CLA-FACT-ENTITY-DELETED` finding per previously-extracted entity in -/// the deleted file — and no false positives for entities still present. #[cfg(unix)] #[test] -fn analyze_emits_entity_deleted_finding_when_file_removed() { - let (project_dir, plugin_dir, config_path) = - phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a", "billing_b"]); +fn analyze_accepts_real_wardline_output_bundle() { + let (project_dir, plugin_dir, config_path) = phase3_project_for_rerun(&["seed"]); let plugin_path = std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + let db_path = project_dir.path().join(".clarion/clarion.db"); + write_real_wardline_output_fixture(project_dir.path()); - std::fs::remove_file(project_dir.path().join("billing_a.p3")).expect("delete a source file"); run_phase3_analyze( project_dir.path(), std::path::Path::new(&config_path), &plugin_path, ); - let conn = Connection::open(project_dir.path().join(".clarion/clarion.db")).unwrap(); - // The plugin's `module` entity carries the canonical finding shape. 
- let (kind, severity, status): (String, String, String) = conn - .query_row( - "SELECT kind, severity, status FROM findings \ - WHERE rule_id = 'CLA-FACT-ENTITY-DELETED' \ - AND entity_id = 'phase3fixture:module:billing_a'", - [], - |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)), + let conn = Connection::open(&db_path).unwrap(); + let rows: Vec<(String, serde_json::Value)> = conn + .prepare( + "SELECT id, properties FROM entities \ + WHERE kind = 'guidance' AND id LIKE 'core:guidance:wardline-%' \ + ORDER BY id", ) - .expect("entity-deleted finding for the deleted module"); - assert_eq!(kind, "fact"); - assert_eq!(severity, "INFO"); - assert_eq!(status, "open"); - - // Deleting one source file orphans exactly its two previously-extracted - // entities — the core-minted `core:file:*` and the plugin `module` — and - // nothing belonging to the surviving files. - let deleted: std::collections::BTreeSet<String> = conn - .prepare("SELECT entity_id FROM findings WHERE rule_id = 'CLA-FACT-ENTITY-DELETED'") .unwrap() - .query_map([], |row| row.get::<_, String>(0)) + .query_map([], |row| { + let id: String = row.get(0)?; + let raw: String = row.get(1)?; + Ok((id, serde_json::from_str(&raw).unwrap())) + }) .unwrap() .collect::<Result<_, _>>() .unwrap(); + let ids: Vec<String> = rows.iter().map(|(id, _)| id.clone()).collect(); + assert!(ids.contains(&"core:guidance:wardline-tier-src-core-AUDIT_TRAIL".to_owned())); + assert!(ids.contains(&"core:guidance:wardline-tier-src-integrations-EXTERNAL_RAW".to_owned())); + assert!( + ids.contains(&"core:guidance:wardline-boundary-src-payments-process_payment".to_owned()) + ); + assert!( + ids.contains(&"core:guidance:wardline-boundary-src-payments-validate_receipt".to_owned()) + ); + assert!(ids.contains(&"core:guidance:wardline-annotation-group-wardline.tier".to_owned())); + assert!(ids.contains( + &"core:guidance:wardline-annotation-group-wardline.external_boundary".to_owned() + )); + + let tier = rows + .iter() + .find(|(id, _)| id == 
"core:guidance:wardline-tier-src-core-AUDIT_TRAIL") + .expect("module-tier guidance") + .1 + .clone(); + assert_eq!(tier["provenance"], "wardline_derived"); + assert_eq!(tier["pinned"], true); + assert_eq!(tier["wardline_kind"], "tier"); + assert_eq!(tier["wardline_key"], "src/core-AUDIT_TRAIL"); assert_eq!( - deleted, - std::collections::BTreeSet::from([ - "core:file:billing_a.p3".to_owned(), - "phase3fixture:module:billing_a".to_owned(), - ]), - "only the deleted file's entities should be flagged" + tier["match_rules"][0], + serde_json::json!({"type":"path","pattern":"src/core/**"}) + ); + assert_eq!(tier["wardline_fingerprint_count"], 2); + assert_eq!(tier["wardline_exception_count"], 1); + assert!( + tier["wardline_fingerprint_hash"] + .as_str() + .unwrap() + .starts_with("blake3:") + ); + assert!( + tier["wardline_exceptions_hash"] + .as_str() + .unwrap() + .starts_with("blake3:") + ); + + let boundary = rows + .iter() + .find(|(id, _)| id == "core:guidance:wardline-boundary-src-payments-process_payment") + .expect("overlay boundary guidance") + .1 + .clone(); + assert_eq!(boundary["scope_level"], "subsystem"); + assert!( + boundary["content"] + .as_str() + .unwrap() + .contains("construction") + ); + assert_eq!( + boundary["match_rules"][0], + serde_json::json!({"type":"path","pattern":"src/payments/**"}) + ); + + let group = rows + .iter() + .find(|(id, _)| id == "core:guidance:wardline-annotation-group-wardline.tier") + .expect("fingerprint annotation-group guidance") + .1 + .clone(); + assert_eq!(group["scope_level"], "project"); + assert_eq!( + group["match_rules"][0], + serde_json::json!({"type":"wardline_group","name":"wardline.tier"}) ); } -/// REQ-ANALYZE-04: a guidance sheet whose `guides` edge targets a deleted entity -/// produces `CLA-FACT-GUIDANCE-ORPHAN`, and the deleted entity's cached summaries -/// are invalidated. 
Both halves are injected between runs (the fixture plugin -/// emits neither guidance sheets nor summaries), then a file is deleted + re-run. #[cfg(unix)] #[test] -fn analyze_emits_guidance_orphan_and_invalidates_summary_cache_on_deletion() { +fn analyze_preserves_wardline_override_and_emits_guidance_stale() { let (project_dir, plugin_dir, config_path) = - phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a", "billing_b"]); + phase3_project_for_rerun(&["auth_a", "auth_b", "billing_a"]); let plugin_path = std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); let db_path = project_dir.path().join(".clarion/clarion.db"); - let target = "phase3fixture:module:billing_a"; + write_wardline_manifest(project_dir.path(), "Initial Wardline guidance."); + run_phase3_analyze( + project_dir.path(), + std::path::Path::new(&config_path), + &plugin_path, + ); - // Inject a guidance sheet that `guides` the soon-to-be-deleted entity, plus a - // cached summary for it. Entities/edges are never pruned, so these survive the - // re-run; the deletion path must orphan the guidance and clear the summary. 
{ let conn = Connection::open(&db_path).unwrap(); + let raw: String = conn + .query_row( + "SELECT properties FROM entities WHERE id = 'core:guidance:wardline-tier-integral'", + [], + |row| row.get(0), + ) + .unwrap(); + let mut props: serde_json::Value = serde_json::from_str(&raw).unwrap(); + props["content"] = serde_json::Value::String("Operator override text.".to_owned()); conn.execute( - "INSERT INTO entities \ - (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ - VALUES ('core:guidance:g1', 'core', 'guidance', 'g1', 'g1', '{}', \ - '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z')", - [], - ) - .unwrap(); - conn.execute( - "INSERT INTO edges (kind, from_id, to_id, confidence) \ - VALUES ('guides', 'core:guidance:g1', ?1, 'resolved')", - [target], - ) - .unwrap(); - conn.execute( - "INSERT INTO summary_cache \ - (entity_id, content_hash, prompt_template_id, model_tier, guidance_fingerprint, \ - summary_json, cost_usd, tokens_input, tokens_output, created_at, last_accessed_at, \ - caller_count, fan_out) \ - VALUES (?1, 'h', 'tmpl', 'tier', 'fp', '{}', 0.0, 0, 0, \ - '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z', 0, 0)", - [target], + "UPDATE entities SET properties = ?1 WHERE id = 'core:guidance:wardline-tier-integral'", + [props.to_string()], ) .unwrap(); } - std::fs::remove_file(project_dir.path().join("billing_a.p3")).expect("delete a source file"); + write_wardline_manifest(project_dir.path(), "Updated Wardline guidance."); run_phase3_analyze( project_dir.path(), std::path::Path::new(&config_path), @@ -1200,30 +2212,44 @@ fn analyze_emits_guidance_orphan_and_invalidates_summary_cache_on_deletion() { ); let conn = Connection::open(&db_path).unwrap(); - let (rule_id, severity, anchor, related): (String, String, String, String) = conn + let raw: String = conn .query_row( - "SELECT rule_id, severity, entity_id, related_entities \ - FROM findings WHERE rule_id = 'CLA-FACT-GUIDANCE-ORPHAN'", + "SELECT properties FROM entities WHERE id = 
'core:guidance:wardline-tier-integral'", [], - |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)), + |row| row.get(0), ) - .expect("query guidance-orphan finding"); - assert_eq!(rule_id, "CLA-FACT-GUIDANCE-ORPHAN"); - assert_eq!(severity, "WARN"); - assert_eq!(anchor, "core:guidance:g1"); - let related: serde_json::Value = serde_json::from_str(&related).unwrap(); - assert_eq!(related, serde_json::json!([target])); + .unwrap(); + let props: serde_json::Value = serde_json::from_str(&raw).unwrap(); + assert_eq!(props["content"], "Operator override text."); + assert_eq!(props["provenance"], "wardline_derived_overridden"); - let cached: i64 = conn + let (severity, confidence, evidence): (String, f64, String) = conn .query_row( - "SELECT COUNT(*) FROM summary_cache WHERE entity_id = ?1", - [target], - |row| row.get(0), + "SELECT severity, confidence, evidence FROM findings \ + WHERE rule_id = 'CLA-FACT-GUIDANCE-STALE' \ + AND entity_id = 'core:guidance:wardline-tier-integral'", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)), ) - .unwrap(); + .expect("guidance-stale finding"); + assert_eq!(severity, "WARN"); + assert!((confidence - 1.0).abs() < f64::EPSILON); + let evidence: serde_json::Value = serde_json::from_str(&evidence).unwrap(); assert_eq!( - cached, 0, - "deleted entity's summary cache must be invalidated" + evidence["guidance_id"], + "core:guidance:wardline-tier-integral" + ); + assert!( + evidence["stored_manifest_hash"] + .as_str() + .unwrap() + .starts_with("blake3:") + ); + assert!( + evidence["current_manifest_hash"] + .as_str() + .unwrap() + .starts_with("blake3:") ); } @@ -2718,6 +3744,78 @@ fn phase3_env() -> (tempfile::TempDir, tempfile::TempDir, std::ffi::OsString) { (project_dir, plugin_dir, plugin_path) } +#[cfg(unix)] +fn run_git(project_root: &std::path::Path, args: &[&str]) { + let status = std::process::Command::new("git") + .arg("-C") + .arg(project_root) + .args(args) + .status() + .expect("run git"); + 
assert!(status.success(), "git {args:?} failed with {status}"); +} + +#[cfg(unix)] +fn git_stdout(project_root: &std::path::Path, args: &[&str]) -> String { + let output = std::process::Command::new("git") + .arg("-C") + .arg(project_root) + .args(args) + .output() + .expect("run git"); + assert!(output.status.success(), "git {args:?} failed"); + String::from_utf8(output.stdout) + .expect("git stdout is utf8") + .trim() + .to_owned() +} + +#[test] +#[cfg_attr(not(unix), ignore = "fixture plugin script is a unix shebang")] +fn analyze_stamps_entities_with_git_head_commit() { + let (project_dir, _plugin_dir, plugin_path) = phase3_env(); + let mut analyze_paths: Vec<std::path::PathBuf> = std::env::split_paths(&plugin_path).collect(); + if let Some(system_path) = std::env::var_os("PATH") { + analyze_paths.extend(std::env::split_paths(&system_path)); + } + let analyze_path = std::env::join_paths(analyze_paths).expect("join analyze PATH"); + std::fs::write(project_dir.path().join("demo.p3"), b"module\n").expect("write fixture file"); + run_git(project_dir.path(), &["init", "-q"]); + run_git(project_dir.path(), &["config", "user.email", "t@t"]); + run_git(project_dir.path(), &["config", "user.name", "t"]); + run_git(project_dir.path(), &["add", "demo.p3"]); + run_git(project_dir.path(), &["commit", "-qm", "initial"]); + let head = git_stdout(project_dir.path(), &["rev-parse", "HEAD"]); + + clarion_bin() + .args(["analyze"]) + .arg(project_dir.path()) + .env("PATH", &analyze_path) + .assert() + .success(); + + let conn = Connection::open(project_dir.path().join(".clarion/clarion.db")).unwrap(); + for entity_id in ["core:file:demo.p3", "phase3fixture:module:demo"] { + let (first_seen, last_seen): (Option<String>, Option<String>) = conn + .query_row( + "SELECT first_seen_commit, last_seen_commit FROM entities WHERE id = ?1", + [entity_id], + |row| Ok((row.get(0)?, row.get(1)?)), + ) + .unwrap_or_else(|err| panic!("query provenance for {entity_id}: {err}")); + assert_eq!( + first_seen.as_deref(), + 
Some(head.as_str()), + "{entity_id} first_seen_commit" + ); + assert_eq!( + last_seen.as_deref(), + Some(head.as_str()), + "{entity_id} last_seen_commit" + ); + } +} + #[test] #[cfg_attr(not(unix), ignore = "fixture plugin script is a unix shebang")] fn analyze_incremental_skip_does_not_orphan_unchanged_file_entities() { diff --git a/crates/clarion-cli/tests/analyze_failure_modes.rs b/crates/clarion-cli/tests/analyze_failure_modes.rs index 1760db75..3acc1bf1 100644 --- a/crates/clarion-cli/tests/analyze_failure_modes.rs +++ b/crates/clarion-cli/tests/analyze_failure_modes.rs @@ -150,6 +150,243 @@ rule_id_prefix = "CLA-BOGUS-" ontology_version = "0.6.0" "#; +/// Fixture plugin that successfully emits one module, then exits without +/// replying to the next `analyze_file` request. The test below pins the H5 +/// contract: already completed file output is durable even when a later file +/// crashes the plugin. +const PARTIAL_CRASH_PLUGIN_SCRIPT: &str = r#"#!/usr/bin/python3 +import json +import pathlib +import sys + + +seen_files = 0 + + +def read_frame(): + headers = {} + while True: + line = sys.stdin.buffer.readline() + if line in (b"", b"\r\n"): + break + name, value = line.decode("ascii").strip().split(":", 1) + headers[name.lower()] = value.strip() + length = int(headers["content-length"]) + return json.loads(sys.stdin.buffer.read(length)) + + +def write_frame(message): + body = json.dumps(message, separators=(",", ":")).encode("utf-8") + sys.stdout.buffer.write(b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n") + sys.stdout.buffer.write(body) + sys.stdout.buffer.flush() + + +while True: + msg = read_frame() + method = msg.get("method") + if method == "initialized": + continue + if method == "exit": + raise SystemExit(0) + ident = msg["id"] + if method == "initialize": + write_frame({ + "jsonrpc": "2.0", + "id": ident, + "result": { + "name": "clarion-plugin-partial", + "version": "0.1.0", + "ontology_version": "0.6.0", + "capabilities": {}, + 
}, + }) + elif method == "analyze_file": + seen_files += 1 + if seen_files > 1: + raise SystemExit(7) + path = msg["params"]["file_path"] + stem = pathlib.Path(path).stem + module_id = f"partialfixture:module:{stem}" + write_frame({ + "jsonrpc": "2.0", + "id": ident, + "result": { + "entities": [ + { + "id": module_id, + "kind": "module", + "qualified_name": stem, + "source": {"file_path": path}, + }, + ], + "edges": [], + "stats": {}, + }, + }) + elif method == "shutdown": + write_frame({"jsonrpc": "2.0", "id": ident, "result": {}}) + else: + raise SystemExit(1) +"#; + +const PARTIAL_CRASH_PLUGIN_MANIFEST: &str = r#" +[plugin] +name = "clarion-plugin-partial" +plugin_id = "partialfixture" +version = "0.1.0" +protocol_version = "1.0" +executable = "clarion-plugin-partial" +language = "partialfixture" +extensions = ["part"] + +[capabilities.runtime] +expected_max_rss_mb = 256 +expected_entities_per_file = 100 +wardline_aware = false +reads_outside_project_root = false + +[ontology] +entity_kinds = ["module"] +edge_kinds = [] +rule_id_prefix = "CLA-PARTIAL-" +ontology_version = "0.6.0" +"#; + +/// Fixture plugin that emits a cross-file call edge before the callee entity is +/// emitted by a later file. This pins the streaming writer ordering contract: +/// file entities may be streamed immediately, but edges must wait until both +/// endpoints exist in storage. 
+const CROSS_FILE_EDGE_PLUGIN_SCRIPT: &str = r#"#!/usr/bin/python3 +import json +import pathlib +import sys + + +def read_frame(): + headers = {} + while True: + line = sys.stdin.buffer.readline() + if line in (b"", b"\r\n"): + break + name, value = line.decode("ascii").strip().split(":", 1) + headers[name.lower()] = value.strip() + length = int(headers["content-length"]) + return json.loads(sys.stdin.buffer.read(length)) + + +def write_frame(message): + body = json.dumps(message, separators=(",", ":")).encode("utf-8") + sys.stdout.buffer.write(b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n") + sys.stdout.buffer.write(body) + sys.stdout.buffer.flush() + + +while True: + msg = read_frame() + method = msg.get("method") + if method == "initialized": + continue + if method == "exit": + raise SystemExit(0) + ident = msg["id"] + if method == "initialize": + write_frame({ + "jsonrpc": "2.0", + "id": ident, + "result": { + "name": "clarion-plugin-cross-file", + "version": "0.1.0", + "ontology_version": "0.6.0", + "capabilities": {}, + }, + }) + elif method == "analyze_file": + path = msg["params"]["file_path"] + stem = pathlib.Path(path).stem + module_id = f"crossfixture:module:{stem}" + entities = [ + { + "id": module_id, + "kind": "module", + "qualified_name": stem, + "source": {"file_path": path}, + }, + ] + edges = [] + if stem == "00_caller": + caller_id = "crossfixture:function:00_caller.preview" + entities.append({ + "id": caller_id, + "kind": "function", + "qualified_name": "00_caller.preview", + "source": { + "file_path": path, + "line_start": 1, + "line_end": 1, + "byte_start": 0, + "byte_end": 7, + }, + }) + edges.append({ + "kind": "calls", + "from_id": caller_id, + "to_id": "crossfixture:function:99_callee.record", + "source_byte_start": 0, + "source_byte_end": 7, + "confidence": "resolved", + }) + else: + entities.append({ + "id": "crossfixture:function:99_callee.record", + "kind": "function", + "qualified_name": "99_callee.record", + 
"source": { + "file_path": path, + "line_start": 1, + "line_end": 1, + "byte_start": 0, + "byte_end": 6, + }, + }) + write_frame({ + "jsonrpc": "2.0", + "id": ident, + "result": { + "entities": entities, + "edges": edges, + "stats": {}, + }, + }) + elif method == "shutdown": + write_frame({"jsonrpc": "2.0", "id": ident, "result": {}}) + else: + raise SystemExit(1) +"#; + +const CROSS_FILE_EDGE_PLUGIN_MANIFEST: &str = r#" +[plugin] +name = "clarion-plugin-cross-file" +plugin_id = "crossfixture" +version = "0.1.0" +protocol_version = "1.0" +executable = "clarion-plugin-cross-file" +language = "crossfixture" +extensions = ["cross"] + +[capabilities.runtime] +expected_max_rss_mb = 256 +expected_entities_per_file = 100 +wardline_aware = false +reads_outside_project_root = false + +[ontology] +entity_kinds = ["module", "function"] +edge_kinds = ["calls"] +rule_id_prefix = "CLA-CROSS-" +ontology_version = "0.6.0" +"#; + fn write_bogus_edge_plugin(plugin_dir: &std::path::Path) { let plugin_script = plugin_dir.join("clarion-plugin-bogus"); std::fs::write(&plugin_script, BOGUS_EDGE_PLUGIN_SCRIPT) @@ -164,6 +401,92 @@ fn write_bogus_edge_plugin(plugin_dir: &std::path::Path) { .expect("write bogus edge plugin manifest"); } +fn write_partial_crash_plugin(plugin_dir: &std::path::Path) { + let plugin_script = plugin_dir.join("clarion-plugin-partial"); + std::fs::write(&plugin_script, PARTIAL_CRASH_PLUGIN_SCRIPT) + .expect("write partial crash plugin script"); + let mut perms = std::fs::metadata(&plugin_script) + .expect("stat partial crash plugin") + .permissions(); + perms.set_mode(0o755); + std::fs::set_permissions(&plugin_script, perms).expect("chmod partial crash plugin"); + + std::fs::write( + plugin_dir.join("plugin.toml"), + PARTIAL_CRASH_PLUGIN_MANIFEST, + ) + .expect("write partial crash plugin manifest"); +} + +fn write_cross_file_edge_plugin(plugin_dir: &std::path::Path) { + let plugin_script = plugin_dir.join("clarion-plugin-cross-file"); + 
std::fs::write(&plugin_script, CROSS_FILE_EDGE_PLUGIN_SCRIPT) + .expect("write cross-file edge plugin script"); + let mut perms = std::fs::metadata(&plugin_script) + .expect("stat cross-file edge plugin") + .permissions(); + perms.set_mode(0o755); + std::fs::set_permissions(&plugin_script, perms).expect("chmod cross-file edge plugin"); + + std::fs::write( + plugin_dir.join("plugin.toml"), + CROSS_FILE_EDGE_PLUGIN_MANIFEST, + ) + .expect("write cross-file edge plugin manifest"); +} + +#[test] +fn analyze_defers_cross_file_edges_until_target_entity_batch_arrives() { + let project_dir = tempfile::tempdir().unwrap(); + let plugin_dir = tempfile::tempdir().unwrap(); + write_cross_file_edge_plugin(plugin_dir.path()); + + clarion_bin() + .args(["install", "--path"]) + .arg(project_dir.path()) + .env("PATH", "") + .assert() + .success(); + std::fs::write(project_dir.path().join("00_caller.cross"), b"preview\n") + .expect("write caller file"); + std::fs::write(project_dir.path().join("99_callee.cross"), b"record\n") + .expect("write callee file"); + + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + clarion_bin() + .args(["analyze"]) + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .assert() + .success(); + + let conn = Connection::open(project_dir.path().join(".clarion/clarion.db")).unwrap(); + let run_status: String = conn + .query_row( + "SELECT status FROM runs ORDER BY started_at DESC LIMIT 1", + [], + |row| row.get(0), + ) + .expect("query latest run status"); + assert_eq!(run_status, "completed"); + + let cross_file_calls: i64 = conn + .query_row( + "SELECT COUNT(*) FROM edges \ + WHERE kind = 'calls' \ + AND from_id = 'crossfixture:function:00_caller.preview' \ + AND to_id = 'crossfixture:function:99_callee.record'", + [], + |row| row.get(0), + ) + .expect("query cross-file calls edge count"); + assert_eq!( + cross_file_calls, 1, + "cross-file edge should be persisted after the target entity batch 
arrives" + ); +} + /// Seam test for the `SoftFailed` vs `HardFailed` branch in /// `run_with_options` (analyze.rs ~lines 426-475, 519-601). /// @@ -293,3 +616,61 @@ fn analyze_promotes_run_to_hard_failed_when_writer_actor_fails_mid_run() { writer-actor failure must not tick the crash-loop breaker" ); } + +#[test] +fn analyze_persists_completed_file_batches_when_plugin_later_crashes() { + let project_dir = tempfile::tempdir().unwrap(); + let plugin_dir = tempfile::tempdir().unwrap(); + write_partial_crash_plugin(plugin_dir.path()); + + clarion_bin() + .args(["install", "--path"]) + .arg(project_dir.path()) + .env("PATH", "") + .assert() + .success(); + std::fs::write(project_dir.path().join("first.part"), b"first\n").expect("write first.part"); + std::fs::write(project_dir.path().join("second.part"), b"second\n").expect("write second.part"); + + let plugin_path = + std::env::join_paths(std::iter::once(plugin_dir.path().to_path_buf())).unwrap(); + clarion_bin() + .args(["analyze"]) + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .assert() + .failure(); + + let conn = Connection::open(project_dir.path().join(".clarion/clarion.db")).unwrap(); + let (run_status, run_stats_raw): (String, String) = conn + .query_row( + "SELECT status, stats FROM runs ORDER BY started_at DESC LIMIT 1", + [], + |row| Ok((row.get(0)?, row.get(1)?)), + ) + .expect("query latest run row"); + assert_eq!(run_status, "failed"); + let stats: serde_json::Value = + serde_json::from_str(&run_stats_raw).expect("parse runs.stats JSON"); + let failure_reason = stats["failure_reason"] + .as_str() + .expect("failed plugin run should record a failure_reason"); + assert!( + failure_reason.contains("partialfixture"), + "failure_reason should identify the crashing plugin; got: {failure_reason}" + ); + + let persisted_modules: i64 = conn + .query_row( + "SELECT COUNT(*) FROM entities \ + WHERE plugin_id = 'partialfixture' \ + AND kind = 'module'", + [], + |row| row.get(0), + ) + .expect("query 
persisted partialfixture module count"); + assert_eq!( + persisted_modules, 1, + "the completed file's module must remain durable after the next file crashes" + ); +} diff --git a/crates/clarion-cli/tests/guidance.rs b/crates/clarion-cli/tests/guidance.rs new file mode 100644 index 00000000..10720543 --- /dev/null +++ b/crates/clarion-cli/tests/guidance.rs @@ -0,0 +1,1347 @@ +//! `clarion guidance` authoring CLI integration tests (WS6 / REQ-GUIDANCE-03). +//! +//! Drives the real binary end-to-end against a seeded `.clarion/clarion.db`: +//! create (via `--content`), show, list (incl. `--for-entity`), edit (via a +//! fake `$EDITOR`), and delete. Verifies the written `properties` JSON matches +//! the shape the MCP read path consumes. + +use assert_cmd::Command; +use rusqlite::{Connection, OptionalExtension}; +use serde_json::Value; +use std::io::{Read, Write}; +use std::net::{SocketAddr, TcpListener}; + +fn clarion_bin() -> Command { + Command::cargo_bin("clarion").expect("clarion binary") +} + +/// Seed a real `.clarion/clarion.db` with the schema and one code entity (so +/// `--for-entity` has a target to match). +fn seed_db(root: &std::path::Path) { + let clarion_dir = root.join(".clarion"); + std::fs::create_dir_all(&clarion_dir).expect("mkdir .clarion"); + let db_path = clarion_dir.join("clarion.db"); + let mut conn = Connection::open(&db_path).expect("open db"); + clarion_storage::pragma::apply_write_pragmas(&conn).expect("write pragmas"); + clarion_storage::schema::apply_migrations(&mut conn).expect("migrate"); + // A function entity under src/auth/ so path + kind rules can match it. + // `analyze` stores `source_file_path` as a *canonicalized* absolute path + // (clarion_storage::query::normalize_source_path canonicalizes both root and + // file), and `serve` / the CLI canonicalize project_root the same way. The + // file must exist on disk for canonicalize to resolve symlinks (e.g. 
macOS + // /tmp → /private/tmp), so create it before seeding — this makes the seeded + // path identical to what the real write path produces, so path match-rules + // are genuinely exercised through symlinked tempdirs. + let src_dir = root.join("src").join("auth"); + std::fs::create_dir_all(&src_dir).expect("mkdir src/auth"); + let src = src_dir.join("tokens.py"); + std::fs::write(&src, "def refresh(): ...\n").expect("write source file"); + let canonical_src = src.canonicalize().expect("canonicalize source file"); + conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + source_file_path, created_at, updated_at) VALUES \ + (?1, 'python', 'function', 'auth.tokens.refresh', 'refresh', '{}', ?2, \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + rusqlite::params![ + "python:function:auth.tokens.refresh", + canonical_src.to_str().unwrap() + ], + ) + .expect("seed entity"); +} + +fn properties(root: &std::path::Path, id: &str) -> Value { + let db_path = root.join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).expect("reopen db"); + let raw: String = conn + .query_row( + "SELECT properties FROM entities WHERE id = ?1 AND kind = 'guidance'", + rusqlite::params![id], + |row| row.get(0), + ) + .expect("sheet row present"); + serde_json::from_str(&raw).expect("properties parse") +} + +/// Insert a guidance sheet directly with a fully-controlled `properties` object +/// (so `expires` / `authored_at` / `reviewed_at` can be pinned to fixed instants +/// for the `--expired` / `--stale` filter tests). Bypasses the CLI `create` path +/// deliberately — these tests exercise `list`, not authoring. 
+fn seed_sheet(root: &std::path::Path, slug: &str, properties: &Value) { + let db_path = root.join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).expect("open db for seed_sheet"); + let id = format!("core:guidance:{slug}"); + conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'core', 'guidance', ?2, ?2, ?3, \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + rusqlite::params![id, slug, serde_json::to_string(&properties).unwrap()], + ) + .expect("seed guidance sheet"); +} + +/// Run `guidance list` with the given extra args and return stdout. +fn list_stdout(root: &std::path::Path, extra: &[&str]) -> String { + let assert = clarion_bin() + .args(["guidance", "list"]) + .args(["--path"]) + .arg(root) + .args(extra) + .assert() + .success(); + String::from_utf8_lossy(&assert.get_output().stdout).into_owned() +} + +fn spawn_observations_server(detail: String) -> (SocketAddr, std::thread::JoinHandle<()>) { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind observations server"); + let addr = listener.local_addr().expect("server addr"); + let handle = std::thread::spawn(move || { + let (mut stream, _) = listener.accept().expect("accept observations request"); + let mut buf = [0_u8; 4096]; + let read = stream.read(&mut buf).expect("read observations request"); + let request = String::from_utf8_lossy(&buf[..read]); + assert!( + request.contains("GET /api/loom/observations?limit=100&offset=0 HTTP/1.1"), + "unexpected request: {request}" + ); + let body = serde_json::json!({ + "items": [{ + "observation_id": "clarion-obs-guidance", + "summary": "Clarion guidance proposal for python:function:auth.tokens.refresh", + "detail": detail, + "file_path": "src/auth/tokens.py", + "line": 1, + "priority": 2, + "actor": "clarion" + }], + "limit": 100, + "offset": 0, + "has_more": false + }) + .to_string(); + write!( + stream, + "HTTP/1.1 
200 OK\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .expect("write observations response"); + }); + (addr, handle) +} + +#[test] +fn list_expired_shows_only_past_expiry_sheets() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + + // Past expiry → expired; future expiry → not; no expiry → not. + seed_sheet( + dir.path(), + "past", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + "expires": "2026-01-02T00:00:00.000Z", + }), + ); + seed_sheet( + dir.path(), + "future", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + "expires": "2999-01-01T00:00:00.000Z", + }), + ); + seed_sheet( + dir.path(), + "noexpiry", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + }), + ); + + let out = list_stdout(dir.path(), &["--expired"]); + assert!( + out.contains("core:guidance:past"), + "expired list missing past-expiry sheet: {out}" + ); + assert!( + !out.contains("core:guidance:future"), + "expired list should not include future-expiry sheet: {out}" + ); + assert!( + !out.contains("core:guidance:noexpiry"), + "expired list should not include no-expiry sheet: {out}" + ); +} + +#[test] +fn list_stale_shows_only_sheets_untouched_within_window() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + + // Authored long ago → stale at 90 days. + seed_sheet( + dir.path(), + "old", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2025-01-01T00:00:00.000Z", + }), + ); + // Old authored_at but recently reviewed → max(reviewed,authored) is fresh → + // NOT stale. The reviewed_at-wins TDD target, exercised through the binary. 
+ seed_sheet( + dir.path(), + "reviewed", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2025-01-01T00:00:00.000Z", + "reviewed_at": "2999-01-01T00:00:00.000Z", + }), + ); + + let out = list_stdout(dir.path(), &["--stale", "--days", "90"]); + assert!( + out.contains("core:guidance:old"), + "stale list missing old sheet: {out}" + ); + assert!( + !out.contains("core:guidance:reviewed"), + "stale list should exclude recently-reviewed sheet: {out}" + ); +} + +#[test] +fn list_expired_and_stale_intersect() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + + // Expired AND stale (old authored_at, past expiry) → shown. + seed_sheet( + dir.path(), + "both", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2025-01-01T00:00:00.000Z", + "expires": "2025-06-01T00:00:00.000Z", + }), + ); + // Expired but fresh (recent authored_at) → excluded by --stale. + seed_sheet( + dir.path(), + "expired-fresh", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2999-01-01T00:00:00.000Z", + "expires": "2025-06-01T00:00:00.000Z", + }), + ); + // Stale but not expired (future expiry) → excluded by --expired. 
+ seed_sheet( + dir.path(), + "stale-unexpired", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2025-01-01T00:00:00.000Z", + "expires": "2999-01-01T00:00:00.000Z", + }), + ); + + let out = list_stdout(dir.path(), &["--expired", "--stale", "--days", "90"]); + assert!( + out.contains("core:guidance:both"), + "intersection list missing expired+stale sheet: {out}" + ); + assert!( + !out.contains("core:guidance:expired-fresh"), + "intersection should exclude fresh sheet: {out}" + ); + assert!( + !out.contains("core:guidance:stale-unexpired"), + "intersection should exclude unexpired sheet: {out}" + ); +} + +#[test] +fn create_show_list_delete_lifecycle() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + + // create with explicit content + two match rules. + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "auth-tokens"]) + .args(["--match", "path:src/auth/**"]) + .args(["--match", "kind:function"]) + .args(["--content", "Refresh tokens carefully."]) + .assert() + .success(); + + let id = "core:guidance:auth-tokens"; + let props = properties(dir.path(), id); + assert_eq!(props["content"], "Refresh tokens carefully."); + assert_eq!(props["scope_level"], "module"); + assert_eq!(props["provenance"], "manual"); + assert_eq!(props["pinned"], false); + assert!(props["authored_at"].is_string()); + // Match-rules in the read-path-consumed `{"type":…}` shape. + let rules = props["match_rules"].as_array().unwrap(); + assert_eq!( + rules[0], + serde_json::json!({"type":"path","pattern":"src/auth/**"}) + ); + assert_eq!( + rules[1], + serde_json::json!({"type":"kind","value":"function"}) + ); + + // show prints the id + content. 
+ let show = clarion_bin() + .args(["guidance", "show", id]) + .args(["--path"]) + .arg(dir.path()) + .assert() + .success(); + let show_out = String::from_utf8_lossy(&show.get_output().stdout).into_owned(); + assert!(show_out.contains(id), "show missing id: {show_out}"); + assert!( + show_out.contains("Refresh tokens carefully."), + "show missing content: {show_out}" + ); + + // list (no filter) shows the sheet. + let list = clarion_bin() + .args(["guidance", "list"]) + .args(["--path"]) + .arg(dir.path()) + .assert() + .success(); + assert!( + String::from_utf8_lossy(&list.get_output().stdout).contains(id), + "list missing sheet" + ); + + // list --for-entity matches via path/kind rule. + let filtered = clarion_bin() + .args(["guidance", "list"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--for-entity", "python:function:auth.tokens.refresh"]) + .assert() + .success(); + assert!( + String::from_utf8_lossy(&filtered.get_output().stdout).contains(id), + "for-entity list should match via path/kind rule" + ); + + // delete removes it. + clarion_bin() + .args(["guidance", "delete", id]) + .args(["--path"]) + .arg(dir.path()) + .assert() + .success(); + + // show now fails (not found). 
+ clarion_bin() + .args(["guidance", "show", id]) + .args(["--path"]) + .arg(dir.path()) + .assert() + .failure(); +} + +#[test] +fn promote_observation_creates_guidance_sheet() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + let proposal = clarion_storage::GuidanceProposal { + entity_id: "python:function:auth.tokens.refresh".to_owned(), + content: "Escalate auth-token risk in summaries.".to_owned(), + scope_level: "function".to_owned(), + match_rules: vec![serde_json::json!({ + "type": "entity", + "id": "python:function:auth.tokens.refresh" + })], + name: Some("auth-token-risk".to_owned()), + pinned: true, + expires: None, + }; + let detail = proposal.to_observation_detail().unwrap(); + let (addr, handle) = spawn_observations_server(detail); + std::fs::write( + dir.path().join("clarion.yaml"), + format!( + "integrations:\n filigree:\n enabled: true\n base_url: http://{addr}\n actor: clarion-test\n" + ), + ) + .unwrap(); + + let promoted = clarion_bin() + .args(["guidance", "promote", "clarion-obs-guidance"]) + .args(["--path"]) + .arg(dir.path()) + // Dismissal is best-effort and uses the Filigree MCP subprocess; point it + // at a fast no-op so this test only owns the observation-read contract. 
+ .env("CLARION_FILIGREE_MCP_COMMAND", "/bin/true") + .assert() + .success(); + let out = String::from_utf8_lossy(&promoted.get_output().stdout); + assert!( + out.contains("Promoted observation clarion-obs-guidance to core:guidance:auth-token-risk"), + "unexpected promote output: {out}" + ); + handle.join().expect("observations server"); + + let props = properties(dir.path(), "core:guidance:auth-token-risk"); + assert_eq!(props["content"], "Escalate auth-token risk in summaries."); + assert_eq!(props["scope_level"], "function"); + assert_eq!(props["provenance"], "filigree_promotion"); + assert_eq!(props["pinned"], true); + assert_eq!( + props["match_rules"][0], + serde_json::json!({"type":"entity","id":"python:function:auth.tokens.refresh"}) + ); +} + +#[test] +fn list_for_entity_matches_via_path_rule_only() { + // A path-only sheet (no kind rule to mask it) must match the seeded entity + // through the canonicalized project_root / source_file_path symmetry — this + // is the case that silently degrades if the CLI's path treatment diverges + // from what `analyze` writes and `serve` reads. + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "path-only"]) + .args(["--match", "path:src/auth/**"]) + .args(["--content", "auth guidance"]) + .assert() + .success(); + + let matched = clarion_bin() + .args(["guidance", "list"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--for-entity", "python:function:auth.tokens.refresh"]) + .assert() + .success(); + assert!( + String::from_utf8_lossy(&matched.get_output().stdout).contains("core:guidance:path-only"), + "path-only sheet should match via path rule (root/source canonicalization symmetry)" + ); + + // A non-matching path must NOT list for this entity. 
+ clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "other-path"]) + .args(["--match", "path:src/billing/**"]) + .args(["--content", "billing guidance"]) + .assert() + .success(); + let filtered = clarion_bin() + .args(["guidance", "list"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--for-entity", "python:function:auth.tokens.refresh"]) + .assert() + .success(); + let out = String::from_utf8_lossy(&filtered.get_output().stdout); + assert!( + out.contains("core:guidance:path-only"), + "auth path still matches" + ); + assert!( + !out.contains("core:guidance:other-path"), + "billing path must not match the auth entity" + ); +} + +#[test] +fn create_rejects_duplicate_id() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + let make = || { + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "project"]) + .args(["--name", "dup"]) + .args(["--match", "kind:function"]) + .args(["--content", "x"]) + .assert() + }; + make().success(); + make().failure(); // second create on same id errors, not silent overwrite. 
+} + +#[test] +fn create_rejects_bad_scope_level_and_bad_match() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "galaxy"]) + .args(["--match", "kind:function"]) + .args(["--content", "x"]) + .assert() + .failure(); + + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--match", "bogus-no-colon"]) + .args(["--content", "x"]) + .assert() + .failure(); +} + +#[test] +fn create_normalizes_and_validates_expires() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + + // A bare date is accepted and normalized to start-of-day UTC in the same + // 24-char `…Z` shape the read path's lexical expiry compare expects. + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "expiring"]) + .args(["--match", "kind:function"]) + .args(["--content", "x"]) + .args(["--expires", "2999-12-31"]) + .assert() + .success(); + + let props = properties(dir.path(), "core:guidance:expiring"); + let stored = props["expires"].as_str().expect("expires stored"); + assert_eq!( + stored, "2999-12-31T00:00:00.000Z", + "bare date normalized to start-of-day UTC" + ); + + // Proxy the read path: a future expiry must NOT be lexically < now, i.e. the + // sheet is not treated as already expired. + let db_path = dir.path().join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).unwrap(); + let now: String = conn + .query_row("SELECT strftime('%Y-%m-%dT%H:%M:%fZ','now')", [], |r| { + r.get(0) + }) + .unwrap(); + assert!(stored > now.as_str(), "future expiry must sort after now"); + + // Garbage `--expires` is rejected up front (no sheet written). 
+ clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "bad-expiry"]) + .args(["--match", "kind:function"]) + .args(["--content", "x"]) + .args(["--expires", "tomorrow"]) + .assert() + .failure(); +} + +#[test] +fn edit_preserves_authored_at_and_provenance_changes_only_content() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + let id = "core:guidance:edit-me"; + + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "edit-me"]) + .args(["--match", "kind:function"]) + .args(["--pinned"]) + .args(["--content", "original"]) + .assert() + .success(); + + let before = properties(dir.path(), id); + let authored_at = before["authored_at"].as_str().unwrap().to_owned(); + + // A fake editor: a shell script that overwrites the file with new content. + let editor = dir.path().join("fake-editor.sh"); + std::fs::write(&editor, "#!/bin/sh\nprintf 'rewritten content' > \"$1\"\n").unwrap(); + #[cfg(unix)] + { + use std::os::unix::fs::PermissionsExt; + let mut perms = std::fs::metadata(&editor).unwrap().permissions(); + perms.set_mode(0o755); + std::fs::set_permissions(&editor, perms).unwrap(); + } + + clarion_bin() + .args(["guidance", "edit", id]) + .args(["--path"]) + .arg(dir.path()) + .env("EDITOR", &editor) + .env_remove("VISUAL") + .assert() + .success(); + + let after = properties(dir.path(), id); + assert_eq!(after["content"], "rewritten content", "content updated"); + assert_eq!( + after["authored_at"].as_str().unwrap(), + authored_at, + "authored_at preserved across edit" + ); + assert_eq!(after["provenance"], "manual", "provenance unchanged"); + assert_eq!(after["pinned"], true, "pinned preserved"); + assert_eq!(after["scope_level"], "module", "scope_level preserved"); +} + +#[test] +fn edit_without_editor_set_fails_cleanly() { + let dir = tempfile::tempdir().unwrap(); + 
seed_db(dir.path()); + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "noeditor"]) + .args(["--match", "kind:function"]) + .args(["--content", "x"]) + .assert() + .success(); + + clarion_bin() + .args(["guidance", "edit", "core:guidance:noeditor"]) + .args(["--path"]) + .arg(dir.path()) + .env_remove("EDITOR") + .env_remove("VISUAL") + .assert() + .failure(); +} + +/// Seed one `summary_cache` row for the given entity (the column shape +/// `analyze` and the cache writer use). +fn seed_summary_cache(root: &std::path::Path, entity_id: &str) { + let db_path = root.join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).expect("open db"); + conn.execute( + "INSERT INTO summary_cache \ + (entity_id, content_hash, prompt_template_id, model_tier, guidance_fingerprint, \ + summary_json, cost_usd, tokens_input, tokens_output, created_at, last_accessed_at, \ + caller_count, fan_out) \ + VALUES (?1, 'h', 'tmpl', 'tier', 'fp', '{}', 0.0, 0, 0, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z', 0, 0)", + rusqlite::params![entity_id], + ) + .expect("seed summary_cache row"); +} + +fn summary_cache_count(root: &std::path::Path, entity_id: &str) -> i64 { + let db_path = root.join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).expect("open db"); + conn.query_row( + "SELECT COUNT(*) FROM summary_cache WHERE entity_id = ?1", + rusqlite::params![entity_id], + |row| row.get(0), + ) + .expect("count cache rows") +} + +#[test] +fn create_invalidates_cached_summary_for_matched_entity() { + // ADR-007 / T-cache: authoring a sheet that matches a seeded entity drops + // that entity's cached summary, so the new guidance can reach future prompts. 
+ let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + seed_summary_cache(dir.path(), "python:function:auth.tokens.refresh"); + assert_eq!( + summary_cache_count(dir.path(), "python:function:auth.tokens.refresh"), + 1 + ); + + let assert = clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "auth-sheet"]) + .args(["--match", "path:src/auth/**"]) + .args(["--content", "auth guidance"]) + .assert() + .success(); + assert!( + String::from_utf8_lossy(&assert.get_output().stdout).contains("Invalidated 1 cached"), + "create should report the invalidation" + ); + + assert_eq!( + summary_cache_count(dir.path(), "python:function:auth.tokens.refresh"), + 0, + "matched entity's cached summary must be invalidated on authoring" + ); +} + +#[test] +fn create_non_matching_sheet_leaves_cache_intact() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + seed_summary_cache(dir.path(), "python:function:auth.tokens.refresh"); + + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "billing-sheet"]) + .args(["--match", "path:src/billing/**"]) + .args(["--content", "billing guidance"]) + .assert() + .success(); + + assert_eq!( + summary_cache_count(dir.path(), "python:function:auth.tokens.refresh"), + 1, + "a sheet that matches nothing must not touch any cache row" + ); +} + +#[test] +fn delete_invalidates_cached_summary_for_matched_entity() { + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "auth-sheet"]) + .args(["--match", "kind:function"]) + .args(["--content", "auth guidance"]) + .assert() + .success(); + + // Simulate a summary cached *after* the sheet was created (e.g. 
the next + // briefing query); deleting the sheet must drop it so the now-removed + // guidance can't linger in a cached summary. + seed_summary_cache(dir.path(), "python:function:auth.tokens.refresh"); + assert_eq!( + summary_cache_count(dir.path(), "python:function:auth.tokens.refresh"), + 1 + ); + + let assert = clarion_bin() + .args(["guidance", "delete", "core:guidance:auth-sheet"]) + .args(["--path"]) + .arg(dir.path()) + .assert() + .success(); + assert!( + String::from_utf8_lossy(&assert.get_output().stdout).contains("Invalidated 1 cached"), + "delete should report the invalidation" + ); + + assert_eq!( + summary_cache_count(dir.path(), "python:function:auth.tokens.refresh"), + 0, + "deleting a matching sheet must invalidate the matched entity's cache" + ); +} + +#[test] +fn edit_invalidates_cached_summary_for_matched_entity() { + // An edit changes `content`, so the composed guidance for every matched + // entity changed; the cached summary must be dropped. + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); + let id = "core:guidance:edit-cache"; + + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "edit-cache"]) + .args(["--match", "kind:function"]) + .args(["--content", "original"]) + .assert() + .success(); + + // Cache a summary *after* creation; the edit below must invalidate it. 
+ seed_summary_cache(dir.path(), "python:function:auth.tokens.refresh"); + + let editor = dir.path().join("fake-editor.sh"); + std::fs::write(&editor, "#!/bin/sh\nprintf 'revised guidance' > \"$1\"\n").unwrap(); + #[cfg(unix)] + { + use std::os::unix::fs::PermissionsExt; + let mut perms = std::fs::metadata(&editor).unwrap().permissions(); + perms.set_mode(0o755); + std::fs::set_permissions(&editor, perms).unwrap(); + } + + let assert = clarion_bin() + .args(["guidance", "edit", id]) + .args(["--path"]) + .arg(dir.path()) + .env("EDITOR", &editor) + .env_remove("VISUAL") + .assert() + .success(); + assert!( + String::from_utf8_lossy(&assert.get_output().stdout).contains("Invalidated 1 cached"), + "edit should report the invalidation" + ); + + assert_eq!( + summary_cache_count(dir.path(), "python:function:auth.tokens.refresh"), + 0, + "editing a matching sheet must invalidate the matched entity's cache" + ); +} + +// ── Export / import (WS6 / T5, REQ-GUIDANCE-06) ─────────────────────────────── + +/// Run `guidance export --to `. +fn export_to(root: &std::path::Path, to_dir: &std::path::Path) { + clarion_bin() + .args(["guidance", "export"]) + .args(["--path"]) + .arg(root) + .args(["--to"]) + .arg(to_dir) + .assert() + .success(); +} + +/// Run `guidance import `. +fn import_from(root: &std::path::Path, from_dir: &std::path::Path) { + clarion_bin() + .args(["guidance", "import"]) + .args(["--path"]) + .arg(root) + .arg(from_dir) + .assert() + .success(); +} + +/// Fetch a guidance sheet's (name, properties) tuple, or None if absent. 
+fn sheet_fields(root: &std::path::Path, id: &str) -> Option<(String, Value)> { + let db_path = root.join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).expect("reopen db"); + conn.query_row( + "SELECT name, properties FROM entities WHERE id = ?1 AND kind = 'guidance'", + rusqlite::params![id], + |row| { + let name: String = row.get(0)?; + let raw: String = row.get(1)?; + Ok((name, raw)) + }, + ) + .optional() + .expect("query sheet") + .map(|(name, raw)| (name, serde_json::from_str(&raw).expect("props parse"))) +} + +#[test] +fn export_import_round_trips_all_fields() { + // Headline test: seed varied sheets → export → import into a FRESH empty DB → + // every sheet equals the original field-for-field (id, name, every property + // incl. match_rules / pinned / expires / authored_at / content). + let src = tempfile::tempdir().unwrap(); + seed_db(src.path()); + + let sheets = [ + ( + "alpha", + serde_json::json!({ + "content": "Refresh tokens carefully.", + "scope_level": "module", + "match_rules": [ + { "type": "path", "pattern": "src/auth/**" }, + { "type": "kind", "value": "function" }, + ], + "pinned": true, + "provenance": "manual", + "authored_at": "2026-01-01T00:00:00.000Z", + "expires": "2027-12-31T00:00:00.000Z", + }), + ), + ( + "beta.nested.name", + serde_json::json!({ + "content": "Project-wide invariant.", + "scope_level": "project", + "match_rules": [], + "pinned": false, + "provenance": "manual", + "authored_at": "2025-06-01T12:34:56.789Z", + }), + ), + ( + "gamma", + serde_json::json!({ + "content": "multi\nline\ncontent with \"quotes\" and, commas", + "scope_level": "subsystem", + "match_rules": [ + { "type": "subsystem", "id": "core:subsystem:abcd" }, + ], + "pinned": false, + "provenance": "manual", + "authored_at": "2026-03-15T08:00:00.000Z", + "reviewed_at": "2026-04-01T09:00:00.000Z", + }), + ), + ]; + for (slug, props) in &sheets { + seed_sheet(src.path(), slug, props); + } + + let export_dir = 
tempfile::tempdir().unwrap(); + export_to(src.path(), export_dir.path()); + + // One file per sheet, colons sanitized. + assert!( + export_dir + .path() + .join("core__guidance__alpha.json") + .exists(), + "expected per-sheet file with sanitized name" + ); + + // Import into a fresh, empty DB. + let dst = tempfile::tempdir().unwrap(); + seed_db(dst.path()); // schema only; no guidance sheets yet. + import_from(dst.path(), export_dir.path()); + + for (slug, props) in &sheets { + let id = format!("core:guidance:{slug}"); + let (orig_name, orig_props) = + sheet_fields(src.path(), &id).expect("original sheet present"); + let (imp_name, imp_props) = sheet_fields(dst.path(), &id).expect("imported sheet present"); + assert_eq!(imp_name, orig_name, "name round-trips for {id}"); + // Field-for-field: properties equal the original (excludes created_at / + // updated_at, which are NOT stored in properties). + assert_eq!(imp_props, *props, "properties round-trip for {id}"); + assert_eq!(imp_props, orig_props, "imported == original for {id}"); + } +} + +#[test] +fn export_is_byte_deterministic() { + // Export the same DB to two dirs → byte-identical files. + let src = tempfile::tempdir().unwrap(); + seed_db(src.path()); + seed_sheet( + src.path(), + "det", + // Properties authored with keys in non-sorted order on purpose. + &serde_json::json!({ + "zeta": "z", "content": "x", "alpha": "a", + "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + }), + ); + + let a = tempfile::tempdir().unwrap(); + let b = tempfile::tempdir().unwrap(); + export_to(src.path(), a.path()); + export_to(src.path(), b.path()); + + let fname = "core__guidance__det.json"; + let bytes_a = std::fs::read(a.path().join(fname)).unwrap(); + let bytes_b = std::fs::read(b.path().join(fname)).unwrap(); + assert_eq!(bytes_a, bytes_b, "two exports must be byte-identical"); + + // Sanity: keys are actually sorted (alpha before zeta) and there is a + // trailing newline. 
+ let text = String::from_utf8(bytes_a).unwrap(); + assert!(text.ends_with('\n'), "trailing newline: {text:?}"); + assert!( + text.find("alpha").unwrap() < text.find("zeta").unwrap(), + "keys sorted for diff-friendliness: {text}" + ); +} + +#[test] +fn import_is_idempotent() { + let src = tempfile::tempdir().unwrap(); + seed_db(src.path()); + seed_sheet( + src.path(), + "idem", + &serde_json::json!({ + "content": "stable", "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + }), + ); + let export_dir = tempfile::tempdir().unwrap(); + export_to(src.path(), export_dir.path()); + + let dst = tempfile::tempdir().unwrap(); + seed_db(dst.path()); + import_from(dst.path(), export_dir.path()); + let first = sheet_fields(dst.path(), "core:guidance:idem").expect("present after import 1"); + + // Second import of the same dir changes nothing in content. + import_from(dst.path(), export_dir.path()); + let second = sheet_fields(dst.path(), "core:guidance:idem").expect("present after import 2"); + + assert_eq!(first, second, "re-import is a content no-op"); + + // Exactly one sheet, not duplicated. + let db_path = dst.path().join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).unwrap(); + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM entities WHERE kind = 'guidance'", + [], + |r| r.get(0), + ) + .unwrap(); + assert_eq!(count, 1, "re-import must not duplicate sheets"); +} + +#[test] +fn import_is_additive_not_a_mirror() { + // A local sheet absent from the import dir must survive the import. 
+ let src = tempfile::tempdir().unwrap(); + seed_db(src.path()); + seed_sheet( + src.path(), + "incoming", + &serde_json::json!({ + "content": "from-team", "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + }), + ); + let export_dir = tempfile::tempdir().unwrap(); + export_to(src.path(), export_dir.path()); + + let dst = tempfile::tempdir().unwrap(); + seed_db(dst.path()); + seed_sheet( + dst.path(), + "local-only", + &serde_json::json!({ + "content": "mine", "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + }), + ); + import_from(dst.path(), export_dir.path()); + + assert!( + sheet_fields(dst.path(), "core:guidance:incoming").is_some(), + "imported sheet present" + ); + assert!( + sheet_fields(dst.path(), "core:guidance:local-only").is_some(), + "local-only sheet must NOT be deleted by an additive import" + ); +} + +#[test] +fn import_fails_loudly_on_malformed_file() { + let dst = tempfile::tempdir().unwrap(); + seed_db(dst.path()); + let import_dir = tempfile::tempdir().unwrap(); + // A junk .json file in the import set. + std::fs::write(import_dir.path().join("broken.json"), "{ not valid json").unwrap(); + + let assert = clarion_bin() + .args(["guidance", "import"]) + .args(["--path"]) + .arg(dst.path()) + .arg(import_dir.path()) + .assert() + .failure(); + let stderr = String::from_utf8_lossy(&assert.get_output().stderr).into_owned(); + assert!( + stderr.contains("broken.json"), + "import error must name the offending file: {stderr}" + ); +} + +#[test] +fn import_rejects_code_entity_id_and_leaves_entity_intact() { + // FINDING 1(c): an import file whose JSON `id` is a CODE entity id must fail + // loudly (naming the file) and must NOT mutate the existing code entity. 
+ let dst = tempfile::tempdir().unwrap(); + seed_db(dst.path()); // seeds python:function:auth.tokens.refresh + + let target = "python:function:auth.tokens.refresh"; + let before = sheet_props_raw(dst.path(), target).expect("code entity present"); + + let import_dir = tempfile::tempdir().unwrap(); + let evil = serde_json::json!({ + "id": target, + "name": "pwned", + "properties": { "content": "overwrite", "scope_level": "module", "match_rules": [] }, + }); + std::fs::write( + import_dir.path().join("evil.json"), + serde_json::to_string_pretty(&evil).unwrap(), + ) + .unwrap(); + + let assert = clarion_bin() + .args(["guidance", "import"]) + .args(["--path"]) + .arg(dst.path()) + .arg(import_dir.path()) + .assert() + .failure(); + let stderr = String::from_utf8_lossy(&assert.get_output().stderr).into_owned(); + assert!( + stderr.contains("evil.json"), + "import error must name the offending file: {stderr}" + ); + + let after = sheet_props_raw(dst.path(), target).expect("code entity still present"); + assert_eq!( + after, before, + "the code entity must be byte-identical after a rejected import" + ); +} + +/// Fetch the raw (name, kind, `plugin_id`, properties) tuple for ANY entity (not +/// just guidance), or None. +fn sheet_props_raw(root: &std::path::Path, id: &str) -> Option<(String, String, String, String)> { + let db_path = root.join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).expect("reopen db"); + conn.query_row( + "SELECT name, kind, plugin_id, properties FROM entities WHERE id = ?1", + rusqlite::params![id], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)), + ) + .optional() + .expect("query entity") +} + +#[test] +fn import_invalidates_union_of_old_and_new_matches() { + // FINDING 2: when import UPDATES an existing sheet whose match_rules changed, + // the OLD-matched entities' cached summaries must also be invalidated (not + // just the NEW-matched ones). 
kind:class → kind:function is the reliable + // discriminator (no on-disk file needed for kind rules). + let dst = tempfile::tempdir().unwrap(); + seed_db(dst.path()); // seeds a `function` entity: auth.tokens.refresh + + // Seed a `class` entity too, so an OLD `kind:class` rule has a target. + { + let db_path = dst.path().join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).unwrap(); + conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'python', 'class', 'pkg.mod.C', 'C', '{}', \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + rusqlite::params!["python:class:pkg.mod.C"], + ) + .unwrap(); + } + + // Pre-existing sheet matching the CLASS entity (OLD rule: kind:class). + seed_sheet( + dst.path(), + "shifting", + &serde_json::json!({ + "content": "old", "scope_level": "module", + "match_rules": [{ "type": "kind", "value": "class" }], + "authored_at": "2026-01-01T00:00:00.000Z", + }), + ); + + // Cache rows for BOTH the old-matched (class) and new-matched (function). + seed_summary_cache(dst.path(), "python:class:pkg.mod.C"); + seed_summary_cache(dst.path(), "python:function:auth.tokens.refresh"); + + // Import a NEW version of the SAME sheet id, with match_rules flipped to + // kind:function (so the OLD class match no longer applies). 
+ let import_dir = tempfile::tempdir().unwrap(); + let updated = serde_json::json!({ + "id": "core:guidance:shifting", + "name": "shifting", + "properties": { + "content": "new", "scope_level": "module", + "match_rules": [{ "type": "kind", "value": "function" }], + "authored_at": "2026-01-01T00:00:00.000Z", + }, + }); + std::fs::write( + import_dir.path().join("core__guidance__shifting.json"), + serde_json::to_string_pretty(&updated).unwrap(), + ) + .unwrap(); + + import_from(dst.path(), import_dir.path()); + + // BOTH cache rows must be gone: the NEW match (function) AND — the regression + // this fixes — the OLD match (class) that no longer applies. + assert_eq!( + summary_cache_count(dst.path(), "python:function:auth.tokens.refresh"), + 0, + "new-matched entity invalidated" + ); + assert_eq!( + summary_cache_count(dst.path(), "python:class:pkg.mod.C"), + 0, + "OLD-matched entity must also be invalidated on a match_rules change" + ); +} + +#[test] +fn delete_invalidates_guides_edge_target() { + // FINDING 3 (through the real delete path): a sheet that applies SOLELY via a + // `guides` edge must invalidate the guided entity's cache on delete. This is + // the FK-cascade trap: delete must invalidate BEFORE removing the sheet row, + // or the CASCADE removes the guides edge first and invalidation sees nothing. + let dir = tempfile::tempdir().unwrap(); + seed_db(dir.path()); // seeds python:function:auth.tokens.refresh + + // Author a sheet with NO match_rules (so only the guides edge can match). + clarion_bin() + .args(["guidance", "create"]) + .args(["--path"]) + .arg(dir.path()) + .args(["--scope-level", "module"]) + .args(["--name", "guides-sheet"]) + .args(["--content", "guides-edge guidance"]) + .assert() + .success(); + + // Manually wire a `guides` edge (no authoring path creates one today) and a + // cache row on the target. 
+ { + let db_path = dir.path().join(".clarion").join("clarion.db"); + let conn = Connection::open(&db_path).unwrap(); + conn.execute( + "INSERT INTO edges (kind, from_id, to_id, confidence) VALUES \ + ('guides', ?1, ?2, 'resolved')", + rusqlite::params![ + "core:guidance:guides-sheet", + "python:function:auth.tokens.refresh" + ], + ) + .unwrap(); + } + seed_summary_cache(dir.path(), "python:function:auth.tokens.refresh"); + + let assert = clarion_bin() + .args(["guidance", "delete", "core:guidance:guides-sheet"]) + .args(["--path"]) + .arg(dir.path()) + .assert() + .success(); + assert!( + String::from_utf8_lossy(&assert.get_output().stdout).contains("Invalidated 1 cached"), + "delete should report invalidating the guides-edge target" + ); + + assert_eq!( + summary_cache_count(dir.path(), "python:function:auth.tokens.refresh"), + 0, + "guides-edge target's cache must be invalidated on delete (before FK cascade)" + ); +} + +#[test] +fn import_ignores_non_json_files() { + // A README committed alongside the sheets must not crash import. + let src = tempfile::tempdir().unwrap(); + seed_db(src.path()); + seed_sheet( + src.path(), + "ok", + &serde_json::json!({ + "content": "x", "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + }), + ); + let export_dir = tempfile::tempdir().unwrap(); + export_to(src.path(), export_dir.path()); + std::fs::write(export_dir.path().join("README.md"), "# team guidance\n").unwrap(); + + let dst = tempfile::tempdir().unwrap(); + seed_db(dst.path()); + import_from(dst.path(), export_dir.path()); + assert!( + sheet_fields(dst.path(), "core:guidance:ok").is_some(), + "the json sheet imports despite a non-json sibling" + ); +} + +#[test] +fn import_is_partial_but_safe_when_a_later_file_is_malformed() { + // Import is not atomic across the file set (each upsert is its own txn, files + // processed in sorted name order). 
A malformed file aborts loudly, but any + // sheet already committed before it survives — and re-import is idempotent, so + // partial progress is safe to retry. This locks that property: a good "aaa" + // sheet sorts before the bad "zzz" file, so it is committed before the abort. + let dst = tempfile::tempdir().unwrap(); + seed_db(dst.path()); + + let import_dir = tempfile::tempdir().unwrap(); + let good = serde_json::json!({ + "id": "core:guidance:aaa-good", + "name": "aaa-good", + "properties": { + "content": "valid", "scope_level": "module", "match_rules": [], + "authored_at": "2026-01-01T00:00:00.000Z", + }, + }); + std::fs::write( + import_dir.path().join("aaa-good.json"), + serde_json::to_string_pretty(&good).unwrap(), + ) + .unwrap(); + std::fs::write(import_dir.path().join("zzz-bad.json"), "{ not valid json").unwrap(); + + // The whole import fails loudly naming the bad file... + let assert = clarion_bin() + .args(["guidance", "import"]) + .args(["--path"]) + .arg(dst.path()) + .arg(import_dir.path()) + .assert() + .failure(); + let stderr = String::from_utf8_lossy(&assert.get_output().stderr).into_owned(); + assert!( + stderr.contains("zzz-bad.json"), + "names the bad file: {stderr}" + ); + + // ...but the earlier good sheet was already committed (non-atomic, idempotent + // on retry). 
+ assert!( + sheet_fields(dst.path(), "core:guidance:aaa-good").is_some(), + "a sheet committed before the malformed file survives the abort" + ); +} diff --git a/crates/clarion-cli/tests/install.rs b/crates/clarion-cli/tests/install.rs index 66eb5688..ce8dd07c 100644 --- a/crates/clarion-cli/tests/install.rs +++ b/crates/clarion-cli/tests/install.rs @@ -63,7 +63,10 @@ fn install_applies_each_migration_exactly_once() { row.get(0) }) .unwrap(); - assert_eq!(count, 7); + assert_eq!( + count, + i64::from(clarion_storage::schema::CURRENT_SCHEMA_VERSION) + ); let versions: Vec<i64> = { let mut stmt = conn .prepare("SELECT version FROM schema_migrations ORDER BY version") @@ -71,7 +74,9 @@ fn install_applies_each_migration_exactly_once() { let rows = stmt.query_map([], |row| row.get(0)).unwrap(); rows.map(std::result::Result::unwrap).collect() }; - assert_eq!(versions, vec![1, 2, 3, 4, 5, 6, 7]); + let expected: Vec<i64> = + (1..=i64::from(clarion_storage::schema::CURRENT_SCHEMA_VERSION)).collect(); + assert_eq!(versions, expected); } #[test] diff --git a/crates/clarion-cli/tests/sarif.rs b/crates/clarion-cli/tests/sarif.rs new file mode 100644 index 00000000..e902bca2 --- /dev/null +++ b/crates/clarion-cli/tests/sarif.rs @@ -0,0 +1,144 @@ +//! `clarion sarif import` integration tests. 
+ +use std::fs; +use std::io::{Read, Write}; +use std::net::TcpListener; + +use assert_cmd::Command; + +fn clarion_bin() -> Command { + Command::cargo_bin("clarion").expect("clarion binary") +} + +#[test] +fn sarif_import_posts_findings_to_mock_filigree() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + + // Spawn a thread to receive the HTTP request from clarion sarif import + let handle = std::thread::spawn(move || { + let (mut stream, _) = listener.accept().expect("accept connection"); + let mut request = vec![0_u8; 8192]; + let read = stream.read(&mut request).expect("read request"); + let request_str = String::from_utf8_lossy(&request[..read]); + + assert!(request_str.contains("POST /api/v1/scan-results HTTP/1.1")); + assert!(request_str.contains("authorization: Bearer my-mock-token")); + assert!(request_str.contains("x-filigree-actor: my-actor")); + + // Verify finding content in request body + assert!( + request_str.contains("\"scan_source\":\"semgrep\""), + "body: {request_str}" + ); + assert!( + request_str.contains("\"path\":\"src/lib.rs\""), + "body: {request_str}" + ); + assert!( + request_str.contains("\"rule_id\":\"semgrep-rule-1\""), + "body: {request_str}" + ); + assert!( + request_str.contains("\"severity\":\"high\""), + "body: {request_str}" + ); + assert!( + request_str.contains("\"line_start\":42"), + "body: {request_str}" + ); + assert!( + request_str.contains("\"line_end\":45"), + "body: {request_str}" + ); + assert!( + request_str.contains("\"sarif_properties\":{\"confidence\":\"HIGH\"}"), + "body: {request_str}" + ); + assert!( + request_str.contains("\"kind\":\"defect\""), + "body: {request_str}" + ); + + let body = r#"{"files_created":1,"files_updated":0,"findings_created":1,"findings_updated":0,"new_finding_ids":["f-abc"],"observations_created":0,"observations_failed":0,"warnings":[]}"#; + write!( + stream, + "HTTP/1.1 200 OK\r\ncontent-type: 
application/json\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .expect("write response"); + }); + + let dir = tempfile::tempdir().unwrap(); + + // Write a mock clarion.yaml config + let config_content = format!( + r#" +integrations: + filigree: + enabled: true + base_url: "http://{addr}" + actor: "my-actor" + token_env: "TEST_FILIGREE_TOKEN" +"# + ); + fs::write(dir.path().join("clarion.yaml"), config_content).unwrap(); + + // Create a dummy .clarion dir so it passes the project layout checks + fs::create_dir_all(dir.path().join(".clarion")).unwrap(); + + // Write a mock SARIF file + let sarif_content = r#"{ + "version": "2.1.0", + "runs": [ + { + "tool": { + "driver": { + "name": "Semgrep", + "version": "1.0" + } + }, + "results": [ + { + "ruleId": "semgrep-rule-1", + "message": { + "text": "suspicious pattern detected" + }, + "level": "error", + "locations": [ + { + "physicalLocation": { + "artifactLocation": { + "uri": "file:///src/lib.rs" + }, + "region": { + "startLine": 42, + "endLine": 45 + } + } + } + ], + "properties": { + "confidence": "HIGH" + } + } + ] + } + ] + }"#; + let sarif_path = dir.path().join("semgrep.sarif"); + fs::write(&sarif_path, sarif_content).unwrap(); + + // Run the cli command + clarion_bin() + .env("TEST_FILIGREE_TOKEN", "my-mock-token") + .args(["sarif", "import"]) + .arg(&sarif_path) + .arg("--path") + .arg(dir.path()) + .assert() + .success(); + + handle.join().unwrap(); +} diff --git a/crates/clarion-cli/tests/serve.rs b/crates/clarion-cli/tests/serve.rs index 60b6d773..25b1efbc 100644 --- a/crates/clarion-cli/tests/serve.rs +++ b/crates/clarion-cli/tests/serve.rs @@ -13,6 +13,7 @@ use clarion_core::{ LEAF_SUMMARY_PROMPT_TEMPLATE_ID, plugin::{ContentLengthCeiling, Frame, read_frame, write_frame}, }; +use hmac::{Hmac, Mac}; use rusqlite::{Connection, params}; use serde_json::Value; use sha2::{Digest, Sha256}; @@ -206,6 +207,10 @@ fn serve_http_responses_match_federation_fixture_contracts() { 
"post-api-v1-files-resolve.batch.json", include_str!("../../../docs/federation/fixtures/post-api-v1-files-resolve.batch.json"), ); + let files_batch_fixture = load_contract_fixture( + "post-api-v1-files-batch.json", + include_str!("../../../docs/federation/fixtures/post-api-v1-files-batch.json"), + ); let dir = tempfile::tempdir().expect("temp project"); clarion_bin() .args(["install", "--path"]) @@ -231,8 +236,41 @@ fn serve_http_responses_match_federation_fixture_contracts() { &files_resolve_fixture, "post-api-v1-files-resolve.batch.json", ); + validate_fixture_examples_matching( + &bind, + &files_batch_fixture, + "post-api-v1-files-batch.json", + |example_name| example_name != "batch_unauthorized_401", + ); validate_fixture_examples(&bind, &capabilities_fixture, "get-api-v1-capabilities.json"); stop_serve(&mut child); + + let auth_dir = tempfile::tempdir().expect("temp auth project"); + clarion_bin() + .args(["install", "--path"]) + .arg(auth_dir.path()) + .env("PATH", "") + .assert() + .success(); + seed_file_entity(auth_dir.path()); + let auth_bind = free_loopback_bind(); + write_http_config_with_token_env( + auth_dir.path(), + &auth_bind, + "CLARION_TEST_FIXTURE_BATCH_TOKEN", + ); + + let mut auth_child = spawn_serve_with_env( + auth_dir.path(), + &[("CLARION_TEST_FIXTURE_BATCH_TOKEN", "fixture-secret")], + ); + validate_fixture_examples_matching( + &auth_bind, + &files_batch_fixture, + "post-api-v1-files-batch.json", + |example_name| example_name == "batch_unauthorized_401", + ); + stop_serve(&mut auth_child); } #[test] @@ -1391,8 +1429,17 @@ fn serve_http_files_endpoint_requires_hmac_identity_when_configured() { ); let path = "/api/v1/files?path=demo.py&language=python"; let missing = wait_for_http_raw_response(&bind, path, &[]); - let signed_header = hmac_component_header("shared-secret", "GET", path, b""); - let signed = wait_for_http_raw_response(&bind, path, &[("X-Loom-Component", &signed_header)]); + let (signed_header, signed_timestamp, signed_nonce) 
= + hmac_component_headers("shared-secret", "GET", path, b""); + let signed = wait_for_http_raw_response( + &bind, + path, + &[ + ("X-Loom-Component", &signed_header), + ("X-Loom-Timestamp", &signed_timestamp), + ("X-Loom-Nonce", &signed_nonce), + ], + ); stop_serve(&mut child); let missing = missing.expect("missing identity response"); let signed = signed.expect("signed identity response"); @@ -1428,8 +1475,17 @@ fn serve_http_files_endpoint_rejects_wrong_hmac_identity() { &[("CLARION_TEST_LOOM_IDENTITY_WRONG", "shared-secret")], ); let path = "/api/v1/files?path=demo.py&language=python"; - let wrong_header = hmac_component_header("other-secret", "GET", path, b""); - let response = wait_for_http_raw_response(&bind, path, &[("X-Loom-Component", &wrong_header)]); + let (wrong_header, wrong_timestamp, wrong_nonce) = + hmac_component_headers("other-secret", "GET", path, b""); + let response = wait_for_http_raw_response( + &bind, + path, + &[ + ("X-Loom-Component", &wrong_header), + ("X-Loom-Timestamp", &wrong_timestamp), + ("X-Loom-Nonce", &wrong_nonce), + ], + ); stop_serve(&mut child); let response = response.expect("wrong identity response"); let body: Value = serde_json::from_str(&response.body).expect("wrong identity body is JSON"); @@ -2037,6 +2093,15 @@ fn fixture_example_body<'a>(fixture: &'a Value, example_name: &str) -> &'a Value } fn validate_fixture_examples(bind: &str, fixture: &Value, fixture_name: &str) { + validate_fixture_examples_matching(bind, fixture, fixture_name, |_| true); +} + +fn validate_fixture_examples_matching( + bind: &str, + fixture: &Value, + fixture_name: &str, + should_validate: impl Fn(&str) -> bool, +) { let shapes = fixture .pointer("/shape_decl/shapes") .and_then(Value::as_object) @@ -2050,6 +2115,9 @@ fn validate_fixture_examples(bind: &str, fixture: &Value, fixture_name: &str) { .get("name") .and_then(Value::as_str) .expect("example name"); + if !should_validate(example_name) { + continue; + } let method = example 
.pointer("/request/method") .and_then(Value::as_str) @@ -2075,10 +2143,7 @@ fn validate_fixture_examples(bind: &str, fixture: &Value, fixture_name: &str) { panic!("{fixture_name}:{example_name} HTTP request failed: {err}") }), "POST" => { - let body = example - .pointer("/request/body") - .unwrap_or_else(|| panic!("{fixture_name}:{example_name} missing request.body")) - .to_string(); + let body = fixture_request_body(example, fixture_name, example_name); wait_for_http_post_json(bind, path, &body, &[]).unwrap_or_else(|err| { panic!("{fixture_name}:{example_name} HTTP POST request failed: {err}") }) @@ -2113,6 +2178,19 @@ fn validate_fixture_examples(bind: &str, fixture: &Value, fixture_name: &str) { } } +fn fixture_request_body(example: &Value, fixture_name: &str, example_name: &str) -> String { + let body = example + .pointer("/request/body") + .unwrap_or_else(|| panic!("{fixture_name}:{example_name} missing request.body")); + if fixture_name == "post-api-v1-files-batch.json" && example_name == "batch_too_large_400" { + let queries: Vec<Value> = (0..257) + .map(|index| serde_json::json!({"path": format!("p{index}.py"), "language": ""})) + .collect(); + return serde_json::json!({"queries": queries}).to_string(); + } + body.to_string() +} + fn assert_normative_example_fields( actual: &Value, expected: &Value, @@ -2249,6 +2327,33 @@ fn assert_value_matches_decl( "{fixture_name}:{example_name} field {field} is not an array" ); } + "array_of_strings" => { + let values = value.as_array().unwrap_or_else(|| { + panic!("{fixture_name}:{example_name} field {field} is not an array") + }); + for item in values { + assert!( + item.as_str().is_some(), + "{fixture_name}:{example_name} field {field} contains a non-string item" + ); + } + } + "array_of_resolved_items" => { + let values = value.as_array().unwrap_or_else(|| { + panic!("{fixture_name}:{example_name} field {field} is not an array") + }); + for item in values { + assert_batch_resolved_item(item, fixture_name, example_name, 
field); + } + } + "array_of_error_items" => { + let values = value.as_array().unwrap_or_else(|| { + panic!("{fixture_name}:{example_name} field {field} is not an array") + }); + for item in values { + assert_batch_error_item(item, fixture_name, example_name, field); + } + } "non_empty_string" => { let value = value.as_str().unwrap_or_else(|| { panic!("{fixture_name}:{example_name} field {field} is not a string") @@ -2302,6 +2407,58 @@ fn assert_value_matches_decl( } } +fn assert_batch_resolved_item(item: &Value, fixture_name: &str, example_name: &str, field: &str) { + let item = item.as_object().unwrap_or_else(|| { + panic!("{fixture_name}:{example_name} field {field} contains a non-object item") + }); + for required in [ + "requested_path", + "entity_id", + "content_hash", + "canonical_path", + "language", + ] { + let value = item.get(required).unwrap_or_else(|| { + panic!("{fixture_name}:{example_name} field {field} item missing {required}") + }); + assert!( + value.as_str().is_some_and(|value| !value.is_empty()), + "{fixture_name}:{example_name} field {field} item {required} is not a non-empty string" + ); + } + assert!( + item.keys().all(|key| [ + "requested_path", + "entity_id", + "content_hash", + "canonical_path", + "language" + ] + .contains(&key.as_str())), + "{fixture_name}:{example_name} field {field} resolved item has unexpected keys: {item:?}" + ); +} + +fn assert_batch_error_item(item: &Value, fixture_name: &str, example_name: &str, field: &str) { + let item = item.as_object().unwrap_or_else(|| { + panic!("{fixture_name}:{example_name} field {field} contains a non-object item") + }); + for required in ["requested_path", "code", "message"] { + let value = item.get(required).unwrap_or_else(|| { + panic!("{fixture_name}:{example_name} field {field} item missing {required}") + }); + assert!( + value.as_str().is_some_and(|value| !value.is_empty()), + "{fixture_name}:{example_name} field {field} item {required} is not a non-empty string" + ); + } + assert!( 
+ item.keys() + .all(|key| ["requested_path", "code", "message"].contains(&key.as_str())), + "{fixture_name}:{example_name} field {field} error item has unexpected keys: {item:?}" + ); +} + fn seed_file_entity(project_root: &Path) -> (String, String, String) { let source_path = project_root.join("demo.py"); fs::write(&source_path, "def entry():\n return 1\n").expect("write source"); @@ -2440,47 +2597,63 @@ fn write_http_config_with_identity_token_env(project_root: &Path, bind: &str, to .expect("write HTTP serve config with identity_token_env"); } -fn hmac_component_header(secret: &str, method: &str, path_and_query: &str, body: &[u8]) -> String { +fn hmac_component_headers( + secret: &str, + method: &str, + path_and_query: &str, + body: &[u8], +) -> (String, String, String) { + let timestamp = time::OffsetDateTime::now_utc().unix_timestamp(); + let nonce = Uuid::new_v4().to_string(); + let component = hmac_component_header_with_freshness( + secret, + method, + path_and_query, + body, + timestamp, + &nonce, + ); + (component, timestamp.to_string(), nonce) +} + +fn hmac_component_header_with_freshness( + secret: &str, + method: &str, + path_and_query: &str, + body: &[u8], + timestamp: i64, + nonce: &str, +) -> String { format!( "clarion:{}", hmac_sha256_hex( secret.as_bytes(), - canonical_hmac_message(method, path_and_query, body).as_bytes() + canonical_hmac_message(method, path_and_query, body, timestamp, nonce).as_bytes() ) ) } -fn canonical_hmac_message(method: &str, path_and_query: &str, body: &[u8]) -> String { +fn canonical_hmac_message( + method: &str, + path_and_query: &str, + body: &[u8], + timestamp: i64, + nonce: &str, +) -> String { format!( - "{}\n{}\n{}", + "{}\n{}\n{}\n{}\n{}", method, path_and_query, - hex_lower(&Sha256::digest(body)) + hex_lower(&Sha256::digest(body)), + timestamp, + nonce ) } fn hmac_sha256_hex(secret: &[u8], message: &[u8]) -> String { - const BLOCK_SIZE: usize = 64; - let mut key = [0_u8; BLOCK_SIZE]; - if secret.len() > BLOCK_SIZE 
{ - key[..32].copy_from_slice(&Sha256::digest(secret)); - } else { - key[..secret.len()].copy_from_slice(secret); - } - let mut ipad = [0x36_u8; BLOCK_SIZE]; - let mut opad = [0x5c_u8; BLOCK_SIZE]; - for index in 0..BLOCK_SIZE { - ipad[index] ^= key[index]; - opad[index] ^= key[index]; - } - let mut inner = Sha256::new(); - inner.update(ipad); - inner.update(message); - let inner = inner.finalize(); - let mut outer = Sha256::new(); - outer.update(opad); - outer.update(inner); - hex_lower(&outer.finalize()) + let mut mac = Hmac::<Sha256>::new_from_slice(secret).expect("HMAC accepts keys of any size"); + mac.update(message); + hex_lower(&mac.finalize().into_bytes()) } fn hex_lower(bytes: &[u8]) -> String { diff --git a/crates/clarion-cli/tests/wp1_e2e.rs b/crates/clarion-cli/tests/wp1_e2e.rs index f81e365d..9de998c2 100644 --- a/crates/clarion-cli/tests/wp1_e2e.rs +++ b/crates/clarion-cli/tests/wp1_e2e.rs @@ -52,7 +52,11 @@ fn wp1_walking_skeleton_end_to_end() { row.get(0) }) .unwrap(); - assert_eq!(migration_version, 7, "schema not on the latest migration"); + assert_eq!( + migration_version, + i64::from(clarion_storage::schema::CURRENT_SCHEMA_VERSION), + "schema not on the latest migration" + ); let runs_count: i64 = conn .query_row("SELECT COUNT(*) FROM runs", [], |row| row.get(0)) diff --git a/crates/clarion-core/Cargo.toml b/crates/clarion-core/Cargo.toml index 1d128a94..1fd3da7d 100644 --- a/crates/clarion-core/Cargo.toml +++ b/crates/clarion-core/Cargo.toml @@ -10,6 +10,7 @@ rust-version.workspace = true workspace = true [dependencies] +async-trait.workspace = true reqwest.workspace = true serde.workspace = true serde_json.workspace = true @@ -17,6 +18,7 @@ tempfile.workspace = true thiserror.workspace = true toml.workspace = true tracing.workspace = true +tokio.workspace = true nix = { workspace = true } which = { workspace = true } diff --git a/crates/clarion-core/src/embedding_provider.rs b/crates/clarion-core/src/embedding_provider.rs index df218d9d..ee4aea35
100644 --- a/crates/clarion-core/src/embedding_provider.rs +++ b/crates/clarion-core/src/embedding_provider.rs @@ -14,6 +14,7 @@ use std::sync::Mutex; use std::time::Duration; +use async_trait::async_trait; use serde::{Deserialize, Serialize}; use thiserror::Error; @@ -64,6 +65,7 @@ impl EmbeddingProviderError { /// A provider that turns text into dense float vectors. One `embed` call /// processes a batch; the returned vectors are positionally aligned with the /// input `texts` and each has length [`EmbeddingProvider::dimensions`]. +#[async_trait] pub trait EmbeddingProvider: Send + Sync { fn name(&self) -> &'static str; /// The model identifier embeddings are keyed by (cache invalidation). @@ -71,7 +73,7 @@ pub trait EmbeddingProvider: Send + Sync { /// The dimensionality every returned vector must have. fn dimensions(&self) -> usize; /// Embed a batch of texts, positionally aligned with the input. - fn embed(&self, texts: &[String]) -> Result>, EmbeddingProviderError>; + async fn embed(&self, texts: &[String]) -> Result>, EmbeddingProviderError>; /// Heuristic input-token estimate for cost governance (chars / 4). 
fn estimate_tokens(&self, texts: &[String]) -> u64 { texts @@ -128,6 +130,7 @@ impl RecordingEmbeddingProvider { } } +#[async_trait] impl EmbeddingProvider for RecordingEmbeddingProvider { fn name(&self) -> &'static str { "recording" @@ -141,7 +144,7 @@ impl EmbeddingProvider for RecordingEmbeddingProvider { self.dimensions } - fn embed(&self, texts: &[String]) -> Result>, EmbeddingProviderError> { + async fn embed(&self, texts: &[String]) -> Result>, EmbeddingProviderError> { let mut out = Vec::with_capacity(texts.len()); for text in texts { self.invocations @@ -238,6 +241,7 @@ impl ApiEmbeddingProvider { } } +#[async_trait] impl EmbeddingProvider for ApiEmbeddingProvider { fn name(&self) -> &'static str { "api" @@ -251,12 +255,12 @@ impl EmbeddingProvider for ApiEmbeddingProvider { self.dimensions } - fn embed(&self, texts: &[String]) -> Result>, EmbeddingProviderError> { + async fn embed(&self, texts: &[String]) -> Result>, EmbeddingProviderError> { if texts.is_empty() { return Ok(Vec::new()); } let payload = serde_json::json!({ "model": self.model_id, "input": texts }); - let client = reqwest::blocking::Client::builder() + let client = reqwest::Client::builder() .timeout(Duration::from_secs(self.timeout_seconds)) .build() .map_err(|err| EmbeddingProviderError::Http { @@ -269,6 +273,7 @@ impl EmbeddingProvider for ApiEmbeddingProvider { .header("content-type", "application/json") .json(&payload) .send() + .await .map_err(|err| EmbeddingProviderError::Http { message: err.to_string(), retryable: true, @@ -276,6 +281,7 @@ impl EmbeddingProvider for ApiEmbeddingProvider { let status = response.status(); let body = response .text() + .await .map_err(|err| EmbeddingProviderError::Http { message: err.to_string(), retryable: true, @@ -360,8 +366,8 @@ mod tests { } } - #[test] - fn recording_provider_returns_recorded_vectors_in_order() { + #[tokio::test] + async fn recording_provider_returns_recorded_vectors_in_order() { let provider = 
RecordingEmbeddingProvider::from_recordings( "test-model", 2, @@ -369,6 +375,7 @@ mod tests { ); let out = provider .embed(&["beta".to_owned(), "alpha".to_owned()]) + .await .expect("embed"); assert_eq!(out, vec![vec![0.0, 1.0], vec![1.0, 0.0]]); assert_eq!(provider.invocations(), vec!["beta", "alpha"]); @@ -376,11 +383,11 @@ mod tests { assert_eq!(provider.model_id(), "test-model"); } - #[test] - fn recording_provider_errors_on_missing_text() { + #[tokio::test] + async fn recording_provider_errors_on_missing_text() { let provider = RecordingEmbeddingProvider::from_recordings("m", 1, vec![rec("known", vec![1.0])]); - let err = provider.embed(&["unknown".to_owned()]).unwrap_err(); + let err = provider.embed(&["unknown".to_owned()]).await.unwrap_err(); assert!(matches!( err, EmbeddingProviderError::MissingRecording { .. } @@ -388,11 +395,11 @@ mod tests { assert!(!err.retryable()); } - #[test] - fn recording_provider_rejects_wrong_dimension() { + #[tokio::test] + async fn recording_provider_rejects_wrong_dimension() { let provider = RecordingEmbeddingProvider::from_recordings("m", 3, vec![rec("x", vec![1.0, 2.0])]); - let err = provider.embed(&["x".to_owned()]).unwrap_err(); + let err = provider.embed(&["x".to_owned()]).await.unwrap_err(); assert!(matches!( err, EmbeddingProviderError::InvalidResponse { .. 
} diff --git a/crates/clarion-core/src/llm_provider.rs b/crates/clarion-core/src/llm_provider.rs index 476cf8a0..0287f717 100644 --- a/crates/clarion-core/src/llm_provider.rs +++ b/crates/clarion-core/src/llm_provider.rs @@ -8,6 +8,7 @@ use std::sync::Mutex; use std::thread; use std::time::{Duration, Instant}; +use async_trait::async_trait; use serde::{Deserialize, Serialize}; use serde_json::Value; use thiserror::Error; @@ -102,9 +103,10 @@ impl LlmProviderError { } } +#[async_trait] pub trait LlmProvider: Send + Sync { fn name(&self) -> &'static str; - fn invoke(&self, request: LlmRequest) -> Result; + async fn invoke(&self, request: LlmRequest) -> Result; fn estimate_tokens(&self, request: &LlmRequest) -> u64; fn tier_to_model(&self, tier: &str) -> Option<&str>; fn caching_model(&self) -> CachingModel; @@ -164,12 +166,13 @@ impl RecordingProvider { } } +#[async_trait] impl LlmProvider for RecordingProvider { fn name(&self) -> &'static str { "recording" } - fn invoke(&self, request: LlmRequest) -> Result { + async fn invoke(&self, request: LlmRequest) -> Result { self.invocations .lock() .unwrap_or_else(std::sync::PoisonError::into_inner) @@ -249,12 +252,13 @@ impl OpenRouterProvider { } } +#[async_trait] impl LlmProvider for OpenRouterProvider { fn name(&self) -> &'static str { "openrouter" } - fn invoke(&self, request: LlmRequest) -> Result { + async fn invoke(&self, request: LlmRequest) -> Result { let payload = serde_json::json!({ "model": request.model_id, "max_tokens": request.max_output_tokens, @@ -270,7 +274,7 @@ impl LlmProvider for OpenRouterProvider { } ] }); - let client = reqwest::blocking::Client::builder() + let client = reqwest::Client::builder() .timeout(Duration::from_secs(self.timeout_seconds)) .build() .map_err(|err| LlmProviderError::Http { @@ -285,16 +289,20 @@ impl LlmProvider for OpenRouterProvider { .header("content-type", "application/json") .json(&payload) .send() + .await .map_err(|err| LlmProviderError::Http { message: 
err.to_string(), retryable: true, })?; let status = response.status(); let retry_after_seconds = retry_after_seconds(response.headers()); - let body = response.text().map_err(|err| LlmProviderError::Http { - message: err.to_string(), - retryable: true, - })?; + let body = response + .text() + .await + .map_err(|err| LlmProviderError::Http { + message: err.to_string(), + retryable: true, + })?; if !status.is_success() { return Err(provider_error_from_body( status.as_u16(), @@ -546,15 +554,24 @@ impl CodexCliProvider { } } +#[async_trait] impl LlmProvider for CodexCliProvider { fn name(&self) -> &'static str { "codex_cli" } - fn invoke(&self, request: LlmRequest) -> Result { - let output_file = codex_temp_file("clarion-codex-output", ".json")?; - let schema_file = codex_temp_file("clarion-codex-schema", ".json")?; - self.invoke_with_temp_files(request, output_file.path(), schema_file.path()) + async fn invoke(&self, request: LlmRequest) -> Result { + let this = self.clone(); + tokio::task::spawn_blocking(move || { + let output_file = codex_temp_file("clarion-codex-output", ".json")?; + let schema_file = codex_temp_file("clarion-codex-schema", ".json")?; + this.invoke_with_temp_files(request, output_file.path(), schema_file.path()) + }) + .await + .map_err(|err| LlmProviderError::Cli { + message: format!("Codex CLI task failed to join: {err}"), + retryable: true, + })? 
} fn estimate_tokens(&self, request: &LlmRequest) -> u64 { @@ -658,99 +675,109 @@ impl ClaudeCliProvider { } } +#[async_trait] impl LlmProvider for ClaudeCliProvider { fn name(&self) -> &'static str { "claude_cli" } - fn invoke(&self, request: LlmRequest) -> Result { - let schema = codex_output_schema_for_purpose(&request.purpose); - let schema_json = - serde_json::to_string(&schema).map_err(|err| LlmProviderError::InvalidResponse { - message: format!("serialize Claude output schema: {err}"), + async fn invoke(&self, request: LlmRequest) -> Result { + let this = self.clone(); + tokio::task::spawn_blocking(move || { + let schema = codex_output_schema_for_purpose(&request.purpose); + let schema_json = serde_json::to_string(&schema).map_err(|err| { + LlmProviderError::InvalidResponse { + message: format!("serialize Claude output schema: {err}"), + retryable: false, + } + })?; + let provider_prompt = build_coding_agent_provider_prompt(&request); + let mut command = Command::new(&this.executable); + command + .arg("-p") + .arg(CLAUDE_CLI_PRINT_PROMPT) + .arg("--output-format") + .arg("json") + .arg("--json-schema") + .arg(schema_json) + .arg("--permission-mode") + .arg(&this.permission_mode) + .arg("--max-turns") + .arg(this.max_turns.to_string()) + .arg("--mcp-config") + .arg(r#"{"mcpServers":{}}"#) + .arg("--strict-mcp-config") + .arg("--disable-slash-commands"); + if this.no_session_persistence { + command.arg("--no-session-persistence"); + } + if this.exclude_dynamic_system_prompt_sections { + command.arg("--exclude-dynamic-system-prompt-sections"); + } + if let Some(model) = &this.model { + command.arg("--model").arg(model); + } + command.arg("--tools").arg(this.tools.join(",")); + command + .current_dir(&this.project_root) + .stdin(Stdio::piped()) + .stdout(Stdio::piped()) + .stderr(Stdio::piped()); + + let mut child = command.spawn().map_err(|err| LlmProviderError::Cli { + message: format!("spawn Claude CLI {}: {err}", this.executable), retryable: false, })?; - 
let provider_prompt = build_coding_agent_provider_prompt(&request); - let mut command = Command::new(&self.executable); - command - .arg("-p") - .arg(CLAUDE_CLI_PRINT_PROMPT) - .arg("--output-format") - .arg("json") - .arg("--json-schema") - .arg(schema_json) - .arg("--permission-mode") - .arg(&self.permission_mode) - .arg("--max-turns") - .arg(self.max_turns.to_string()) - .arg("--mcp-config") - .arg(r#"{"mcpServers":{}}"#) - .arg("--strict-mcp-config") - .arg("--disable-slash-commands"); - if self.no_session_persistence { - command.arg("--no-session-persistence"); - } - if self.exclude_dynamic_system_prompt_sections { - command.arg("--exclude-dynamic-system-prompt-sections"); - } - if let Some(model) = &self.model { - command.arg("--model").arg(model); - } - command.arg("--tools").arg(self.tools.join(",")); - command - .current_dir(&self.project_root) - .stdin(Stdio::piped()) - .stdout(Stdio::piped()) - .stderr(Stdio::piped()); - - let mut child = command.spawn().map_err(|err| LlmProviderError::Cli { - message: format!("spawn Claude CLI {}: {err}", self.executable), - retryable: false, - })?; - let stdout_reader = take_reader(&mut child.stdout, "stdout")?; - let stderr_reader = take_reader(&mut child.stderr, "stderr")?; - if let Err(err) = write_child_stdin(&mut child, &provider_prompt) { - let _ = child.kill(); - return Err(err); - } - - let status = wait_for_child(&mut child, self.timeout, self.timeout_seconds)?; - let stdout = join_reader(stdout_reader, "stdout")?; - let stderr = join_reader(stderr_reader, "stderr")?; - if !status.success() { - return Err(LlmProviderError::Cli { - message: format!( - "claude -p exited with {status}: {}", - truncate_for_error(&String::from_utf8_lossy(&stderr)) - ), - retryable: cli_status_retryable(status), - }); - } + let stdout_reader = take_reader(&mut child.stdout, "stdout")?; + let stderr_reader = take_reader(&mut child.stderr, "stderr")?; + if let Err(err) = write_child_stdin(&mut child, &provider_prompt) { + let _ = 
child.kill(); + return Err(err); + } - let parsed = parse_claude_cli_json_output(&stdout)?; - let input_tokens = parsed - .usage - .input_tokens - .unwrap_or_else(|| estimate_text_tokens(&provider_prompt)); - let output_tokens = parsed - .usage - .output_tokens - .unwrap_or_else(|| estimate_text_tokens(&parsed.output_json)); - let total_tokens = parsed - .usage - .total_tokens - .unwrap_or_else(|| input_tokens.saturating_add(output_tokens)); - let cached_input_tokens = parsed.usage.cached_input_tokens.unwrap_or(0); + let status = wait_for_child(&mut child, this.timeout, this.timeout_seconds)?; + let stdout = join_reader(stdout_reader, "stdout")?; + let stderr = join_reader(stderr_reader, "stderr")?; + if !status.success() { + return Err(LlmProviderError::Cli { + message: format!( + "claude -p exited with {status}: {}", + truncate_for_error(&String::from_utf8_lossy(&stderr)) + ), + retryable: cli_status_retryable(status), + }); + } - Ok(LlmResponse { - model_id: request.model_id, - output_json: parsed.output_json, - input_tokens, - cached_input_tokens, - output_tokens, - total_tokens, - cost_usd: parsed.cost_usd.unwrap_or(0.0), + let parsed = parse_claude_cli_json_output(&stdout)?; + let input_tokens = parsed + .usage + .input_tokens + .unwrap_or_else(|| estimate_text_tokens(&provider_prompt)); + let output_tokens = parsed + .usage + .output_tokens + .unwrap_or_else(|| estimate_text_tokens(&parsed.output_json)); + let total_tokens = parsed + .usage + .total_tokens + .unwrap_or_else(|| input_tokens.saturating_add(output_tokens)); + let cached_input_tokens = parsed.usage.cached_input_tokens.unwrap_or(0); + + Ok(LlmResponse { + model_id: request.model_id, + output_json: parsed.output_json, + input_tokens, + cached_input_tokens, + output_tokens, + total_tokens, + cost_usd: parsed.cost_usd.unwrap_or(0.0), + }) }) + .await + .map_err(|err| LlmProviderError::Cli { + message: format!("Claude CLI task failed to join: {err}"), + retryable: true, + })? 
} fn estimate_tokens(&self, request: &LlmRequest) -> u64 { @@ -1365,6 +1392,7 @@ pub struct LeafSummaryPromptInput { pub entity_id: String, pub kind: String, pub name: String, + pub guidance: String, pub source_excerpt: String, } @@ -1378,6 +1406,11 @@ pub struct InferredCallsPromptInput { } pub fn build_leaf_summary_prompt(input: &LeafSummaryPromptInput) -> PromptTemplate { + let guidance = if input.guidance.trim().is_empty() { + "No matching guidance." + } else { + input.guidance.as_str() + }; PromptTemplate { id: LEAF_SUMMARY_PROMPT_TEMPLATE_ID, body: format!( @@ -1385,11 +1418,13 @@ pub fn build_leaf_summary_prompt(input: &LeafSummaryPromptInput) -> PromptTempla Entity id: {entity_id}\n\ Kind: {kind}\n\ Name: {name}\n\ + Matching guidance:\n{guidance}\n\ Source excerpt:\n{source}\n\ Return JSON with purpose, behavior, relationships, and risks fields.", entity_id = input.entity_id, kind = input.kind, name = input.name, + guidance = guidance, source = input.source_excerpt, ), } @@ -1419,8 +1454,8 @@ pub fn build_inferred_calls_prompt(input: &InferredCallsPromptInput) -> PromptTe mod tests { use super::*; - #[test] - fn recording_provider_replays_exact_request_shape() { + #[tokio::test] + async fn recording_provider_replays_exact_request_shape() { let request = LlmRequest { purpose: LlmPurpose::Summary, model_id: "anthropic/claude-sonnet-4.6".to_owned(), @@ -1442,7 +1477,7 @@ mod tests { response: response.clone(), }]); - assert_eq!(provider.invoke(request.clone()).unwrap(), response); + assert_eq!(provider.invoke(request.clone()).await.unwrap(), response); assert_eq!(provider.invocations(), vec![request.clone()]); let missing = provider @@ -1450,6 +1485,7 @@ mod tests { prompt: "changed".to_owned(), ..request }) + .await .expect_err("request-shape drift should miss the recording"); assert!(matches!(missing, LlmProviderError::MissingRecording { .. 
})); } @@ -1460,6 +1496,7 @@ mod tests { entity_id: "python:function:demo.hello".to_owned(), kind: "function".to_owned(), name: "demo.hello".to_owned(), + guidance: String::new(), source_excerpt: "def hello():\n return 42\n".to_owned(), }); assert_eq!(summary.id, LEAF_SUMMARY_PROMPT_TEMPLATE_ID); @@ -1566,8 +1603,8 @@ mod tests { ); } - #[test] - fn openrouter_provider_invokes_chat_completions_and_extracts_usage_tokens() { + #[tokio::test] + async fn openrouter_provider_invokes_chat_completions_and_extracts_usage_tokens() { use std::io::{Read, Write}; use std::net::TcpListener; @@ -1634,6 +1671,7 @@ mod tests { prompt: "Summarize this function".to_owned(), max_output_tokens: 512, }) + .await .expect("invoke mocked OpenRouter"); assert_eq!(response.output_json, r#"{"purpose":"demo"}"#); @@ -1644,11 +1682,12 @@ mod tests { handle.join().expect("server thread"); } - #[test] - fn openrouter_provider_unwraps_error_envelope_with_retryability() { + #[tokio::test] + async fn openrouter_provider_unwraps_error_envelope_with_retryability() { let auth_error = invoke_openrouter_once( "HTTP/1.1 401 Unauthorized\r\ncontent-type: application/json\r\nconnection: close\r\n\r\n{\"error\":{\"code\":401,\"message\":\"Invalid credentials\",\"metadata\":{}}}", ) + .await .expect_err("401 should return provider error"); assert!(matches!( auth_error, @@ -1663,6 +1702,7 @@ mod tests { let retryable = invoke_openrouter_once( "HTTP/1.1 503 Service Unavailable\r\nretry-after: 60\r\ncontent-type: application/json\r\nconnection: close\r\n\r\n{\"error\":{\"code\":503,\"message\":\"No provider available\",\"metadata\":{}}}", ) + .await .expect_err("503 should return provider error"); assert!(matches!( retryable, @@ -1675,11 +1715,12 @@ mod tests { )); } - #[test] - fn openrouter_provider_unwraps_choice_level_error() { + #[tokio::test] + async fn openrouter_provider_unwraps_choice_level_error() { let err = invoke_openrouter_once( "HTTP/1.1 200 OK\r\ncontent-type: application/json\r\nconnection: 
close\r\n\r\n{\"id\":\"gen-01\",\"object\":\"chat.completion\",\"created\":1779000000,\"model\":\"anthropic/claude-sonnet-4.6\",\"choices\":[{\"finish_reason\":\"error\",\"native_finish_reason\":\"error\",\"message\":{\"role\":\"assistant\",\"content\":\"\"},\"error\":{\"code\":502,\"message\":\"Provider disconnected\"}}],\"usage\":{\"prompt_tokens\":1,\"completion_tokens\":0,\"total_tokens\":1}}", ) + .await .expect_err("choice error should return provider error"); assert!(matches!( @@ -1693,8 +1734,8 @@ mod tests { assert!(err.to_string().contains("Provider disconnected")); } - #[test] - fn openrouter_provider_uses_inferred_calls_schema_for_inferred_requests() { + #[tokio::test] + async fn openrouter_provider_uses_inferred_calls_schema_for_inferred_requests() { use std::io::{Read, Write}; use std::net::TcpListener; @@ -1756,6 +1797,7 @@ mod tests { prompt: "Resolve calls".to_owned(), max_output_tokens: 512, }) + .await .expect("invoke mocked OpenRouter"); assert_eq!(response.output_json, r#"{"edges":[]}"#); @@ -1764,8 +1806,8 @@ mod tests { handle.join().expect("server thread"); } - #[test] - fn openrouter_provider_connection_error_is_retryable() { + #[tokio::test] + async fn openrouter_provider_connection_error_is_retryable() { let listener = std::net::TcpListener::bind("127.0.0.1:0").expect("bind unused port"); let addr = listener.local_addr().expect("unused port addr"); drop(listener); @@ -1782,6 +1824,7 @@ mod tests { let err = provider .invoke(sample_request()) + .await .expect_err("connection refused should be retryable"); assert!(matches!( err, @@ -1792,9 +1835,9 @@ mod tests { )); } - #[test] + #[tokio::test] #[allow(clippy::too_many_lines)] - fn codex_cli_provider_invokes_exec_with_schema_stdin_and_usage() { + async fn codex_cli_provider_invokes_exec_with_schema_stdin_and_usage() { use std::fs; use std::os::unix::fs::PermissionsExt; @@ -1911,6 +1954,7 @@ printf '%s' '{{"purpose":"via codex","behavior":"ran fake CLI","relationships":" prompt: "Summarize 
this function".to_owned(), max_output_tokens: 512, }) + .await .expect("invoke fake Codex CLI"); assert_eq!(provider.name(), "codex_cli"); @@ -1935,8 +1979,8 @@ printf '%s' '{{"purpose":"via codex","behavior":"ran fake CLI","relationships":" assert!(log.contains("profile=clarion")); } - #[test] - fn codex_cli_provider_fallback_usage_counts_wrapped_prompt() { + #[tokio::test] + async fn codex_cli_provider_fallback_usage_counts_wrapped_prompt() { use std::fs; use std::os::unix::fs::PermissionsExt; @@ -1992,14 +2036,17 @@ printf '%s' '{"purpose":"via codex","behavior":"ran fake CLI","relationships":"" let expected_input_tokens = estimate_text_tokens(&build_coding_agent_provider_prompt(&request)); - let response = provider.invoke(request).expect("invoke fake Codex CLI"); + let response = provider + .invoke(request) + .await + .expect("invoke fake Codex CLI"); assert_eq!(response.input_tokens, expected_input_tokens); } - #[test] + #[tokio::test] #[allow(clippy::too_many_lines)] - fn claude_cli_provider_invokes_print_mode_with_schema_and_usage() { + async fn claude_cli_provider_invokes_print_mode_with_schema_and_usage() { use std::fs; use std::os::unix::fs::PermissionsExt; @@ -2138,6 +2185,7 @@ printf '%s\n' '{{"type":"result","subtype":"success","structured_output":{{"purp prompt: "Summarize this function".to_owned(), max_output_tokens: 512, }) + .await .expect("invoke fake Claude CLI"); assert_eq!(provider.name(), "claude_cli"); @@ -2174,8 +2222,8 @@ printf '%s\n' '{{"type":"result","subtype":"success","structured_output":{{"purp assert!(log.contains("exclude_dynamic=1")); } - #[test] - fn claude_cli_provider_fallback_usage_counts_wrapped_prompt() { + #[tokio::test] + async fn claude_cli_provider_fallback_usage_counts_wrapped_prompt() { use std::fs; use std::os::unix::fs::PermissionsExt; @@ -2229,13 +2277,16 @@ printf '%s\n' '{"type":"result","subtype":"success","structured_output":{"purpos let expected_input_tokens = 
estimate_text_tokens(&build_coding_agent_provider_prompt(&request)); - let response = provider.invoke(request).expect("invoke fake Claude CLI"); + let response = provider + .invoke(request) + .await + .expect("invoke fake Claude CLI"); assert_eq!(response.input_tokens, expected_input_tokens); } - #[test] - fn claude_cli_provider_passes_empty_tools_arg_when_no_tools_are_configured() { + #[tokio::test] + async fn claude_cli_provider_passes_empty_tools_arg_when_no_tools_are_configured() { use std::fs; use std::os::unix::fs::PermissionsExt; @@ -2302,6 +2353,7 @@ printf '%s\n' '{{"type":"result","subtype":"success","structured_output":{{"purp prompt: "Summarize this function".to_owned(), max_output_tokens: 512, }) + .await .expect("invoke fake Claude CLI"); let log = fs::read_to_string(log_path).expect("read fake claude log"); @@ -2469,7 +2521,9 @@ printf '%s\n' '{{"type":"result","subtype":"success","structured_output":{{"purp } } - fn invoke_openrouter_once(raw_response: &'static str) -> Result { + async fn invoke_openrouter_once( + raw_response: &'static str, + ) -> Result { use std::io::{Read, Write}; use std::net::TcpListener; @@ -2493,7 +2547,7 @@ printf '%s\n' '{{"type":"result","subtype":"success","structured_output":{{"purp timeout_seconds: 30, }) .expect("test provider"); - let result = provider.invoke(sample_request()); + let result = provider.invoke(sample_request()).await; handle.join().expect("server thread"); result } diff --git a/crates/clarion-core/src/plugin/host.rs b/crates/clarion-core/src/plugin/host.rs index ba1b2280..0b747ba9 100644 --- a/crates/clarion-core/src/plugin/host.rs +++ b/crates/clarion-core/src/plugin/host.rs @@ -17,13 +17,15 @@ //! 3. **Jail check** (ADR-021 §2a): `entity.source.file_path` must canonicalise //! inside `project_root`. Escape → drop + finding; tick [`PathEscapeBreaker`]. //! Breaker tripped → kill plugin, return [`HostError::PathEscapeBreakerTripped`]. -//! 4. 
**Entity cap check** (ADR-021 §2c): run-cumulative count must stay ≤ 500k. +//! 4. **Item cap check** (ADR-021 §2c): run-cumulative accepted entities, +//! accepted edges, and plugin-output-derived findings must stay ≤ 500k. //! Exceeded → kill plugin, return [`HostError::EntityCapExceeded`]. //! //! # Memory limit //! -//! On Linux, [`PluginHost::spawn`] calls [`apply_prlimit_as`] inside -//! `CommandExt::pre_exec` to set `RLIMIT_AS` before `exec()`. The closure body +//! On Linux/macOS, [`PluginHost::spawn`] calls [`apply_prlimit_as`] inside +//! `CommandExt::pre_exec` to set `RLIMIT_AS` before `exec()`. On Linux the same +//! closure also applies `RLIMIT_NOFILE` and `RLIMIT_NPROC`. The closure body //! only calls `setrlimit(2)`, which is async-signal-safe per POSIX.1-2017 //! §2.4.3. The `unsafe` block is the minimum required by the `pre_exec` API. @@ -45,16 +47,15 @@ use crate::plugin::jail::{JailError, jail_to_string}; use crate::plugin::limits::{ BreakerState, CapExceeded, ContentLengthCeiling, EntityCountCap, PathEscapeBreaker, }; -// The prlimit application path is Linux-only (see the `#[cfg(target_os = -// "linux")]` pre_exec block in `spawn`); these symbols are unused on other -// targets and would trip `-D warnings`. Gate the imports to match their usage. +// The prlimit application path is Linux/macOS-only (see the matching +// `pre_exec` block in `spawn`); these symbols are unused on other targets and +// would trip `-D warnings`. Gate the imports to match their usage. #[cfg(target_os = "linux")] -use crate::plugin::limits::{ - DEFAULT_MAX_NOFILE, DEFAULT_MAX_RSS_MIB, apply_prlimit_as, apply_prlimit_nofile_nproc, - effective_rss_mib, -}; -// `DEFAULT_MAX_NPROC` is also reached from the unit tests below, so it needs -// the `test` arm in addition to Linux. 
+use crate::plugin::limits::{DEFAULT_MAX_NOFILE, apply_prlimit_nofile_nproc}; +#[cfg(any(target_os = "linux", target_os = "macos"))] +use crate::plugin::limits::{DEFAULT_MAX_RSS_MIB, apply_prlimit_as, effective_rss_mib}; +// `DEFAULT_MAX_NPROC` is also reached from the unit tests below, so it needs the +// `test` arm in addition to Linux. #[cfg(any(target_os = "linux", test))] use crate::plugin::limits::DEFAULT_MAX_NPROC; use crate::plugin::manifest::{Manifest, ManifestError}; @@ -96,8 +97,8 @@ pub const MAX_ENTITY_EXTRA_BYTES: usize = 64 * 1024; /// against all processes/threads for the user, not just descendants of the /// plugin, so the Sprint-1 single-plugin ceiling is too low for B.4* call /// resolution on ordinary developer workstations. -// Used only from the Linux pre_exec limit path and from unit tests; gate to -// match so non-Linux release builds don't see it as dead code under +// Used only from the Linux pre_exec limit path and from unit tests; gate +// to match so other release builds don't see it as dead code under // `-D warnings`. #[cfg(any(target_os = "linux", test))] const PYRIGHT_MAX_NPROC: u64 = 4096; @@ -143,6 +144,11 @@ pub struct RawEntity { /// load-bearing identity input a string-key typo must not silently drop. #[serde(default)] pub signature: Option, + /// Plugin-emitted categorisation tags for catalogue shortcuts and `WS5b` + /// reachability roots. Typed top-level because the core denormalises these + /// into `entity_tags`; default empty keeps the wire addition non-breaking. + #[serde(default)] + pub tags: Vec, /// Extra fields — accepted without interpretation. 
#[serde(flatten)] pub extra: serde_json::Map, @@ -258,6 +264,12 @@ fn oversize_field(raw: &RawEntity) -> Option<(&'static str, usize)> { return Some((name, len)); } } + if !raw.tags.is_empty() { + let len = serde_json::to_vec(&raw.tags).map_or(0, |b| b.len()); + if len > MAX_ENTITY_EXTRA_BYTES { + return Some(("tags", len)); + } + } None } @@ -329,9 +341,9 @@ pub struct AcceptedEdge { pub from_id: String, /// `to_id` as received; FK-checked at storage time. pub to_id: String, - /// Module entity id for the file this `analyze_file` call processed, if - /// the plugin emitted a module entity. Derived host-side (ADR-022 - /// boundary: plugin does not encode the file entity id formula). + /// Core file entity id for the file this `analyze_file` call processed, + /// when the caller has attached one. The plugin never encodes the file + /// entity id formula (ADR-022 boundary). pub source_file_id: Option, /// Confidence tier from the plugin wire. pub confidence: EdgeConfidence, @@ -438,6 +450,8 @@ where /// discarded on overflow so the plugin cannot back-pressure the host /// via stderr writes. stderr_tail: Option>>>, + /// Background thread draining stderr from the plugin subprocess. + stderr_thread: Option>, /// Canonical source paths whose entities must not be sent for LLM briefing. briefing_blocks: Arc>, /// Canonical source paths that were covered by the core pre-ingest scanner. @@ -583,25 +597,30 @@ impl // cannot spoof host output. .stderr(std::process::Stdio::piped()); - // SAFETY: Each `setrlimit` call inside the closure is listed as - // async-signal-safe in POSIX.1-2017 §2.4.3. The `pre_exec` closure - // runs in the forked child after `fork()` but before `exec()`, so - // only the child's limits are affected. No Rust allocation, no Drop - // and no non-async-signal-safe call occurs inside the closure; - // `u64` captures are trivially Copy. 
- #[cfg(target_os = "linux")] + #[cfg(any(target_os = "linux", target_os = "macos"))] { use std::os::unix::process::CommandExt; let rss_mib = effective_rss_mib( manifest.capabilities.runtime.expected_max_rss_mb, DEFAULT_MAX_RSS_MIB, ); + + #[cfg(target_os = "linux")] let max_nofile = DEFAULT_MAX_NOFILE; + #[cfg(target_os = "linux")] let max_nproc = effective_max_nproc(&manifest); + + // SAFETY: Each `setrlimit` call inside the closure is listed as + // async-signal-safe in POSIX.1-2017 §2.4.3. The `pre_exec` closure + // runs in the forked child after `fork()` but before `exec()`, so + // only the child's limits are affected. No Rust allocation, no Drop + // and no non-async-signal-safe call occurs inside the closure; + // `u64` captures are trivially Copy. #[allow(unsafe_code)] unsafe { command.pre_exec(move || { apply_prlimit_as(rss_mib)?; + #[cfg(target_os = "linux")] apply_prlimit_nofile_nproc(max_nofile, max_nproc)?; Ok(()) }); @@ -634,7 +653,7 @@ impl std::collections::VecDeque::with_capacity(STDERR_TAIL_BYTES), )); let stderr_tail_for_thread = std::sync::Arc::clone(&stderr_tail); - std::thread::Builder::new() + let stderr_thread = std::thread::Builder::new() .name(format!( "clarion-plugin-stderr-drain:{}", manifest.plugin.plugin_id @@ -649,6 +668,7 @@ impl std::io::BufWriter::new(stdin), ); host.stderr_tail = Some(stderr_tail); + host.stderr_thread = Some(stderr_thread); // Reap on handshake failure. `std::process::Child::Drop` does NOT // waitpid on Unix, so returning Err while `child` goes out of scope @@ -706,6 +726,7 @@ impl PluginHost { terminated: false, ontology_version: None, stderr_tail: None, + stderr_thread: None, briefing_blocks: Arc::new(BTreeMap::new()), scanned_source_files: Arc::new(BTreeSet::new()), } @@ -893,8 +914,9 @@ impl PluginHost { // Drop the entity, but record the serde error so operators // can distinguish "plugin returned nothing" from "plugin // returned garbage that failed to parse." 
- self.findings - .push(HostFinding::malformed_entity(&e.to_string())); + self.record_plugin_output_finding(HostFinding::malformed_entity( + &e.to_string(), + ))?; continue; } }; @@ -910,14 +932,16 @@ impl PluginHost { if let Some((field, len)) = oversize_field(&raw) { let finding = HostFinding::entity_field_oversize(field, len, MAX_ENTITY_FIELD_BYTES); - self.findings.push(finding); + self.record_plugin_output_finding(finding)?; continue; } // 1. Ontology check (ADR-022). if !declared_kinds.contains(&raw.kind) { - self.findings - .push(HostFinding::undeclared_kind(&raw.kind, &raw.qualified_name)); + self.record_plugin_output_finding(HostFinding::undeclared_kind( + &raw.kind, + &raw.qualified_name, + ))?; continue; } @@ -925,18 +949,18 @@ impl PluginHost { let expected_id = match entity_id(&plugin_id, &raw.kind, &raw.qualified_name) { Ok(eid) => eid, Err(e) => { - self.findings.push(HostFinding::entity_id_mismatch( + self.record_plugin_output_finding(HostFinding::entity_id_mismatch( &raw.id, &format!(""), - )); + ))?; continue; } }; if raw.id != expected_id.as_str() { - self.findings.push(HostFinding::entity_id_mismatch( + self.record_plugin_output_finding(HostFinding::entity_id_mismatch( &raw.id, expected_id.as_str(), - )); + ))?; continue; } @@ -955,10 +979,10 @@ impl PluginHost { } JailError::Io(_) => raw.source.file_path.clone(), }; - self.findings.push(HostFinding::path_escape(&offender)); + self.record_plugin_output_finding(HostFinding::path_escape(&offender))?; let state = self.path_breaker.record_escape(); if state == BreakerState::Tripped { - self.findings.push(HostFinding::disabled_path_escape()); + self.record_plugin_output_finding(HostFinding::disabled_path_escape())?; if let Err(e) = self.do_shutdown() { tracing::warn!( error = %e, @@ -971,20 +995,8 @@ impl PluginHost { } }; - // 4. Entity cap check (ADR-021 §2c). 
- if let Err(e) = self.entity_cap.try_admit(1) { - self.findings.push(HostFinding::entity_cap_exceeded_finding( - e.cap, - e.would_reach, - )); - if let Err(se) = self.do_shutdown() { - tracing::warn!( - error = %se, - "best-effort shutdown after entity-cap exceeded hit an error", - ); - } - return Err(HostError::EntityCapExceeded(e)); - } + // 4. Combined item cap check (ADR-021 §2c). + self.admit_plugin_output_items(1)?; self.apply_briefing_block(&mut raw, &jailed); @@ -997,8 +1009,8 @@ impl PluginHost { }); } - let accepted_edges = self.process_edges(afr.edges, &accepted); - let stats = self.process_stats(afr.stats, &accepted, path); + let accepted_edges = self.process_edges(afr.edges, &accepted)?; + let stats = self.process_stats(afr.stats, &accepted, path)?; Ok(AnalyzeFileOutcome { entities: accepted, @@ -1027,59 +1039,53 @@ impl PluginHost { } /// B.3: per-edge validation pipeline. Mirrors the entity loop's - /// drop-on-violation/emit-finding posture but without the kill paths — - /// edges do not participate in the path-escape breaker or entity cap - /// (those are entity-only). Returns the accepted edges; findings flow - /// through `self.findings`. + /// drop-on-violation/emit-finding posture. Accepted edges and findings + /// emitted for invalid plugin output both count toward ADR-021's combined + /// entity + edge + finding cap. /// - /// `source_file_id` is derived from the single `module`-kind accepted - /// entity for this file (the host, not the plugin, owns the file-id - /// formula per ADR-022). If the plugin's ontology has no module kind - /// (fixture plugin, etc.), `source_file_id` stays `None` and the writer - /// persists `NULL`. + /// `source_file_id` is intentionally left unset here. The CLI mints the + /// `core:file:*` entity for the analyzed path and attaches that id when it + /// maps the accepted edge to storage, preserving the ADR-022 boundary that + /// plugins do not encode core file identity. 
fn process_edges( &mut self, raw_edges: Vec, - accepted_entities: &[AcceptedEntity], - ) -> Vec { - let module_entity_id = accepted_entities - .iter() - .find(|e| e.kind == "module") - .map(|e| e.id.as_str().to_owned()); + _accepted_entities: &[AcceptedEntity], + ) -> Result, HostError> { let declared_edge_kinds = self.manifest.ontology.edge_kinds.clone(); let mut accepted_edges = Vec::with_capacity(raw_edges.len()); for raw_val in raw_edges { let raw: RawEdge = match serde_json::from_value(raw_val) { Ok(e) => e, Err(e) => { - self.findings - .push(HostFinding::malformed_edge(&e.to_string())); + self.record_plugin_output_finding(HostFinding::malformed_edge(&e.to_string()))?; continue; } }; if let Some((field, len)) = oversize_edge_field(&raw) { let finding = HostFinding::edge_field_oversize(field, len, MAX_ENTITY_FIELD_BYTES); - self.findings.push(finding); + self.record_plugin_output_finding(finding)?; continue; } if !declared_edge_kinds.contains(&raw.kind) { - self.findings.push(HostFinding::undeclared_edge_kind( + self.record_plugin_output_finding(HostFinding::undeclared_edge_kind( &raw.kind, &raw.from_id, &raw.to_id, - )); + ))?; continue; } + self.admit_plugin_output_items(1)?; accepted_edges.push(AcceptedEdge { kind: raw.kind.clone(), from_id: raw.from_id.clone(), to_id: raw.to_id.clone(), - source_file_id: module_entity_id.clone(), + source_file_id: None, confidence: raw.confidence, raw, }); } - accepted_edges + Ok(accepted_edges) } fn process_stats( @@ -1087,7 +1093,7 @@ impl PluginHost { mut stats: AnalyzeFileStats, accepted_entities: &[AcceptedEntity], analyzed_path: &Path, - ) -> AnalyzeFileStats { + ) -> Result { let accepted_ids: BTreeSet = accepted_entities .iter() .map(|entity| entity.id.as_str().to_owned()) @@ -1101,14 +1107,15 @@ impl PluginHost { if let Some(reason) = invalid_unresolved_call_site_reason(&site, &accepted_ids, file_len) { - self.findings - .push(HostFinding::malformed_unresolved_call_site(&site, &reason)); + 
self.record_plugin_output_finding(HostFinding::malformed_unresolved_call_site( + &site, &reason, + ))?; continue; } retained.push(site); } stats.unresolved_call_sites = retained; - stats + Ok(stats) } /// Send `shutdown` request followed by the `exit` notification. @@ -1178,6 +1185,32 @@ impl PluginHost { id } + fn admit_plugin_output_items(&mut self, delta: usize) -> Result<(), HostError> { + self.entity_cap + .try_admit(delta) + .map_err(|e| self.entity_cap_exceeded(e)) + } + + fn record_plugin_output_finding(&mut self, finding: HostFinding) -> Result<(), HostError> { + self.admit_plugin_output_items(1)?; + self.findings.push(finding); + Ok(()) + } + + fn entity_cap_exceeded(&mut self, e: CapExceeded) -> HostError { + self.findings.push(HostFinding::entity_cap_exceeded_finding( + e.cap, + e.would_reach, + )); + if let Err(se) = self.do_shutdown() { + tracing::warn!( + error = %se, + "best-effort shutdown after entity-cap exceeded hit an error", + ); + } + HostError::EntityCapExceeded(e) + } + fn do_shutdown(&mut self) -> Result<(), HostError> { // Mark terminated up front so that even if the shutdown exchange // fails mid-way (plugin hung, broken pipe), subsequent shutdown() @@ -2419,6 +2452,174 @@ ontology_version = "0.1.0" ); } + #[test] + fn t9b_edge_admission_counts_toward_combined_item_cap() { + let manifest = calls_manifest(); + let mut mock = MockPlugin::new_compliant(); + let (mut host, project_dir) = connect_and_handshake(manifest, &mut mock); + + // Three entities are admitted exactly at the cap; the following valid + // edge is the fourth plugin output item and must trip ADR-021's combined + // entity + edge + finding cap. 
+ host.set_entity_cap_test(EntityCountCap::new(3)); + + let sample = project_dir.path().join("demo.mock"); + std::fs::write(&sample, b"").unwrap(); + let sample_path = sample.to_string_lossy().into_owned(); + let response_id = host.next_request_id_test(); + let response_json = serde_json::json!({ + "jsonrpc": "2.0", + "id": response_id, + "result": { + "entities": [ + { + "id": "mock:module:demo", + "kind": "module", + "qualified_name": "demo", + "source": { "file_path": sample_path } + }, + { + "id": "mock:function:demo.caller", + "kind": "function", + "qualified_name": "demo.caller", + "source": { "file_path": sample_path }, + "parent_id": "mock:module:demo" + }, + { + "id": "mock:function:demo.callee", + "kind": "function", + "qualified_name": "demo.callee", + "source": { "file_path": sample_path }, + "parent_id": "mock:module:demo" + } + ], + "edges": [{ + "kind": "calls", + "from_id": "mock:function:demo.caller", + "to_id": "mock:function:demo.callee", + "source_byte_start": 0, + "source_byte_end": 6, + "confidence": "resolved" + }] + } + }); + let body = serde_json::to_vec(&response_json).unwrap(); + { + let reader = host.reader_mut_test(); + let pos_before = reader.position(); + let old_end = reader.get_ref().len() as u64; + let mut framed: Vec = Vec::new(); + write_frame(&mut framed, &Frame { body }).unwrap(); + reader.get_mut().extend_from_slice(&framed); + if pos_before == old_end { + reader.set_position(old_end); + } + } + + let err = host + .analyze_file(&sample) + .expect_err("valid edge must trip combined item cap after three entities"); + assert!( + matches!(err, HostError::EntityCapExceeded(_)), + "expected EntityCapExceeded; got {err:?}" + ); + let findings = host.take_findings(); + let cap_finding = findings + .iter() + .find(|f| f.subcode == FINDING_ENTITY_CAP) + .unwrap_or_else(|| panic!("expected FINDING_ENTITY_CAP finding; got {findings:?}")); + assert_eq!( + cap_finding.metadata.get("would_reach").map(String::as_str), + Some("4"), + 
"would_reach metadata must count the edge; got {:?}", + cap_finding.metadata + ); + } + + #[test] + fn t9c_plugin_output_findings_count_toward_combined_item_cap() { + let manifest = calls_manifest(); + let mut mock = MockPlugin::new_compliant(); + let (mut host, project_dir) = connect_and_handshake(manifest, &mut mock); + + // Three valid entities fill the cap. The undeclared edge is dropped, but + // the host finding emitted for that plugin output is still an output item + // under ADR-021's combined entity + edge + finding cap. + host.set_entity_cap_test(EntityCountCap::new(3)); + + let sample = project_dir.path().join("demo.mock"); + std::fs::write(&sample, b"").unwrap(); + let sample_path = sample.to_string_lossy().into_owned(); + let response_id = host.next_request_id_test(); + let response_json = serde_json::json!({ + "jsonrpc": "2.0", + "id": response_id, + "result": { + "entities": [ + { + "id": "mock:module:demo", + "kind": "module", + "qualified_name": "demo", + "source": { "file_path": sample_path } + }, + { + "id": "mock:function:demo.caller", + "kind": "function", + "qualified_name": "demo.caller", + "source": { "file_path": sample_path }, + "parent_id": "mock:module:demo" + }, + { + "id": "mock:function:demo.callee", + "kind": "function", + "qualified_name": "demo.callee", + "source": { "file_path": sample_path }, + "parent_id": "mock:module:demo" + } + ], + "edges": [{ + "kind": "references", + "from_id": "mock:function:demo.caller", + "to_id": "mock:function:demo.callee", + "source_byte_start": 0, + "source_byte_end": 6, + "confidence": "resolved" + }] + } + }); + let body = serde_json::to_vec(&response_json).unwrap(); + { + let reader = host.reader_mut_test(); + let pos_before = reader.position(); + let old_end = reader.get_ref().len() as u64; + let mut framed: Vec = Vec::new(); + write_frame(&mut framed, &Frame { body }).unwrap(); + reader.get_mut().extend_from_slice(&framed); + if pos_before == old_end { + reader.set_position(old_end); + } + } + 
+ let err = host + .analyze_file(&sample) + .expect_err("undeclared-edge finding must trip combined item cap"); + assert!( + matches!(err, HostError::EntityCapExceeded(_)), + "expected EntityCapExceeded; got {err:?}" + ); + let findings = host.take_findings(); + let cap_finding = findings + .iter() + .find(|f| f.subcode == FINDING_ENTITY_CAP) + .unwrap_or_else(|| panic!("expected FINDING_ENTITY_CAP finding; got {findings:?}")); + assert_eq!( + cap_finding.metadata.get("would_reach").map(String::as_str), + Some("4"), + "would_reach metadata must count the plugin-output finding; got {:?}", + cap_finding.metadata + ); + } + // ── Test helpers ────────────────────────────────────────────────────────── // ── analyze_file error payload ─────────────────────────────────────────── diff --git a/crates/clarion-core/src/plugin/jail.rs b/crates/clarion-core/src/plugin/jail.rs index bfb855e1..a5babeaa 100644 --- a/crates/clarion-core/src/plugin/jail.rs +++ b/crates/clarion-core/src/plugin/jail.rs @@ -83,6 +83,52 @@ pub fn jail(root: &Path, candidate: &Path) -> Result { Ok(canonical_candidate) } +/// Open a candidate file safely, mitigating TOCTOU symlink swap hazards by +/// verifying that the opened file's metadata matches the jail-checked canonical path. 
+pub fn safe_open(root: &Path, candidate: &Path) -> std::io::Result<std::fs::File> { + let file = std::fs::File::open(candidate)?; + let canonical_root = std::fs::canonicalize(root)?; + let canonical_candidate = std::fs::canonicalize(candidate)?; + + if !canonical_candidate.starts_with(&canonical_root) { + return Err(std::io::Error::new( + std::io::ErrorKind::PermissionDenied, + format!( + "Path escape: {} resolves outside jail root", + canonical_candidate.display() + ), + )); + } + + // TOCTOU mitigation: verify the open file handle matches the canonical path + let meta_file = file.metadata()?; + let meta_canonical = std::fs::metadata(&canonical_candidate)?; + + #[cfg(unix)] + { + use std::os::unix::fs::MetadataExt; + if meta_file.dev() != meta_canonical.dev() || meta_file.ino() != meta_canonical.ino() { + return Err(std::io::Error::new( + std::io::ErrorKind::PermissionDenied, + "TOCTOU validation failure: device or inode mismatch", + )); + } + } + + #[cfg(not(unix))] + { + // Best-effort fallback for non-Unix targets + if meta_file.len() != meta_canonical.len() { + return Err(std::io::Error::new( + std::io::ErrorKind::PermissionDenied, + "TOCTOU validation failure: file length mismatch", + )); + } + } + + Ok(file) +} + /// Assert that `candidate` is inside `root` and return the canonical path as /// a UTF-8 `String`. /// diff --git a/crates/clarion-core/src/plugin/limits.rs b/crates/clarion-core/src/plugin/limits.rs index debe30e1..ae95d712 100644 --- a/crates/clarion-core/src/plugin/limits.rs +++ b/crates/clarion-core/src/plugin/limits.rs @@ -298,7 +298,7 @@ pub const DEFAULT_MAX_NPROC: u64 = 32; /// # Errors /// /// Returns `std::io::Error` on `setrlimit` failure.
-#[cfg(target_os = "linux")] +#[cfg(any(target_os = "linux", target_os = "macos"))] pub fn apply_prlimit_as(max_rss_mib: u64) -> std::io::Result<()> { use nix::sys::resource::{Resource, setrlimit}; @@ -326,32 +326,35 @@ pub fn apply_prlimit_nofile_nproc(max_nofile: u64, max_nproc: u64) -> std::io::R } /// Non-Linux stub for [`apply_prlimit_nofile_nproc`]. +/// +/// `nix` 0.28 does not expose `Resource::RLIMIT_NPROC` on macOS, so the real +/// implementation stays restricted to Linux. #[cfg(not(target_os = "linux"))] pub fn apply_prlimit_nofile_nproc(_max_nofile: u64, _max_nproc: u64) -> std::io::Result<()> { Ok(()) } -/// No-op stub for non-Linux targets (UQ-WP2-06: Linux-only for Sprint 1). +/// No-op stub for non-Linux/macOS targets (UQ-WP2-06: Linux-only for Sprint 1). /// /// Logs a one-time warning and returns `Ok(())`. The caller proceeds without a /// memory ceiling on the plugin process. -#[cfg(not(target_os = "linux"))] +#[cfg(not(any(target_os = "linux", target_os = "macos")))] pub fn apply_prlimit_as(_max_rss_mib: u64) -> std::io::Result<()> { warn_once_non_linux(); Ok(()) } -/// Emit a one-time warning on non-Linux platforms. +/// Emit a one-time warning on non-Linux/macOS platforms. /// /// Uses `std::sync::Once` rather than `tracing` — clarion-core has no tracing /// dep and we do not add one for this single warning (per task spec). 
-#[cfg(not(target_os = "linux"))] +#[cfg(not(any(target_os = "linux", target_os = "macos")))] fn warn_once_non_linux() { use std::sync::Once; static WARN: Once = Once::new(); WARN.call_once(|| { eprintln!( - "clarion: RLIMIT_AS enforcement is Linux-only; \ + "clarion: RLIMIT_AS enforcement is Linux/macOS only; \ plugin memory ceiling will not be applied on this platform" ); }); @@ -562,11 +565,31 @@ mod tests { assert!(result.is_ok(), "apply_prlimit_as must succeed: {result:?}"); } + #[test] + fn nofile_nproc_limit_is_not_enabled_for_macos() { + let source = include_str!("limits.rs"); + assert!( + source.contains( + "#[cfg(target_os = \"linux\")]\n\ +pub fn apply_prlimit_nofile_nproc" + ), + "RLIMIT_NOFILE/RLIMIT_NPROC helper must stay Linux-only; \ + nix 0.28 does not expose Resource::RLIMIT_NPROC on macOS" + ); + assert!( + !source.contains( + "#[cfg(any(target_os = \"linux\", target_os = \"macos\"))]\n\ +pub fn apply_prlimit_nofile_nproc" + ), + "macOS must not compile the RLIMIT_NPROC branch" + ); + } + /// On non-Linux: the stub path compiles and returns Ok (type-level check). #[cfg(not(target_os = "linux"))] #[test] fn apply_prlimit_non_linux_stub_returns_ok() { - let result = apply_prlimit_as(DEFAULT_MAX_RSS_MIB); + let result = apply_prlimit_nofile_nproc(DEFAULT_MAX_NOFILE, DEFAULT_MAX_NPROC); assert!(result.is_ok()); } } diff --git a/crates/clarion-core/tests/host_subprocess.rs b/crates/clarion-core/tests/host_subprocess.rs index e8fd03f4..18b4a77c 100644 --- a/crates/clarion-core/tests/host_subprocess.rs +++ b/crates/clarion-core/tests/host_subprocess.rs @@ -190,12 +190,8 @@ fn t1_subprocess_happy_path() { /// stdout and returns a transport error. /// /// **What this test asserts**: `spawn()` returns `Err` and the whole call -/// completes well under 5 s. That's strictly a "did we hang?" 
probe — it -/// does NOT directly verify the zombie-reap behaviour added in commit -/// 0fcc57f (that fix is covered by code review of `host.rs::spawn`'s -/// `if let Err(e) = host.handshake()` block). Direct zombie observation -/// requires walking `/proc`, which is Linux-only and brittle across kernel -/// versions. +/// completes well under 5 s. Zombie-reap coverage lives in the Linux-only +/// `/proc` test below. /// /// The earlier name `t9_handshake_failure_exits_cleanly_without_hanging` /// overstated this — "exits cleanly" implied zombie-reap coverage. @@ -233,6 +229,85 @@ fn t9_handshake_failure_on_immediate_exit_returns_err_promptly() { ); } +/// T9a: handshake failure reaps the subprocess. +/// +/// The stub records its PID then exits without speaking JSON-RPC. If +/// `PluginHost::spawn` drops the `Child` without `wait()`, Linux keeps that PID +/// as a zombie owned by this test process. The assertion below must fail in +/// that regression. +#[test] +#[cfg(target_os = "linux")] +fn t9a_handshake_failure_reaps_exited_subprocess() { + let manifest = parse_manifest(FIXTURE_MANIFEST_BYTES).expect("fixture manifest must parse"); + + let project_dir = tempfile::TempDir::new().expect("tmpdir"); + let stub_dir = tempfile::TempDir::new().expect("stub dir"); + let pid_file = stub_dir.path().join("plugin.pid"); + let stub_exec = stub_dir.path().join("clarion-plugin-fixture"); + std::fs::write( + &stub_exec, + format!( + "#!/bin/sh\nprintf '%s\\n' \"$$\" > {}\nexit 0\n", + shell_quote(&pid_file) + ), + ) + .expect("write handshake-failing stub"); + let mut perms = std::fs::metadata(&stub_exec) + .expect("stub metadata") + .permissions(); + std::os::unix::fs::PermissionsExt::set_mode(&mut perms, 0o755); + std::fs::set_permissions(&stub_exec, perms).expect("chmod stub"); + + let result = PluginHost::spawn(manifest, project_dir.path(), &stub_exec); + + assert!( + result.is_err(), + "spawn must fail when executable exits before handshake response" + ); + let pid = 
read_recorded_pid(&pid_file); + assert_not_linux_zombie(pid); +} + +#[cfg(target_os = "linux")] +fn read_recorded_pid(path: &std::path::Path) -> u32 { + let deadline = std::time::Instant::now() + std::time::Duration::from_secs(2); + loop { + if let Ok(contents) = std::fs::read_to_string(path) { + return contents + .trim() + .parse::<u32>() + .expect("stub must record numeric pid"); + } + assert!( + std::time::Instant::now() < deadline, + "stub did not record its pid at {}", + path.display() + ); + std::thread::sleep(std::time::Duration::from_millis(10)); + } +} + +#[cfg(target_os = "linux")] +fn assert_not_linux_zombie(pid: u32) { + let status_path = std::path::PathBuf::from(format!("/proc/{pid}/status")); + let Ok(status) = std::fs::read_to_string(&status_path) else { + return; + }; + let state = status + .lines() + .find(|line| line.starts_with("State:")) + .unwrap_or("State: "); + assert!( + !state.contains("zombie") && !state.contains("\tZ"), + "handshake-failed subprocess pid {pid} was not reaped: {state}" + ); +} + +#[cfg(target_os = "linux")] +fn shell_quote(path: &std::path::Path) -> String { + format!("'{}'", path.display().to_string().replace('\'', "'\\''")) +} + /// T9b: `stderr_tail()` is wired on subprocess-backed hosts.
The fixture /// plugin does not write to stderr on the happy path, so the tail is /// `Some("")` or `Some()`; the key assertion is that it's `Some` diff --git a/crates/clarion-federation/Cargo.toml b/crates/clarion-federation/Cargo.toml new file mode 100644 index 00000000..7dbf4371 --- /dev/null +++ b/crates/clarion-federation/Cargo.toml @@ -0,0 +1,22 @@ +[package] +name = "clarion-federation" +version.workspace = true +edition.workspace = true +license.workspace = true +repository.workspace = true +rust-version.workspace = true + +[lints] +workspace = true + +[dependencies] +clarion-core = { path = "../clarion-core", version = "1.2.0" } +clarion-storage = { path = "../clarion-storage", version = "1.2.0" } +reqwest.workspace = true +serde.workspace = true +serde_json.workspace = true +serde_norway.workspace = true +thiserror.workspace = true + +[dev-dependencies] +tempfile.workspace = true diff --git a/crates/clarion-federation/src/config.rs b/crates/clarion-federation/src/config.rs new file mode 100644 index 00000000..5377d159 --- /dev/null +++ b/crates/clarion-federation/src/config.rs @@ -0,0 +1,1075 @@ +use std::path::Path; +use std::{fs, net::SocketAddr}; + +use serde::Deserialize; +use thiserror::Error; + +#[derive(Debug, Clone, PartialEq, Deserialize, Default)] +#[serde(default)] +pub struct McpConfig { + #[serde(alias = "llm_policy")] + pub llm: LlmConfig, + pub semantic_search: SemanticSearchConfig, + pub integrations: IntegrationsConfig, + pub serve: ServeConfig, +} + +impl McpConfig { + pub fn from_path(path: &Path) -> Result { + let raw = fs::read_to_string(path).map_err(|source| ConfigError::Io { + path: path.display().to_string(), + source, + })?; + Self::from_yaml_str(&raw) + } + + pub fn from_yaml_str(raw: &str) -> Result { + if raw.trim().is_empty() { + return Ok(Self::default()); + } + reject_llm_policy_alias_collision(raw)?; + let config: Self = + serde_norway::from_str(raw).map_err(|err| ConfigError::Yaml(err.to_string()))?; + config.validate()?; 
+ Ok(config) + } + + fn validate(&self) -> Result<(), ConfigError> { + if self.llm.provider == LlmProviderKind::Anthropic + || self.llm.anthropic_api_key_env.is_some() + { + return Err(ConfigError::DeprecatedProvider { + code: "CLA-CONFIG-DEPRECATED-PROVIDER", + }); + } + if self.integrations.filigree.enabled && self.integrations.filigree.actor.trim().is_empty() + { + return Err(ConfigError::InvalidFiligreeActor { + code: "CLA-CONFIG-FILIGREE-ACTOR-BLANK", + }); + } + self.serve.http.validate_loopback_trust()?; + Ok(()) + } +} + +#[derive(Debug, Clone, PartialEq, Deserialize)] +#[serde(default)] +pub struct LlmConfig { + pub enabled: bool, + pub provider: LlmProviderKind, + pub allow_live_provider: bool, + pub session_token_ceiling: u64, + pub model_id: String, + pub openrouter: OpenRouterConfig, + pub codex_cli: CodexCliConfig, + pub claude_cli: ClaudeCliConfig, + pub recording_fixture_path: Option, + pub max_inferred_edges_per_caller: u32, + pub cache_max_age_days: u32, + pub anthropic_api_key_env: Option, +} + +impl Default for LlmConfig { + fn default() -> Self { + Self { + enabled: false, + provider: LlmProviderKind::OpenRouter, + allow_live_provider: false, + session_token_ceiling: 1_000_000, + model_id: "anthropic/claude-sonnet-4.6".to_owned(), + openrouter: OpenRouterConfig::default(), + codex_cli: CodexCliConfig::default(), + claude_cli: ClaudeCliConfig::default(), + recording_fixture_path: None, + max_inferred_edges_per_caller: 8, + cache_max_age_days: 180, + anthropic_api_key_env: None, + } + } +} + +/// Semantic-search (embeddings) policy for `search_semantic` (`WS5b` / ADR-040). +/// **Opt-in, off by default** — mirrors [`LlmConfig`]; Loom is local-first, so +/// nothing here makes a hosted embedding service required. When `enabled` is +/// false the `search_semantic` tool degrades honestly to "not enabled". 
+#[derive(Debug, Clone, PartialEq, Deserialize)] +#[serde(default)] +pub struct SemanticSearchConfig { + pub enabled: bool, + /// Explicit opt-in to the live API provider (in addition to `enabled`). + pub allow_live_provider: bool, + /// Embedding model id; embeddings are cache-keyed by this. + pub model_id: String, + /// Vector dimensionality (must match the model). + pub dimensions: usize, + /// `OpenAI`-compatible base URL (`/embeddings` is appended). + pub endpoint_url: String, + /// Env var holding the API key for the live provider. + pub api_key_env: String, + pub timeout_seconds: u64, + /// Per-session embedding token ceiling for cost governance. + pub session_token_ceiling: u64, +} + +impl Default for SemanticSearchConfig { + fn default() -> Self { + Self { + enabled: false, + allow_live_provider: false, + model_id: "text-embedding-3-small".to_owned(), + dimensions: 1536, + endpoint_url: "https://api.openai.com/v1".to_owned(), + api_key_env: "OPENAI_API_KEY".to_owned(), + timeout_seconds: 60, + session_token_ceiling: 5_000_000, + } + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum LlmProviderKind { + #[serde(rename = "openrouter", alias = "open_router")] + OpenRouter, + #[serde(rename = "codex_cli", alias = "codex")] + CodexCli, + #[serde(rename = "claude_cli", alias = "claude_code")] + ClaudeCli, + Anthropic, + Recording, +} + +#[derive(Debug, Clone, PartialEq, Deserialize)] +#[serde(default)] +pub struct OpenRouterConfig { + pub endpoint_url: String, + pub api_key_env: String, + pub attribution: OpenRouterAttributionConfig, + pub timeout_seconds: u64, +} + +impl Default for OpenRouterConfig { + fn default() -> Self { + Self { + endpoint_url: "https://openrouter.ai/api/v1".to_owned(), + api_key_env: "OPENROUTER_API_KEY".to_owned(), + attribution: OpenRouterAttributionConfig::default(), + timeout_seconds: 300, + } + } +} + +#[derive(Debug, Clone, PartialEq, Deserialize)] +#[serde(default)] 
+pub struct OpenRouterAttributionConfig { + pub referer: String, + pub title: String, +} + +impl Default for OpenRouterAttributionConfig { + fn default() -> Self { + Self { + referer: "https://github.com/tachyon-beep/clarion".to_owned(), + title: "Clarion".to_owned(), + } + } +} + +#[derive(Debug, Clone, PartialEq, Deserialize)] +#[serde(default)] +pub struct CodexCliConfig { + pub executable: String, + pub model: Option, + pub profile: Option, + pub sandbox: CodexSandboxMode, + pub timeout_seconds: u64, +} + +impl Default for CodexCliConfig { + fn default() -> Self { + Self { + executable: "codex".to_owned(), + model: None, + profile: None, + sandbox: CodexSandboxMode::ReadOnly, + timeout_seconds: 300, + } + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)] +#[serde(rename_all = "kebab-case")] +pub enum CodexSandboxMode { + ReadOnly, + WorkspaceWrite, + DangerFullAccess, +} + +impl CodexSandboxMode { + #[must_use] + pub fn as_str(self) -> &'static str { + match self { + Self::ReadOnly => "read-only", + Self::WorkspaceWrite => "workspace-write", + Self::DangerFullAccess => "danger-full-access", + } + } +} + +#[derive(Debug, Clone, PartialEq, Deserialize)] +#[serde(default)] +pub struct ClaudeCliConfig { + pub executable: String, + pub model: Option, + pub permission_mode: ClaudePermissionMode, + pub tools: Vec, + pub timeout_seconds: u64, + pub max_turns: u32, + pub no_session_persistence: bool, + pub exclude_dynamic_system_prompt_sections: bool, +} + +impl Default for ClaudeCliConfig { + fn default() -> Self { + Self { + executable: "claude".to_owned(), + model: None, + permission_mode: ClaudePermissionMode::Plan, + tools: Vec::new(), + timeout_seconds: 300, + max_turns: 2, + no_session_persistence: true, + exclude_dynamic_system_prompt_sections: true, + } + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)] +pub enum ClaudePermissionMode { + #[serde(rename = "plan")] + Plan, + #[serde(rename = "default")] + Default, + #[serde(rename 
= "acceptEdits")] + AcceptEdits, + #[serde(rename = "bypassPermissions")] + BypassPermissions, +} + +impl ClaudePermissionMode { + #[must_use] + pub fn as_str(self) -> &'static str { + match self { + Self::Plan => "plan", + Self::Default => "default", + Self::AcceptEdits => "acceptEdits", + Self::BypassPermissions => "bypassPermissions", + } + } +} + +#[derive(Debug, Clone, PartialEq, Default, Deserialize)] +#[serde(default)] +pub struct IntegrationsConfig { + pub filigree: FiligreeConfig, +} + +#[derive(Debug, Clone, PartialEq, Default, Deserialize)] +#[serde(default)] +pub struct ServeConfig { + pub http: HttpReadConfig, +} + +#[derive(Debug, Clone, PartialEq, Deserialize)] +#[serde(default)] +pub struct HttpReadConfig { + pub enabled: bool, + #[serde(deserialize_with = "deserialize_socket_addr")] + pub bind: SocketAddr, + pub allow_non_loopback: bool, + /// Name of the env var holding the inbound bearer token. When the env + /// var is set, every `/api/v1/files`-family request must carry + /// `Authorization: Bearer `; the capabilities probe is + /// always unauthenticated. When the env var is unset on a loopback + /// bind, the surface stays unauthenticated (the v0.1 trust model). + /// When the env var is unset on a non-loopback bind, `clarion serve` + /// refuses to start (`CLA-CONFIG-HTTP-NO-AUTH`). Default + /// `CLARION_LOOM_TOKEN` matches Filigree's pinned client default. + pub token_env: String, + /// Optional env var holding the Loom component identity HMAC secret. + /// When configured, `clarion serve` refuses to start unless the env var + /// exists and protected HTTP read routes require + /// `X-Loom-Component: clarion:`. + pub identity_token_env: Option, + /// Enable the Wardline taint-store WRITE API (POST /api/wardline/taint-facts). + /// Default false — `serve` is read-only unless explicitly opted in (ADR-036). + /// When true, `serve` spawns an optional ADR-011 writer-actor. 
+ #[serde(default)] + pub wardline_taint_write: bool, +} + +impl Default for HttpReadConfig { + fn default() -> Self { + Self { + enabled: false, + bind: SocketAddr::from(([127, 0, 0, 1], 9111)), + allow_non_loopback: false, + token_env: "CLARION_LOOM_TOKEN".to_owned(), + identity_token_env: None, + wardline_taint_write: false, + } + } +} + +impl HttpReadConfig { + pub fn validate_loopback_trust(&self) -> Result<(), ConfigError> { + if self.enabled && !self.allow_non_loopback && !self.is_loopback_bind() { + return Err(ConfigError::NonLoopbackHttpBind { + code: "CLA-CONFIG-HTTP-NON-LOOPBACK", + bind: self.bind, + }); + } + Ok(()) + } + + /// Refuse to start a non-loopback HTTP read API when the inbound bearer + /// token env var is unset. Loopback binds with the env var unset stay + /// unauthenticated (v0.1 trust matrix); the failure case is the explicit + /// `allow_non_loopback: true` opt-in plus an unset `token_env`. + pub fn validate_auth_trust(&self, env_lookup: F) -> Result<(), ConfigError> + where + F: Fn(&str) -> Option, + { + if !self.enabled { + return Ok(()); + } + let has_identity_secret = match self.identity_token_env.as_deref() { + Some(env_var) => { + let has_secret = env_lookup(env_var) + .as_deref() + .is_some_and(|value| !value.trim().is_empty()); + if !has_secret { + return Err(ConfigError::MissingHttpIdentitySecret { + code: "CLA-CONFIG-HTTP-IDENTITY-MISSING", + token_env: env_var.to_owned(), + }); + } + true + } + None => false, + }; + if self.is_loopback_bind() { + return Ok(()); + } + if has_identity_secret { + return Ok(()); + } + let has_token = env_lookup(&self.token_env) + .as_deref() + .is_some_and(|value| !value.trim().is_empty()); + if has_token { + return Ok(()); + } + Err(ConfigError::NonLoopbackHttpNoAuth { + code: "CLA-CONFIG-HTTP-NO-AUTH", + bind: self.bind, + token_env: self.token_env.clone(), + }) + } + + #[must_use] + pub fn is_loopback_bind(&self) -> bool { + self.bind.ip().is_loopback() + } +} + +fn 
deserialize_socket_addr<'de, D>(deserializer: D) -> Result<SocketAddr, D::Error> +where + D: serde::Deserializer<'de>, +{ + let raw = String::deserialize(deserializer)?; + raw.parse() + .map_err(|err| serde::de::Error::custom(format!("invalid serve.http.bind {raw:?}: {err}"))) +} + +#[derive(Debug, Clone, PartialEq, Deserialize)] +#[serde(default)] +pub struct FiligreeConfig { + pub enabled: bool, + pub base_url: String, + pub actor: String, + pub token_env: String, + pub timeout_seconds: u64, + /// Whether `clarion analyze` POSTs its findings to Filigree's + /// `POST /api/v1/scan-results` intake on completion (WP9-B, + /// REQ-FINDING-03). Emission is a one-way Clarion→Filigree data egress, so + /// it is its own explicit opt-in: it requires both `enabled` *and* this + /// flag, and **both default `false`**. Enabling the integration for the + /// read side (`issues_for` reverse-lookup) therefore does not silently + /// start outbound emission — the operator opts into the write direction + /// separately by setting `emit_findings: true`. + pub emit_findings: bool, + /// Age threshold (days) for `clarion analyze --prune-unseen` (REQ-FINDING-06): + /// findings Filigree has marked `unseen_in_latest` and that are older than + /// this are soft-archived (`fixed`) by the retention sweep. Default 30. + /// Only consulted when `--prune-unseen` is passed; the sweep itself is + /// opt-in per invocation, not on by default. 
+ pub prune_unseen_days: u32, +} + +impl Default for FiligreeConfig { + fn default() -> Self { + Self { + enabled: false, + base_url: "http://127.0.0.1:8766".to_owned(), + actor: "clarion-mcp".to_owned(), + token_env: "FILIGREE_API_TOKEN".to_owned(), + timeout_seconds: 5, + emit_findings: false, + prune_unseen_days: 30, + } + } +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum ProviderSelection { + Disabled, + Recording, + OpenRouter { api_key_env: String }, + CodexCli, + ClaudeCli, +} + +pub fn select_provider_with_env<F>( + config: &McpConfig, + env_lookup: F, +) -> Result<ProviderSelection, ConfigError> +where + F: Fn(&str) -> Option<String>, +{ + if !config.llm.enabled { + return Ok(ProviderSelection::Disabled); + } + + match config.llm.provider { + LlmProviderKind::Recording => Ok(ProviderSelection::Recording), + LlmProviderKind::Anthropic => Err(ConfigError::DeprecatedProvider { + code: "CLA-CONFIG-DEPRECATED-PROVIDER", + }), + LlmProviderKind::OpenRouter => { + let live_env_opt_in = env_lookup("CLARION_LLM_LIVE").as_deref() == Some("1"); + if !config.llm.allow_live_provider && !live_env_opt_in { + return Ok(ProviderSelection::Disabled); + } + + let env_var = config.llm.openrouter.api_key_env.clone(); + let has_key = env_lookup(&env_var) + .as_deref() + .is_some_and(|value| !value.trim().is_empty()); + if !has_key { + return Err(ConfigError::MissingOpenRouterApiKey { env_var }); + } + + Ok(ProviderSelection::OpenRouter { + api_key_env: env_var, + }) + } + LlmProviderKind::CodexCli => { + let live_env_opt_in = env_lookup("CLARION_LLM_LIVE").as_deref() == Some("1"); + if !config.llm.allow_live_provider && !live_env_opt_in { + return Ok(ProviderSelection::Disabled); + } + Ok(ProviderSelection::CodexCli) + } + LlmProviderKind::ClaudeCli => { + let live_env_opt_in = env_lookup("CLARION_LLM_LIVE").as_deref() == Some("1"); + if !config.llm.allow_live_provider && !live_env_opt_in { + return Ok(ProviderSelection::Disabled); + } + Ok(ProviderSelection::ClaudeCli) + } + } +} + +#[derive(Debug, Error)] +pub 
enum ConfigError { + #[error("read MCP config {path}: {source}")] + Io { + path: String, + #[source] + source: std::io::Error, + }, + + #[error("invalid MCP config: {0}")] + Yaml(String), + + #[error("live OpenRouter provider selected but API key env var {env_var} is missing")] + MissingOpenRouterApiKey { env_var: String }, + + #[error( + "{code}: llm.provider=anthropic is deprecated; use llm_policy.provider: openrouter with llm_policy.openrouter.api_key_env and llm_policy.model_id" + )] + DeprecatedProvider { code: &'static str }, + + #[error("{code}: integrations.filigree.actor must not be blank when Filigree is enabled")] + InvalidFiligreeActor { code: &'static str }, + + #[error( + "{code}: serve.http.bind {bind} exposes the unauthenticated non-loopback Clarion HTTP read API; \ + bind to loopback (127.0.0.1 or ::1) or set serve.http.allow_non_loopback: true only on a trusted network" + )] + NonLoopbackHttpBind { + code: &'static str, + bind: SocketAddr, + }, + + #[error( + "{code}: serve.http.bind {bind} is non-loopback and serve.http.allow_non_loopback is true, \ + but the inbound auth env var ${token_env} is unset; refusing to start an unauthenticated \ + HTTP read API on a routable interface. Set ${token_env} to a non-empty bearer token, \ + or bind to loopback." + )] + NonLoopbackHttpNoAuth { + code: &'static str, + bind: SocketAddr, + token_env: String, + }, + + #[error( + "{code}: serve.http.identity_token_env names ${token_env}, but that env var is unset; \ + refusing to start an HTTP read API with incomplete Loom component identity configuration." + )] + MissingHttpIdentitySecret { + code: &'static str, + token_env: String, + }, + + #[error( + "{code}: clarion.yaml contains both `llm` and `llm_policy` top-level keys; \ + `llm_policy` is a serde alias for `llm` and serde silently discards one. \ + Pick one and remove the other." + )] + AmbiguousLlmKey { code: &'static str }, +} + +/// Reject configs that name both `llm` and `llm_policy` at the top level. 
+/// They alias the same field; serde-norway silently picks one and discards +/// the other, which is the classic copy-paste-migration pitfall. Detecting +/// the collision pre-parse turns a silent override into a typed error. +fn reject_llm_policy_alias_collision(raw: &str) -> Result<(), ConfigError> { + let value: serde_norway::Value = match serde_norway::from_str(raw) { + Ok(value) => value, + // If the YAML doesn't even parse as a generic Value, let the typed + // parse below produce the canonical Yaml error. + Err(_) => return Ok(()), + }; + let Some(mapping) = value.as_mapping() else { + return Ok(()); + }; + let has_llm = mapping.contains_key("llm"); + let has_llm_policy = mapping.contains_key("llm_policy"); + if has_llm && has_llm_policy { + return Err(ConfigError::AmbiguousLlmKey { + code: "CLA-CONFIG-AMBIGUOUS-LLM-KEY", + }); + } + Ok(()) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn parses_mcp_llm_and_filigree_config() { + let cfg = McpConfig::from_yaml_str( + r#" +llm: + enabled: true + provider: openrouter + session_token_ceiling: 250000 + model_id: anthropic/claude-sonnet-4.6 + openrouter: + endpoint_url: http://localhost:4000/api/v1 + api_key_env: TEST_OPENROUTER_KEY + attribution: + referer: https://example.invalid/clarion + title: Clarion Test + max_inferred_edges_per_caller: 3 + cache_max_age_days: 7 +integrations: + filigree: + enabled: true + base_url: "http://127.0.0.1:9999" + actor: "clarion-test" + token_env: TEST_FILIGREE_TOKEN + timeout_seconds: 2 +"#, + ) + .expect("parse config"); + + assert!(cfg.llm.enabled); + assert_eq!(cfg.llm.provider, LlmProviderKind::OpenRouter); + assert_eq!(cfg.llm.session_token_ceiling, 250_000); + assert_eq!(cfg.llm.model_id, "anthropic/claude-sonnet-4.6"); + assert_eq!( + cfg.llm.openrouter.endpoint_url, + "http://localhost:4000/api/v1" + ); + assert_eq!(cfg.llm.openrouter.api_key_env, "TEST_OPENROUTER_KEY"); + assert_eq!( + cfg.llm.openrouter.attribution.referer, + 
"https://example.invalid/clarion" + ); + assert_eq!(cfg.llm.openrouter.attribution.title, "Clarion Test"); + assert_eq!(cfg.llm.openrouter.timeout_seconds, 300); // default — not set in YAML + assert_eq!(cfg.llm.max_inferred_edges_per_caller, 3); + assert_eq!(cfg.llm.cache_max_age_days, 7); + assert!(cfg.integrations.filigree.enabled); + assert_eq!(cfg.integrations.filigree.base_url, "http://127.0.0.1:9999"); + assert_eq!(cfg.integrations.filigree.actor, "clarion-test"); + assert_eq!(cfg.integrations.filigree.token_env, "TEST_FILIGREE_TOKEN"); + assert_eq!(cfg.integrations.filigree.timeout_seconds, 2); + } + + #[test] + fn filigree_emission_is_opt_in_independent_of_enabled() { + // clarion-a26de2f368: outbound finding emission is a one-way egress and + // must not piggyback on enabling Filigree for read enrichment. Both + // knobs default false so flipping `enabled` for `issues_for` never + // silently starts POSTing findings. + let defaults = FiligreeConfig::default(); + assert!(!defaults.enabled); + assert!( + !defaults.emit_findings, + "emit_findings must default false (explicit write opt-in)" + ); + + // Turning on the read side alone leaves emission off. 
+ let read_only = McpConfig::from_yaml_str( + r" +integrations: + filigree: + enabled: true +", + ) + .expect("parse config"); + assert!(read_only.integrations.filigree.enabled); + assert!( + !read_only.integrations.filigree.emit_findings, + "enabling Filigree for reads must not turn on outbound emission" + ); + } + + #[test] + fn accepts_llm_policy_alias_for_operator_config() { + let cfg = McpConfig::from_yaml_str( + r" +llm_policy: + enabled: true + provider: openrouter + model_id: openai/gpt-4o-mini +", + ) + .expect("parse config"); + + assert!(cfg.llm.enabled); + assert_eq!(cfg.llm.provider, LlmProviderKind::OpenRouter); + assert_eq!(cfg.llm.model_id, "openai/gpt-4o-mini"); + } + + #[test] + fn rejects_both_llm_and_llm_policy_keys_present_together() { + // Realistic migration-doc copy-paste case: operator copies the new + // `llm_policy:` block but forgets to delete the old `llm:` block. + // Serde-norway would silently pick one and discard the other. + let err = McpConfig::from_yaml_str( + r" +llm: + enabled: false + provider: recording +llm_policy: + enabled: true + provider: openrouter + model_id: openai/gpt-4o-mini +", + ) + .expect_err("ambiguous llm key must be rejected"); + + match err { + ConfigError::AmbiguousLlmKey { code } => { + assert_eq!(code, "CLA-CONFIG-AMBIGUOUS-LLM-KEY"); + } + other => panic!("expected AmbiguousLlmKey error, got: {other:?}"), + } + } + + #[test] + fn api_key_alone_does_not_select_live_provider() { + let cfg = McpConfig { + llm: LlmConfig { + enabled: true, + provider: LlmProviderKind::OpenRouter, + ..LlmConfig::default() + }, + semantic_search: SemanticSearchConfig::default(), + integrations: IntegrationsConfig::default(), + serve: ServeConfig::default(), + }; + + let selected = select_provider_with_env(&cfg, |name| { + (name == "OPENROUTER_API_KEY").then(|| "secret".to_owned()) + }) + .expect("provider selection"); + + assert_eq!(selected, ProviderSelection::Disabled); + } + + #[test] + fn 
live_provider_requires_config_or_env_opt_in_and_api_key() { + let cfg = McpConfig { + llm: LlmConfig { + enabled: true, + provider: LlmProviderKind::OpenRouter, + allow_live_provider: true, + ..LlmConfig::default() + }, + semantic_search: SemanticSearchConfig::default(), + integrations: IntegrationsConfig::default(), + serve: ServeConfig::default(), + }; + + let missing = select_provider_with_env(&cfg, |_| None).expect_err("missing key"); + assert!(matches!( + missing, + ConfigError::MissingOpenRouterApiKey { ref env_var } + if env_var == "OPENROUTER_API_KEY" + )); + + let selected = select_provider_with_env(&cfg, |name| { + (name == "OPENROUTER_API_KEY").then(|| "secret".to_owned()) + }) + .expect("provider selection"); + assert_eq!( + selected, + ProviderSelection::OpenRouter { + api_key_env: "OPENROUTER_API_KEY".to_owned() + } + ); + } + + #[test] + fn codex_cli_provider_requires_live_opt_in_but_no_api_key() { + let cfg = McpConfig::from_yaml_str( + r" +llm_policy: + enabled: true + provider: codex_cli + allow_live_provider: true + model_id: codex-cli-default + codex_cli: + executable: /tmp/fake-codex + model: gpt-5.5 + profile: clarion + sandbox: read-only + timeout_seconds: 30 +", + ) + .expect("parse Codex CLI provider config"); + + assert_eq!(cfg.llm.provider, LlmProviderKind::CodexCli); + assert_eq!(cfg.llm.model_id, "codex-cli-default"); + assert_eq!(cfg.llm.codex_cli.executable, "/tmp/fake-codex"); + assert_eq!(cfg.llm.codex_cli.model.as_deref(), Some("gpt-5.5")); + assert_eq!(cfg.llm.codex_cli.profile.as_deref(), Some("clarion")); + assert_eq!(cfg.llm.codex_cli.sandbox, CodexSandboxMode::ReadOnly); + assert_eq!(cfg.llm.codex_cli.timeout_seconds, 30); + + let selected = select_provider_with_env(&cfg, |_| None).expect("provider selection"); + assert_eq!(selected, ProviderSelection::CodexCli); + } + + #[test] + fn codex_cli_provider_stays_disabled_without_live_opt_in() { + let cfg = McpConfig { + llm: LlmConfig { + enabled: true, + provider: 
LlmProviderKind::CodexCli, + ..LlmConfig::default() + }, + semantic_search: SemanticSearchConfig::default(), + integrations: IntegrationsConfig::default(), + serve: ServeConfig::default(), + }; + + let selected = select_provider_with_env(&cfg, |_| None).expect("provider selection"); + assert_eq!(selected, ProviderSelection::Disabled); + + let env_selected = select_provider_with_env(&cfg, |name| { + (name == "CLARION_LLM_LIVE").then(|| "1".to_owned()) + }) + .expect("provider selection via env opt-in"); + assert_eq!(env_selected, ProviderSelection::CodexCli); + } + + #[test] + fn claude_cli_provider_requires_live_opt_in_but_no_api_key() { + let cfg = McpConfig::from_yaml_str( + r#" +llm_policy: + enabled: true + provider: claude_cli + allow_live_provider: true + model_id: claude-code-default + claude_cli: + executable: /tmp/fake-claude + model: claude-sonnet-4-6 + permission_mode: plan + tools: ["Read", "Glob", "Grep"] + timeout_seconds: 45 + max_turns: 2 + no_session_persistence: true +"#, + ) + .expect("parse Claude CLI provider config"); + + assert_eq!(cfg.llm.provider, LlmProviderKind::ClaudeCli); + assert_eq!(cfg.llm.model_id, "claude-code-default"); + assert_eq!(cfg.llm.claude_cli.executable, "/tmp/fake-claude"); + assert_eq!( + cfg.llm.claude_cli.model.as_deref(), + Some("claude-sonnet-4-6") + ); + assert_eq!( + cfg.llm.claude_cli.permission_mode, + ClaudePermissionMode::Plan + ); + assert_eq!(cfg.llm.claude_cli.tools, vec!["Read", "Glob", "Grep"]); + assert_eq!(cfg.llm.claude_cli.timeout_seconds, 45); + assert_eq!(cfg.llm.claude_cli.max_turns, 2); + assert!(cfg.llm.claude_cli.no_session_persistence); + + let selected = select_provider_with_env(&cfg, |_| None).expect("provider selection"); + assert_eq!(selected, ProviderSelection::ClaudeCli); + } + + #[test] + fn claude_cli_provider_stays_disabled_without_live_opt_in() { + let cfg = McpConfig { + llm: LlmConfig { + enabled: true, + provider: LlmProviderKind::ClaudeCli, + ..LlmConfig::default() + }, + 
semantic_search: SemanticSearchConfig::default(), + integrations: IntegrationsConfig::default(), + serve: ServeConfig::default(), + }; + + let selected = select_provider_with_env(&cfg, |_| None).expect("provider selection"); + assert_eq!(selected, ProviderSelection::Disabled); + + let env_selected = select_provider_with_env(&cfg, |name| { + (name == "CLARION_LLM_LIVE").then(|| "1".to_owned()) + }) + .expect("provider selection via env opt-in"); + assert_eq!(env_selected, ProviderSelection::ClaudeCli); + } + + #[test] + fn http_bind_is_parsed_when_config_loads() { + let cfg = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "127.0.0.1:0" +"#, + ) + .expect("parse HTTP bind"); + + assert_eq!(cfg.serve.http.bind, SocketAddr::from(([127, 0, 0, 1], 0))); + } + + #[test] + fn http_allow_non_loopback_defaults_false() { + assert!(!McpConfig::default().serve.http.allow_non_loopback); + } + + #[test] + fn http_allow_non_loopback_is_parsed_when_config_loads() { + let cfg = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "127.0.0.1:0" + allow_non_loopback: true +"#, + ) + .expect("parse HTTP allow_non_loopback"); + + assert!(cfg.serve.http.allow_non_loopback); + } + + #[test] + fn http_identity_token_env_is_parsed_when_config_loads() { + let cfg = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "127.0.0.1:0" + identity_token_env: CLARION_TEST_IDENTITY +"#, + ) + .expect("parse HTTP identity_token_env"); + + assert_eq!( + cfg.serve.http.identity_token_env.as_deref(), + Some("CLARION_TEST_IDENTITY") + ); + } + + #[test] + fn http_wardline_taint_write_defaults_false() { + assert!(!McpConfig::default().serve.http.wardline_taint_write); + } + + #[test] + fn http_wardline_taint_write_is_parsed_when_config_loads() { + let cfg = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "127.0.0.1:0" + wardline_taint_write: true +"#, + ) + .expect("parse HTTP wardline_taint_write"); + + 
assert!(cfg.serve.http.wardline_taint_write); + } + + #[test] + fn enabled_non_loopback_http_bind_requires_allow_non_loopback() { + let err = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "0.0.0.0:0" +"#, + ) + .expect_err("enabled wildcard HTTP bind should require explicit opt-in"); + + let message = err.to_string(); + assert!( + message.contains("unauthenticated non-loopback"), + "error should explain the unauthenticated non-loopback risk: {message}" + ); + assert!( + message.contains("allow_non_loopback"), + "error should name the explicit opt-in: {message}" + ); + } + + #[test] + fn enabled_lan_http_bind_requires_allow_non_loopback() { + let err = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "192.168.1.10:0" +"#, + ) + .expect_err("enabled LAN HTTP bind should require explicit opt-in"); + + assert!(matches!(err, ConfigError::NonLoopbackHttpBind { .. })); + } + + #[test] + fn enabled_ipv6_loopback_http_bind_is_allowed_by_default() { + let cfg = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "[::1]:0" +"#, + ) + .expect("IPv6 loopback HTTP bind should not require non-loopback opt-in"); + + assert!(!cfg.serve.http.allow_non_loopback); + assert!(cfg.serve.http.is_loopback_bind()); + } + + #[test] + fn enabled_non_loopback_http_bind_allows_explicit_opt_in() { + let cfg = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "0.0.0.0:0" + allow_non_loopback: true +"#, + ) + .expect("explicit opt-in should allow non-loopback HTTP bind"); + + assert!(cfg.serve.http.allow_non_loopback); + } + + #[test] + fn invalid_http_bind_fails_config_load() { + let err = McpConfig::from_yaml_str( + r#" +serve: + http: + enabled: true + bind: "not-a-socket" +"#, + ) + .expect_err("invalid bind should fail"); + + assert!( + err.to_string().contains("invalid serve.http.bind"), + "unexpected error: {err}" + ); + } + + #[test] + fn old_anthropic_provider_shape_reports_deprecated_provider() { + 
let err = McpConfig::from_yaml_str( + r" +llm: + enabled: true + provider: anthropic + anthropic_api_key_env: ANTHROPIC_API_KEY +", + ) + .expect_err("old provider shape should be rejected"); + + assert!(matches!(err, ConfigError::DeprecatedProvider { .. })); + assert!(err.to_string().contains("CLA-CONFIG-DEPRECATED-PROVIDER")); + assert!(err.to_string().contains("provider: openrouter")); + } + + #[test] + fn enabled_filigree_integration_rejects_blank_actor() { + let err = McpConfig::from_yaml_str( + r#" +integrations: + filigree: + enabled: true + actor: " " +"#, + ) + .expect_err("blank Filigree actor should be rejected"); + + assert!(err.to_string().contains("CLA-CONFIG-FILIGREE-ACTOR-BLANK")); + } +} diff --git a/crates/clarion-federation/src/filigree.rs b/crates/clarion-federation/src/filigree.rs new file mode 100644 index 00000000..bf70226e --- /dev/null +++ b/crates/clarion-federation/src/filigree.rs @@ -0,0 +1,1389 @@ +//! Filigree HTTP/MCP contract helpers for Clarion MCP. + +use std::io::{BufReader, Write}; +use std::path::{Path, PathBuf}; +use std::process::{Command, Stdio}; +use std::time::Duration; + +use clarion_core::plugin::{ContentLengthCeiling, Frame, read_frame, write_frame}; +use serde::{Deserialize, Serialize}; +use thiserror::Error; + +use crate::config::FiligreeConfig; +use crate::scan_results::{ + CleanStaleRequest, CleanStaleResponse, ScanResultsRequest, ScanResultsResponse, + clean_stale_url, parse_clean_stale_response, parse_scan_results_response, scan_results_url, +}; + +#[derive(Debug, Clone, PartialEq, Eq, Deserialize)] +pub struct EntityAssociationsResponse { + pub associations: Vec, +} + +/// The subset of a Filigree issue Clarion surfaces alongside an +/// entity-association match: enough to render the match without an agent +/// having to call back into Filigree. Sourced from `GET /api/loom/issues/{id}`. +/// Unknown fields in the response are ignored, so Filigree can grow the route +/// without breaking this read. 
+#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize)] +pub struct IssueDetail { + pub title: String, + pub status: String, + pub priority: i64, +} + +/// Request Clarion sends to Filigree's observation scratchpad when an agent +/// proposes guidance. This is an observation, not a Clarion sheet. +#[derive(Debug, Clone, PartialEq, Eq, Serialize)] +pub struct ObservationCreateRequest { + pub summary: String, + pub detail: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub file_path: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub line: Option, + pub priority: i64, + pub actor: String, +} + +#[derive(Debug, Clone, PartialEq, Eq, Deserialize)] +pub struct ObservationCreateResponse { + pub observation_id: String, +} + +/// Pending Filigree observation row, as read from `GET /api/loom/observations` +/// or from a test double. Unknown live fields are ignored. +#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize)] +pub struct ObservationRecord { + pub observation_id: String, + pub summary: String, + #[serde(default)] + pub detail: String, + #[serde(default)] + pub file_path: String, + #[serde(default)] + pub line: Option, + #[serde(default)] + pub priority: i64, + #[serde(default)] + pub actor: String, +} + +#[derive(Debug, Clone, PartialEq, Eq, Deserialize)] +pub struct EntityAssociation { + pub issue_id: String, + pub clarion_entity_id: String, + pub content_hash_at_attach: String, + pub attached_at: String, + pub attached_by: String, +} + +/// One Wardline finding as Clarion surfaces it — the subset of Filigree's +/// `ScanFindingLoom` (`GET /api/loom/findings`) used for read-time +/// reconciliation. Unknown fields are ignored so Filigree can grow the row. 
+#[derive(Debug, Clone, PartialEq, Deserialize, Serialize)] +pub struct WardlineFinding { + pub rule_id: String, + pub message: String, + #[serde(default)] + pub severity: Option, + #[serde(default)] + pub status: Option, + #[serde(default)] + pub line_start: Option, + #[serde(default)] + pub line_end: Option, + #[serde(default)] + pub fingerprint: Option, + #[serde(default)] + pub file_id: Option, + /// The finding's `metadata` object; `metadata.wardline.qualname` is the + /// reconciliation key. Defaults to JSON null when absent. + #[serde(default)] + pub metadata: serde_json::Value, +} + +/// Envelope returned by `GET /api/loom/findings` — the paged list of +/// [`WardlineFinding`] rows Clarion reconciles against. +#[derive(Debug, Clone, PartialEq, Deserialize)] +pub struct WardlineFindingsResponse { + #[serde(default)] + pub items: Vec, + /// True when more findings pages follow. Clarion does not page the findings + /// list (the offset param is unpinned in the federation contract); when this + /// is true the first page is an incomplete view, so the caller fails closed + /// to `unavailable` rather than silently undercounting the file's findings. + #[serde(default)] + pub has_more: bool, +} + +/// One row of `GET /api/loom/files` — only the fields needed to map a path to +/// Filigree's `file_id`. +#[derive(Debug, Clone, PartialEq, Deserialize)] +pub struct LoomFileRecord { + pub file_id: String, + pub path: String, +} + +/// Envelope returned by `GET /api/loom/files` — the paged list of +/// [`LoomFileRecord`] rows Clarion uses to map a path to a `file_id`. +#[derive(Debug, Clone, PartialEq, Deserialize)] +pub struct LoomFilesResponse { + #[serde(default)] + pub items: Vec, + /// True when more pages follow. When the exact-path match is absent and + /// `has_more` is true, the result is indeterminate — the file may be on a + /// later page — so callers must degrade to `unavailable` rather than + /// concluding `no_matches`. 
+ #[serde(default)] + pub has_more: bool, +} + +#[derive(Debug, Clone, PartialEq, Deserialize)] +pub struct LoomObservationsResponse { + #[serde(default)] + pub items: Vec, + #[serde(default)] + pub limit: Option, + #[serde(default)] + pub offset: Option, + #[serde(default)] + pub has_more: bool, +} + +pub fn parse_wardline_findings_response( + body: &str, +) -> Result { + serde_json::from_str(body).map_err(FiligreeContractError::from) +} + +pub fn parse_loom_files_response(body: &str) -> Result { + serde_json::from_str(body).map_err(FiligreeContractError::from) +} + +#[derive(Debug, Error)] +pub enum FiligreeContractError { + #[error("invalid Filigree response: {0}")] + InvalidResponse(#[from] serde_json::Error), +} + +#[derive(Debug, Error)] +pub enum FiligreeClientError { + #[error("build Filigree HTTP client: {0}")] + Build(#[source] reqwest::Error), + + #[error("request Filigree entity associations: {0}")] + Request(#[source] reqwest::Error), + + #[error("Filigree returned HTTP {status}: {body}")] + HttpStatus { status: u16, body: String }, + + #[error("POST Filigree scan-results: {0}")] + ScanResultsRequest(#[source] reqwest::Error), + + #[error("invalid Filigree scan-results response: {0}")] + InvalidScanResultsResponse(#[source] serde_json::Error), + + #[error("POST Filigree clean-stale: {0}")] + CleanStaleRequest(#[source] reqwest::Error), + + #[error("invalid Filigree clean-stale response: {0}")] + InvalidCleanStaleResponse(#[source] serde_json::Error), + + #[error("request Filigree observations: {0}")] + ObservationRequest(#[source] reqwest::Error), + + #[error("invalid Filigree observation response: {0}")] + InvalidObservationResponse(#[source] serde_json::Error), + + #[error("run Filigree MCP tool {tool}: {message}")] + McpTool { tool: String, message: String }, + + #[error(transparent)] + Contract(#[from] FiligreeContractError), +} + +pub trait FiligreeLookup: Send + Sync { + fn associations_for( + &self, + entity_id: &str, + ) -> Result; + + /// 
Fetch an issue's title/status/priority to enrich an association match. + /// Returns `Ok(None)` when the issue (or the detail route itself) is + /// unavailable — a `404` — so callers degrade to issue-id-only rather than + /// failing the whole `issues_for` call, per the enrich-only federation + /// axiom. The default reports the route as unavailable; the HTTP client + /// overrides it. A transport / non-404 HTTP failure is surfaced as `Err` + /// so the caller can stop hammering a down endpoint. + fn issue_detail(&self, _issue_id: &str) -> Result, FiligreeClientError> { + Ok(None) + } + + /// Wardline findings for a source file, for read-time reconciliation + /// (Flow B). Two-hop: resolve `path` -> Filigree `file_id`, then fetch that + /// file's `scan_source=wardline` findings. Returns an empty list when no + /// Wardline-touched file exists at `path`. Default impl returns empty (no + /// Filigree); the HTTP client overrides it. Transport / non-success HTTP is + /// surfaced as `Err` so the caller degrades the section to `unavailable`. + fn wardline_findings_for_path( + &self, + _path: &str, + ) -> Result, FiligreeClientError> { + Ok(Vec::new()) + } + + /// Create a pending Filigree observation. Default degrades to unavailable so + /// tests/fake clients opt in explicitly and read-only deployments cannot + /// accidentally pretend a proposal was recorded. + fn create_observation( + &self, + _request: ObservationCreateRequest, + ) -> Result { + Err(FiligreeClientError::McpTool { + tool: "observation_create".to_owned(), + message: "Filigree observation creation is unavailable".to_owned(), + }) + } + + /// Fetch one pending observation by id. Default says "not found". + fn observation_by_id( + &self, + _observation_id: &str, + ) -> Result, FiligreeClientError> { + Ok(None) + } + + /// Mark a pending observation as consumed after Clarion writes the local + /// guidance sheet. 
Default no-ops so promotion remains local-first if the + /// scratchpad cleanup route is unavailable. + fn dismiss_observation( + &self, + _observation_id: &str, + _reason: &str, + ) -> Result<(), FiligreeClientError> { + Ok(()) + } +} + +#[derive(Debug, Clone)] +pub struct FiligreeHttpClient { + base_url: String, + actor: String, + token: Option, + client: reqwest::blocking::Client, + project_root: Option, +} + +impl FiligreeHttpClient { + pub fn from_config( + config: &FiligreeConfig, + env_lookup: F, + ) -> Result, FiligreeClientError> + where + F: Fn(&str) -> Option, + { + Self::from_config_with_project_root(config, env_lookup, None) + } + + pub fn from_config_with_project_root( + config: &FiligreeConfig, + env_lookup: F, + project_root: Option<&Path>, + ) -> Result, FiligreeClientError> + where + F: Fn(&str) -> Option, + { + if !config.enabled { + return Ok(None); + } + let client = reqwest::blocking::Client::builder() + .timeout(Duration::from_secs(config.timeout_seconds.max(1))) + .build() + .map_err(FiligreeClientError::Build)?; + let token = env_lookup(&config.token_env).filter(|value| !value.trim().is_empty()); + Ok(Some(Self { + base_url: config.base_url.clone(), + actor: config.actor.clone(), + token, + client, + project_root: project_root.map(Path::to_path_buf), + })) + } + + /// POST a scan-results batch to Filigree's native intake (WP9-B, + /// REQ-FINDING-03). One-way Clarion→Filigree push; the caller is expected to + /// inspect [`ScanResultsResponse::warnings`] (severity coercion, unknown + /// `scan_run_id`, etc.) rather than just the counts. + /// + /// # Errors + /// + /// Returns [`FiligreeClientError::ScanResultsRequest`] on transport failure, + /// [`FiligreeClientError::HttpStatus`] on a non-success response (e.g. a + /// `400 VALIDATION` for a malformed batch), or + /// [`FiligreeClientError::InvalidScanResultsResponse`] when the body is not + /// the expected shape. 
+ pub fn post_scan_results( + &self, + request: &ScanResultsRequest, + ) -> Result<ScanResultsResponse, FiligreeClientError> { + let mut http_request = self + .client + .post(scan_results_url(&self.base_url)) + .header("accept", "application/json") + .json(request); + if !self.actor.trim().is_empty() { + http_request = http_request.header("x-filigree-actor", self.actor.as_str()); + } + if let Some(token) = &self.token { + http_request = http_request.bearer_auth(token); + } + let response = http_request + .send() + .map_err(FiligreeClientError::ScanResultsRequest)?; + let status = response.status(); + let body = response + .text() + .map_err(FiligreeClientError::ScanResultsRequest)?; + if !status.is_success() { + return Err(FiligreeClientError::HttpStatus { + status: status.as_u16(), + body, + }); + } + parse_scan_results_response(&body).map_err(FiligreeClientError::InvalidScanResultsResponse) + } + + /// POST a retention sweep to Filigree's `clean-stale` route (REQ-FINDING-06, + /// `--prune-unseen`). One-way Clarion→Filigree call; Filigree soft-archives + /// its own `unseen_in_latest` findings for the given `scan_source`. The + /// `scan_source` scoping is enforced server-side, so this can only sweep + /// Clarion's findings. + /// + /// # Errors + /// + /// Returns [`FiligreeClientError::CleanStaleRequest`] on transport failure, + /// [`FiligreeClientError::HttpStatus`] on a non-success response, or + /// [`FiligreeClientError::InvalidCleanStaleResponse`] when the body is not + /// the expected shape. 
+ pub fn post_clean_stale( + &self, + request: &CleanStaleRequest, + ) -> Result<CleanStaleResponse, FiligreeClientError> { + let mut http_request = self + .client + .post(clean_stale_url(&self.base_url)) + .header("accept", "application/json") + .json(request); + if !self.actor.trim().is_empty() { + http_request = http_request.header("x-filigree-actor", self.actor.as_str()); + } + if let Some(token) = &self.token { + http_request = http_request.bearer_auth(token); + } + let response = http_request + .send() + .map_err(FiligreeClientError::CleanStaleRequest)?; + let status = response.status(); + let body = response + .text() + .map_err(FiligreeClientError::CleanStaleRequest)?; + if !status.is_success() { + return Err(FiligreeClientError::HttpStatus { + status: status.as_u16(), + body, + }); + } + parse_clean_stale_response(&body).map_err(FiligreeClientError::InvalidCleanStaleResponse) + } + + /// GET `url` with the standard actor + bearer headers, returning the raw + /// (unread) response. Shared by [`get_json`](Self::get_json) and + /// [`get_json_or_none`](Self::get_json_or_none); the latter inspects the + /// status before reading the body so a `404` can short-circuit. + fn send_get(&self, url: &str) -> Result<reqwest::blocking::Response, FiligreeClientError> { + let mut request = self.client.get(url).header("accept", "application/json"); + if !self.actor.trim().is_empty() { + request = request.header("x-filigree-actor", self.actor.as_str()); + } + if let Some(token) = &self.token { + request = request.bearer_auth(token); + } + request.send().map_err(FiligreeClientError::Request) + } + + /// GET `url` with the standard actor + bearer headers and parse the body as + /// `T`. A non-success status is surfaced as `HttpStatus` so the caller can + /// stop hammering a down endpoint. 
+ fn get_json( + &self, + url: &str, + ) -> Result { + let response = self.send_get(url)?; + let status = response.status(); + let body = response.text().map_err(FiligreeClientError::Request)?; + if !status.is_success() { + return Err(FiligreeClientError::HttpStatus { + status: status.as_u16(), + body, + }); + } + serde_json::from_str(&body) + .map_err(|e| FiligreeClientError::Contract(FiligreeContractError::from(e))) + } + + /// Like [`get_json`](Self::get_json) but maps a `404` to `Ok(None)` — the + /// enrich-only degrade signal for "the resource (or the route itself) is + /// absent", not an error. The body is not read on a `404`. Any other + /// non-success status is still surfaced as `HttpStatus`. + fn get_json_or_none( + &self, + url: &str, + ) -> Result, FiligreeClientError> { + let response = self.send_get(url)?; + let status = response.status(); + if status == reqwest::StatusCode::NOT_FOUND { + return Ok(None); + } + let body = response.text().map_err(FiligreeClientError::Request)?; + if !status.is_success() { + return Err(FiligreeClientError::HttpStatus { + status: status.as_u16(), + body, + }); + } + serde_json::from_str(&body) + .map(Some) + .map_err(|e| FiligreeClientError::Contract(FiligreeContractError::from(e))) + } + + fn run_mcp_tool( + &self, + tool: &str, + arguments: &serde_json::Value, + ) -> Result { + let (program, args) = resolve_filigree_mcp_command(self.project_root.as_deref()); + let mut child = Command::new(&program) + .args(&args) + .stdin(Stdio::piped()) + .stdout(Stdio::piped()) + .stderr(Stdio::piped()) + .current_dir( + self.project_root + .as_deref() + .unwrap_or_else(|| Path::new(".")), + ) + .spawn() + .map_err(|err| FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: format!("spawn {program}: {err}"), + })?; + let mut stdin = child + .stdin + .take() + .ok_or_else(|| FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: "child stdin unavailable".to_owned(), + })?; + let stdout = child + .stdout + 
.take() + .ok_or_else(|| FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: "child stdout unavailable".to_owned(), + })?; + let mut stdout = BufReader::new(stdout); + + write_mcp_json( + &mut stdin, + &serde_json::json!({ + "jsonrpc": "2.0", + "id": "clarion-init", + "method": "initialize", + "params": { + "protocolVersion": "2025-11-25", + "capabilities": {}, + "clientInfo": { + "name": "clarion", + "version": env!("CARGO_PKG_VERSION") + } + } + }), + tool, + )?; + let _ = read_mcp_json(&mut stdout, "clarion-init", tool)?; + + write_mcp_json( + &mut stdin, + &serde_json::json!({ + "jsonrpc": "2.0", + "method": "notifications/initialized", + "params": {} + }), + tool, + )?; + + write_mcp_json( + &mut stdin, + &serde_json::json!({ + "jsonrpc": "2.0", + "id": "clarion-call", + "method": "tools/call", + "params": { + "name": tool, + "arguments": arguments, + } + }), + tool, + )?; + drop(stdin); + + let response = read_mcp_json(&mut stdout, "clarion-call", tool)?; + let _ = child.wait(); + if let Some(error) = response.get("error") { + return Err(FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: error.to_string(), + }); + } + let text = response + .get("result") + .and_then(|result| result.get("content")) + .and_then(serde_json::Value::as_array) + .and_then(|content| content.first()) + .and_then(|item| item.get("text")) + .and_then(serde_json::Value::as_str) + .ok_or_else(|| FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: format!("missing result.content[0].text in response {response}"), + })?; + let parsed: serde_json::Value = + serde_json::from_str(text).map_err(FiligreeClientError::InvalidObservationResponse)?; + if parsed.get("error").is_some() { + return Err(FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: parsed.to_string(), + }); + } + Ok(parsed) + } +} + +impl FiligreeLookup for FiligreeHttpClient { + fn associations_for( + &self, + entity_id: &str, + ) -> Result { + 
self.get_json(&entity_associations_url(&self.base_url, entity_id)) + } + + fn issue_detail(&self, issue_id: &str) -> Result, FiligreeClientError> { + // A 404 means the issue (or the whole detail route) is absent — the + // enrich-only degrade signal, not an error — so use the `_or_none` form. + self.get_json_or_none(&issue_detail_url(&self.base_url, issue_id)) + } + + fn wardline_findings_for_path( + &self, + path: &str, + ) -> Result, FiligreeClientError> { + // Hop 1: path -> Filigree file_id. path_prefix is a prefix filter, so + // take only the row whose path is byte-exact. + let files: LoomFilesResponse = + self.get_json(&loom_files_url(&self.base_url, "wardline", path))?; + let exact = files.items.into_iter().find(|f| f.path == path); + let Some(file_id) = exact.map(|f| f.file_id) else { + // No exact match on this page. If has_more is true the result is + // indeterminate — the file may be on a later page — so degrade to + // unavailable rather than falsely concluding no_matches. + if files.has_more { + return Err(FiligreeClientError::HttpStatus { + status: 0, + body: + "loom/files truncated before exact path match; cannot conclude no findings" + .to_owned(), + }); + } + return Ok(Vec::new()); + }; + // Hop 2: file_id -> wardline findings. As with hop-1, Clarion reads only + // the first page; if it is truncated (`has_more`) the findings view is + // incomplete, so fail closed to `unavailable` rather than returning a + // silent undercount. 
+ let findings: WardlineFindingsResponse = + self.get_json(&loom_findings_url(&self.base_url, "wardline", &file_id))?; + if findings.has_more { + return Err(FiligreeClientError::HttpStatus { + status: 0, + body: "loom/findings truncated; cannot enumerate all findings for file".to_owned(), + }); + } + Ok(findings.items) + } + + fn create_observation( + &self, + request: ObservationCreateRequest, + ) -> Result { + let mut arguments = serde_json::json!({ + "summary": request.summary, + "detail": request.detail, + "priority": request.priority, + "actor": request.actor, + }); + if let Some(obj) = arguments.as_object_mut() { + if let Some(file_path) = request.file_path { + obj.insert("file_path".to_owned(), serde_json::json!(file_path)); + } + if let Some(line) = request.line { + obj.insert("line".to_owned(), serde_json::json!(line)); + } + } + let value = self.run_mcp_tool("observation_create", &arguments)?; + serde_json::from_value(value).map_err(FiligreeClientError::InvalidObservationResponse) + } + + fn observation_by_id( + &self, + observation_id: &str, + ) -> Result, FiligreeClientError> { + let mut offset = 0_u64; + let limit = 100_u64; + loop { + let page: LoomObservationsResponse = + self.get_json(&loom_observations_url(&self.base_url, limit, offset))?; + if let Some(found) = page + .items + .into_iter() + .find(|item| item.observation_id == observation_id) + { + return Ok(Some(found)); + } + if !page.has_more { + return Ok(None); + } + offset = offset.saturating_add(limit); + } + } + + fn dismiss_observation( + &self, + observation_id: &str, + reason: &str, + ) -> Result<(), FiligreeClientError> { + let arguments = serde_json::json!({ + "observation_id": observation_id, + "reason": reason, + "actor": self.actor.clone(), + }); + let _ = self.run_mcp_tool("observation_dismiss", &arguments)?; + Ok(()) + } +} + +pub fn parse_entity_associations_response( + body: &str, +) -> Result { + serde_json::from_str(body).map_err(FiligreeContractError::from) +} + +pub fn 
parse_issue_detail_response(body: &str) -> Result { + serde_json::from_str(body).map_err(FiligreeContractError::from) +} + +pub fn issue_detail_url(base_url: &str, issue_id: &str) -> String { + format!( + "{}/api/loom/issues/{}", + base_url.trim_end_matches('/'), + percent_encode_query_value(issue_id) + ) +} + +pub fn entity_associations_url(base_url: &str, entity_id: &str) -> String { + format!( + "{}/api/entity-associations?entity_id={}", + base_url.trim_end_matches('/'), + percent_encode_query_value(entity_id) + ) +} + +pub fn loom_files_url(base_url: &str, scan_source: &str, path_prefix: &str) -> String { + format!( + "{}/api/loom/files?scan_source={}&path_prefix={}", + base_url.trim_end_matches('/'), + percent_encode_query_value(scan_source), + percent_encode_query_value(path_prefix) + ) +} + +pub fn loom_findings_url(base_url: &str, scan_source: &str, file_id: &str) -> String { + format!( + "{}/api/loom/findings?scan_source={}&file_id={}", + base_url.trim_end_matches('/'), + percent_encode_query_value(scan_source), + percent_encode_query_value(file_id) + ) +} + +pub fn loom_observations_url(base_url: &str, limit: u64, offset: u64) -> String { + format!( + "{}/api/loom/observations?limit={}&offset={}", + base_url.trim_end_matches('/'), + limit, + offset + ) +} + +fn write_mcp_json( + writer: &mut impl Write, + value: &serde_json::Value, + tool: &str, +) -> Result<(), FiligreeClientError> { + let body = serde_json::to_vec(value).map_err(|err| FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: format!("serialize MCP request: {err}"), + })?; + write_frame(writer, &Frame { body }).map_err(|err| FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: format!("write MCP frame: {err}"), + }) +} + +fn read_mcp_json( + reader: &mut impl std::io::BufRead, + expected_id: &str, + tool: &str, +) -> Result { + loop { + let frame = read_frame(reader, ContentLengthCeiling::DEFAULT).map_err(|err| { + FiligreeClientError::McpTool { + tool: 
tool.to_owned(), + message: format!("read MCP frame: {err}"), + } + })?; + let value: serde_json::Value = + serde_json::from_slice(&frame.body).map_err(|err| FiligreeClientError::McpTool { + tool: tool.to_owned(), + message: format!("parse MCP response: {err}"), + })?; + if value + .get("id") + .and_then(serde_json::Value::as_str) + .is_some_and(|id| id == expected_id) + { + return Ok(value); + } + } +} + +fn resolve_filigree_mcp_command(project_root: Option<&Path>) -> (String, Vec<String>) { + if let Ok(raw) = std::env::var("CLARION_FILIGREE_MCP_COMMAND") { + let mut parts: Vec<String> = raw + .split_whitespace() + .map(|part| replace_project_placeholder(part, project_root)) + .collect(); + if let Some(program) = parts.first().cloned() { + parts.remove(0); + return (program, parts); + } + } + + let mut status_cmd = Command::new("filigree"); + status_cmd.args(["mcp-status", "--json"]); + if let Some(root) = project_root { + status_cmd.current_dir(root); + } + if let Ok(output) = status_cmd.output() + && output.status.success() + && let Ok(status) = serde_json::from_slice::<serde_json::Value>(&output.stdout) + && let Some(python) = status + .get("runtime") + .and_then(|runtime| runtime.get("python_executable")) + .and_then(serde_json::Value::as_str) + { + let mut args = vec!["-m".to_owned(), "filigree.mcp_server".to_owned()]; + if let Some(root) = project_root { + args.push("--project".to_owned()); + args.push(root.display().to_string()); + } + return (python.to_owned(), args); + } + + ("filigree".to_owned(), vec!["mcp".to_owned()]) +} + +fn replace_project_placeholder(raw: &str, project_root: Option<&Path>) -> String { + match project_root { + Some(root) => raw.replace("{project}", &root.display().to_string()), + None => raw.to_owned(), + } +} + +fn percent_encode_query_value(raw: &str) -> String { + let mut encoded = String::new(); + for byte in raw.bytes() { + match byte { + b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' 
| b'_' | b'~' => { + encoded.push(char::from(byte)); + } + _ => { + encoded.push('%'); + encoded.push(hex_digit(byte >> 4)); + encoded.push(hex_digit(byte & 0x0f)); + } + } + } + encoded +} + +fn hex_digit(value: u8) -> char { + match value { + 0..=9 => char::from(b'0' + value), + 10..=15 => char::from(b'A' + (value - 10)), + _ => unreachable!("nibble is always <= 15"), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use std::io::{Read, Write}; + use std::net::TcpListener; + + #[test] + fn parses_reverse_entity_association_response_shape() { + let parsed = parse_entity_associations_response( + r#"{ + "associations": [ + { + "issue_id": "filigree-1234567890", + "clarion_entity_id": "python:function:demo.hello", + "content_hash_at_attach": "hash-a", + "attached_at": "2026-05-17T00:00:00.000Z", + "attached_by": "codex" + } + ] + }"#, + ) + .expect("parse Filigree reverse route response"); + + assert_eq!(parsed.associations.len(), 1); + let row = &parsed.associations[0]; + assert_eq!(row.issue_id, "filigree-1234567890"); + assert_eq!(row.clarion_entity_id, "python:function:demo.hello"); + assert_eq!(row.content_hash_at_attach, "hash-a"); + assert_eq!(row.attached_at, "2026-05-17T00:00:00.000Z"); + assert_eq!(row.attached_by, "codex"); + } + + #[test] + fn builds_reverse_route_url_with_encoded_entity_id() { + let url = entity_associations_url("http://127.0.0.1:8766/", "python:function:demo.hello"); + + assert_eq!( + url, + "http://127.0.0.1:8766/api/entity-associations?entity_id=python%3Afunction%3Ademo.hello" + ); + } + + #[test] + fn http_client_hits_reverse_route_with_actor_and_bearer_headers() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = std::thread::spawn(move || { + let (mut stream, _) = listener.accept().expect("accept request"); + let mut request = [0_u8; 4096]; + let read = stream.read(&mut request).expect("read request"); + let request = 
String::from_utf8_lossy(&request[..read]); + assert!(request.contains( + "GET /api/entity-associations?entity_id=python%3Afunction%3Ademo.hello HTTP/1.1" + )); + assert!(request.contains("x-filigree-actor: clarion-test")); + assert!(request.contains("authorization: Bearer secret-token")); + + let body = r#"{"associations":[{"issue_id":"filigree-1234567890","clarion_entity_id":"python:function:demo.hello","content_hash_at_attach":"hash-a","attached_at":"2026-05-17T00:00:00.000Z","attached_by":"codex"}]}"#; + write!( + stream, + "HTTP/1.1 200 OK\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .expect("write response"); + }); + let config = FiligreeConfig { + enabled: true, + base_url: format!("http://{addr}"), + actor: "clarion-test".to_owned(), + token_env: "TEST_FILIGREE_TOKEN".to_owned(), + timeout_seconds: 1, + emit_findings: true, + prune_unseen_days: 30, + }; + let client = FiligreeHttpClient::from_config(&config, |name| { + (name == "TEST_FILIGREE_TOKEN").then(|| "secret-token".to_owned()) + }) + .expect("build client") + .expect("enabled client"); + + let response = client + .associations_for("python:function:demo.hello") + .expect("fetch associations"); + + assert_eq!(response.associations[0].issue_id, "filigree-1234567890"); + handle.join().expect("server thread"); + } + + #[test] + fn parses_issue_detail_response_shape() { + let parsed = parse_issue_detail_response( + r#"{ + "issue_id": "clarion-51a2868c86", + "title": "issues_for: enrich matches", + "status": "proposed", + "status_category": "open", + "priority": 3, + "type": "feature" + }"#, + ) + .expect("parse issue detail"); + assert_eq!(parsed.title, "issues_for: enrich matches"); + assert_eq!(parsed.status, "proposed"); + assert_eq!(parsed.priority, 3); + } + + #[test] + fn builds_issue_detail_url_with_encoded_id() { + let url = issue_detail_url("http://127.0.0.1:8542/", "clarion-51a2868c86"); + assert_eq!( + url, + 
"http://127.0.0.1:8542/api/loom/issues/clarion-51a2868c86" + ); + } + + #[test] + fn issue_detail_http_client_parses_200() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = std::thread::spawn(move || { + let (mut stream, _) = listener.accept().expect("accept request"); + let mut request = [0_u8; 4096]; + let read = stream.read(&mut request).expect("read request"); + let request = String::from_utf8_lossy(&request[..read]); + assert!(request.contains("GET /api/loom/issues/clarion-51a2868c86 HTTP/1.1")); + + let body = r#"{"issue_id":"clarion-51a2868c86","title":"enrich","status":"proposed","priority":3}"#; + write!( + stream, + "HTTP/1.1 200 OK\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .expect("write response"); + }); + let client = detail_test_client(addr); + let detail = client + .issue_detail("clarion-51a2868c86") + .expect("issue detail request") + .expect("issue present"); + assert_eq!(detail.title, "enrich"); + assert_eq!(detail.status, "proposed"); + assert_eq!(detail.priority, 3); + handle.join().expect("server thread"); + } + + #[test] + fn issue_detail_http_client_maps_404_to_none() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = std::thread::spawn(move || { + let (mut stream, _) = listener.accept().expect("accept request"); + let mut request = [0_u8; 4096]; + let _ = stream.read(&mut request).expect("read request"); + let body = r#"{"error":"Not Found","code":"NOT_FOUND"}"#; + write!( + stream, + "HTTP/1.1 404 Not Found\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .expect("write response"); + }); + let client = detail_test_client(addr); + let detail = client + .issue_detail("clarion-missing") + .expect("404 is Ok(None), not an error"); + 
assert!(detail.is_none(), "404 degrades to None: {detail:?}"); + handle.join().expect("server thread"); + } + + #[test] + fn post_scan_results_sends_batch_and_parses_response() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = std::thread::spawn(move || { + let (mut stream, _) = listener.accept().expect("accept request"); + let mut request = [0_u8; 8192]; + let read = stream.read(&mut request).expect("read request"); + let request = String::from_utf8_lossy(&request[..read]); + assert!( + request.contains("POST /api/v1/scan-results HTTP/1.1"), + "request line: {request}" + ); + assert!(request.contains("x-filigree-actor: clarion-test")); + assert!(request.contains("authorization: Bearer secret-token")); + // The wire body carries the mapped severity, not the internal one. + assert!( + request.contains("\"scan_source\":\"clarion\""), + "body: {request}" + ); + assert!( + request.contains("\"severity\":\"medium\""), + "body: {request}" + ); + assert!( + request.contains("\"internal_severity\":\"WARN\""), + "body: {request}" + ); + + let body = r#"{"files_created":1,"files_updated":0,"findings_created":1,"findings_updated":0,"new_finding_ids":["clarion-sf-abc"],"observations_created":0,"observations_failed":0,"warnings":["Scan run run-1 status not updated to 'completed': not found"]}"#; + write!( + stream, + "HTTP/1.1 200 OK\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .expect("write response"); + }); + let config = FiligreeConfig { + enabled: true, + base_url: format!("http://{addr}"), + actor: "clarion-test".to_owned(), + token_env: "TEST_FILIGREE_TOKEN".to_owned(), + timeout_seconds: 1, + emit_findings: true, + prune_unseen_days: 30, + }; + let client = FiligreeHttpClient::from_config(&config, |name| { + (name == "TEST_FILIGREE_TOKEN").then(|| "secret-token".to_owned()) + }) + .expect("build client") + 
.expect("enabled client"); + + let row = clarion_storage::FindingForEmitRow { + id: "core:finding:run-1:circular".to_owned(), + rule_id: "CLA-PY-STRUCTURE-001".to_owned(), + kind: "defect".to_owned(), + severity: "WARN".to_owned(), + confidence: Some(0.9), + confidence_basis: None, + message: "Circular import".to_owned(), + entity_id: "python:class:auth.tokens::TokenManager".to_owned(), + related_entities_json: "[]".to_owned(), + supports_json: "[]".to_owned(), + supported_by_json: "[]".to_owned(), + source_file_path: Some("src/auth/tokens.py".to_owned()), + source_line_start: Some(12), + source_line_end: Some(12), + }; + let batch = crate::scan_results::prepare_batch( + &[row], + &crate::scan_results::EmitOptions { + scan_run_id: Some("run-1".to_owned()), + mark_unseen: true, + complete_scan_run: true, + default_path: None, + }, + ); + + let response = client + .post_scan_results(&batch.request) + .expect("post scan results"); + assert_eq!(response.findings_created, 1); + assert_eq!(response.new_finding_ids, vec!["clarion-sf-abc"]); + assert_eq!(response.warnings.len(), 1); + handle.join().expect("server thread"); + } + + #[test] + fn post_scan_results_surfaces_validation_error_as_http_status() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = std::thread::spawn(move || { + let (mut stream, _) = listener.accept().expect("accept request"); + let mut request = [0_u8; 8192]; + let _ = stream.read(&mut request).expect("read request"); + let body = + r#"{"error":"findings[0] is missing required key 'path'","code":"VALIDATION"}"#; + write!( + stream, + "HTTP/1.1 400 Bad Request\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .expect("write response"); + }); + let client = detail_test_client(addr); + let batch = crate::scan_results::prepare_batch( + &[], + &crate::scan_results::EmitOptions { + scan_run_id: None, + 
mark_unseen: true, + complete_scan_run: true, + default_path: None, + }, + ); + let err = client + .post_scan_results(&batch.request) + .expect_err("400 surfaces as error"); + match err { + FiligreeClientError::HttpStatus { status, .. } => assert_eq!(status, 400), + other => panic!("expected HttpStatus, got {other:?}"), + } + handle.join().expect("server thread"); + } + + #[test] + fn parses_loom_findings_list_envelope() { + let resp = parse_wardline_findings_response( + r#"{"items":[ + {"finding_id":"f-1","file_id":"file-9","severity":"high","status":"open", + "scan_source":"wardline","rule_id":"WLN-TAINT-001","message":"tainted sink", + "suggestion":"","scan_run_id":"r-1","line_start":12,"line_end":12, + "fingerprint":"fp-abc","issue_id":null,"seen_count":1, + "metadata":{"wardline":{"qualname":"demo.Foo.bar","kind":"DEFECT"}}, + "data_warnings":[]} + ],"has_more":false}"#, + ) + .expect("parse findings list"); + assert_eq!(resp.items.len(), 1); + let f = &resp.items[0]; + assert_eq!(f.rule_id, "WLN-TAINT-001"); + assert_eq!(f.fingerprint.as_deref(), Some("fp-abc")); + assert_eq!(f.line_start, Some(12)); + assert_eq!( + f.metadata + .get("wardline") + .and_then(|w| w.get("qualname")) + .and_then(|q| q.as_str()), + Some("demo.Foo.bar") + ); + } + + #[test] + fn parses_loom_files_list_envelope() { + let resp = parse_loom_files_response( + r#"{"items":[ + {"file_id":"file-9","path":"src/demo.py","language":"python","file_type":"source"}, + {"file_id":"file-10","path":"src/demo_helpers.py","language":"python","file_type":"source"} + ],"has_more":false}"#, + ) + .expect("parse files list"); + assert_eq!(resp.items.len(), 2); + assert_eq!(resp.items[0].file_id, "file-9"); + assert_eq!(resp.items[0].path, "src/demo.py"); + } + + #[test] + fn builds_loom_url_builders_with_encoding() { + assert_eq!( + loom_files_url("http://127.0.0.1:8542/", "wardline", "src/demo.py"), + "http://127.0.0.1:8542/api/loom/files?scan_source=wardline&path_prefix=src%2Fdemo.py" + ); + 
assert_eq!( + loom_findings_url("http://127.0.0.1:8542/", "wardline", "file-9"), + "http://127.0.0.1:8542/api/loom/findings?scan_source=wardline&file_id=file-9" + ); + } + + #[test] + fn wardline_findings_for_path_does_two_hops_and_exact_path_filter() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = std::thread::spawn(move || { + // Hop 1: GET /api/loom/files — path_prefix matches two files; the + // exact-path filter must pick file-9, not the helpers file. + let (mut s1, _) = listener.accept().expect("accept files"); + let mut buf = [0_u8; 4096]; + let n = s1.read(&mut buf).expect("read files req"); + let req = String::from_utf8_lossy(&buf[..n]); + assert!(req.contains( + "GET /api/loom/files?scan_source=wardline&path_prefix=src%2Fdemo.py HTTP/1.1" + )); + let body = r#"{"items":[{"file_id":"file-9","path":"src/demo.py","language":"python","file_type":"source"},{"file_id":"file-10","path":"src/demo.py.bak","language":"python","file_type":"source"}],"has_more":false}"#; + // connection: close forces reqwest to open a fresh TCP connection for + // hop 2, so the listener's second accept() receives it (the blocking + // client would otherwise pool/reuse hop-1's socket and hop-2's + // accept() would hang). + write!( + s1, + "HTTP/1.1 200 OK\r\nconnection: close\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .unwrap(); + + // Hop 2: GET /api/loom/findings for file-9. 
+ let (mut s2, _) = listener.accept().expect("accept findings"); + let n = s2.read(&mut buf).expect("read findings req"); + let req = String::from_utf8_lossy(&buf[..n]); + assert!( + req.contains("GET /api/loom/findings?scan_source=wardline&file_id=file-9 HTTP/1.1") + ); + let body = r#"{"items":[{"finding_id":"f-1","file_id":"file-9","severity":"high","status":"open","scan_source":"wardline","rule_id":"WLN-TAINT-001","message":"sink","suggestion":"","scan_run_id":"r-1","line_start":12,"line_end":12,"fingerprint":"fp","issue_id":null,"seen_count":1,"metadata":{"wardline":{"qualname":"demo.Foo.bar"}},"data_warnings":[]}],"has_more":false}"#; + write!( + s2, + "HTTP/1.1 200 OK\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .unwrap(); + }); + // Not detail_test_client(addr): the two-hop test does two sequential TCP + // accepts, so use a more generous timeout to avoid CI scheduling jitter + // between hops. + let config = FiligreeConfig { + enabled: true, + base_url: format!("http://{addr}"), + actor: "clarion-test".to_owned(), + token_env: "TEST_FILIGREE_TOKEN".to_owned(), + timeout_seconds: 5, + emit_findings: true, + prune_unseen_days: 30, + }; + let client = FiligreeHttpClient::from_config(&config, |_| None) + .expect("build client") + .expect("enabled client"); + let findings = client + .wardline_findings_for_path("src/demo.py") + .expect("two-hop fetch"); + assert_eq!(findings.len(), 1); + assert_eq!(findings[0].rule_id, "WLN-TAINT-001"); + handle.join().expect("server thread"); + } + + /// FIX 3: when hop-1 returns items that don't include the exact path AND + /// `has_more` is true, `wardline_findings_for_path` must return `Err` rather + /// than `Ok(empty)` — a truncated page is indeterminate, not "no file found". 
+ #[test] + fn wardline_findings_for_path_errors_when_hop1_truncated_before_exact_match() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = std::thread::spawn(move || { + // Hop 1: page does NOT contain src/demo.py but has_more is true — + // the exact path may be on a later page. + let (mut s1, _) = listener.accept().expect("accept files"); + let mut buf = [0_u8; 4096]; + let n = s1.read(&mut buf).expect("read files req"); + let req = String::from_utf8_lossy(&buf[..n]); + assert!(req.contains( + "GET /api/loom/files?scan_source=wardline&path_prefix=src%2Fdemo.py HTTP/1.1" + )); + // Return a page that omits the target path with has_more:true. + let body = r#"{"items":[{"file_id":"file-1","path":"src/demo_other.py","language":"python","file_type":"source"}],"has_more":true}"#; + write!( + s1, + "HTTP/1.1 200 OK\r\nconnection: close\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .unwrap(); + // No hop-2 — the function must error before making a second request. 
+ }); + let config = FiligreeConfig { + enabled: true, + base_url: format!("http://{addr}"), + actor: "clarion-test".to_owned(), + token_env: "TEST_FILIGREE_TOKEN".to_owned(), + timeout_seconds: 5, + emit_findings: true, + prune_unseen_days: 30, + }; + let client = FiligreeHttpClient::from_config(&config, |_| None) + .expect("build client") + .expect("enabled client"); + let result = client.wardline_findings_for_path("src/demo.py"); + handle.join().expect("server thread"); + assert!( + result.is_err(), + "truncated hop-1 without exact match must be Err, not Ok: {result:?}" + ); + } + + /// Hop-2 counterpart to the hop-1 truncation test: when the findings page + /// for the resolved `file_id` reports `has_more: true`, the first page is an + /// incomplete view, so `wardline_findings_for_path` must return `Err` + /// (degrades to `unavailable`) rather than `Ok(partial)` — no silent + /// undercount. + #[test] + fn wardline_findings_for_path_errors_when_hop2_truncated() { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = std::thread::spawn(move || { + // Hop 1: exact path resolves to file-9 on a complete page. + let (mut s1, _) = listener.accept().expect("accept files"); + let mut buf = [0_u8; 4096]; + let n = s1.read(&mut buf).expect("read files req"); + let req = String::from_utf8_lossy(&buf[..n]); + assert!(req.contains( + "GET /api/loom/files?scan_source=wardline&path_prefix=src%2Fdemo.py HTTP/1.1" + )); + let body = r#"{"items":[{"file_id":"file-9","path":"src/demo.py","language":"python","file_type":"source"}],"has_more":false}"#; + write!( + s1, + "HTTP/1.1 200 OK\r\nconnection: close\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .unwrap(); + + // Hop 2: findings page for file-9 is truncated (has_more:true). 
+ let (mut s2, _) = listener.accept().expect("accept findings"); + let n = s2.read(&mut buf).expect("read findings req"); + let req = String::from_utf8_lossy(&buf[..n]); + assert!( + req.contains("GET /api/loom/findings?scan_source=wardline&file_id=file-9 HTTP/1.1") + ); + let body = r#"{"items":[{"finding_id":"f-1","file_id":"file-9","severity":"high","status":"open","scan_source":"wardline","rule_id":"WLN-TAINT-001","message":"sink","suggestion":"","scan_run_id":"r-1","line_start":12,"line_end":12,"fingerprint":"fp","issue_id":null,"seen_count":1,"metadata":{"wardline":{"qualname":"demo.Foo.bar"}},"data_warnings":[]}],"has_more":true}"#; + write!( + s2, + "HTTP/1.1 200 OK\r\ncontent-length: {}\r\n\r\n{}", + body.len(), + body + ) + .unwrap(); + }); + let config = FiligreeConfig { + enabled: true, + base_url: format!("http://{addr}"), + actor: "clarion-test".to_owned(), + token_env: "TEST_FILIGREE_TOKEN".to_owned(), + timeout_seconds: 5, + emit_findings: true, + prune_unseen_days: 30, + }; + let client = FiligreeHttpClient::from_config(&config, |_| None) + .expect("build client") + .expect("enabled client"); + let result = client.wardline_findings_for_path("src/demo.py"); + handle.join().expect("server thread"); + assert!( + result.is_err(), + "truncated hop-2 findings page must be Err, not Ok(partial): {result:?}" + ); + } + + fn detail_test_client(addr: std::net::SocketAddr) -> FiligreeHttpClient { + let config = FiligreeConfig { + enabled: true, + base_url: format!("http://{addr}"), + actor: "clarion-test".to_owned(), + token_env: "TEST_FILIGREE_TOKEN".to_owned(), + timeout_seconds: 1, + emit_findings: true, + prune_unseen_days: 30, + }; + FiligreeHttpClient::from_config(&config, |_| None) + .expect("build client") + .expect("enabled client") + } +} diff --git a/crates/clarion-federation/src/filigree_url.rs b/crates/clarion-federation/src/filigree_url.rs new file mode 100644 index 00000000..a8e90cbb --- /dev/null +++ 
b/crates/clarion-federation/src/filigree_url.rs @@ -0,0 +1,212 @@ +//! Resolve the live Filigree API base URL. +//! +//! Mirrors Filigree's ethereal endpoint-discovery convention: the dashboard +//! publishes its live port to `/.filigree/ephemeral.port` (a plain +//! integer, written atomically, present only while the dashboard runs) and +//! serves the read API on that port. The port is chosen deterministically but +//! unpredictably (`8400 + sha256(path) % 1000` with fallback), so it must be +//! *read*, never computed. This mirrors the Filigree sources: +//! - `filigree/src/filigree/ephemeral.py::{write,read}_port_file` +//! - `filigree/src/filigree/scanner_callback.py::resolve_scanner_api_url_with_source` +//! +//! Federation discipline (`docs/suite/loom.md` §5): this is enrich-only +//! connection discovery. Clarion stays solo-useful — when no live port file is +//! present (or Filigree is disabled) Clarion falls back to its *own* configured +//! `base_url`, never to a Filigree-internal default (copying Filigree's +//! `DEFAULT_PORT` would be a silent cross-product coupling). Reading the port +//! file is fail-soft: any missing/corrupt/out-of-range content degrades to the +//! configured URL. +//! +//! Scope: ethereal mode only. Filigree's `server` mode resolves through a +//! home-directory global (`~/.config/filigree/server.json`); that path is not +//! exercised here and is left as a known gap (clarion-318f1254eb tracks the +//! issues_for-side resolution diagnostics that build on this resolver). + +use std::path::Path; + +use serde::Serialize; + +use crate::config::FiligreeConfig; + +/// Wire-facing `source` labels for a resolved Filigree URL. Reported verbatim +/// by `project_status` (and, per clarion-318f1254eb, `issues_for`) so an agent +/// can tell *where* the URL came from without shelling out to probe ports. +pub const SOURCE_DISABLED: &str = "disabled"; +/// The live ethereal port published by Filigree's running dashboard. 
+pub const SOURCE_EPHEMERAL_PORT: &str = ".filigree/ephemeral.port"; +/// Clarion's own configured `integrations.filigree.base_url`. +pub const SOURCE_CONFIG: &str = "config"; + +/// The outcome of resolving where Clarion should reach Filigree's read API. +#[derive(Debug, Clone, PartialEq, Eq, Serialize)] +pub struct FiligreeUrlResolution { + /// Whether the Filigree integration is enabled in config at all. + pub enabled: bool, + /// The statically configured base URL (`integrations.filigree.base_url`). + pub configured_url: String, + /// The URL Clarion will actually call. `None` only when disabled. + pub resolved_url: Option<String>, + /// Which input produced [`Self::resolved_url`]; one of the `SOURCE_*` labels. + pub source: &'static str, +} + +/// Resolve the Filigree read-API base URL, preferring the live ethereal port. +/// +/// - Disabled → no resolved URL, `source = "disabled"`. +/// - A valid `/.filigree/ephemeral.port` → the configured URL +/// with its port overridden by the live port, `source = ".filigree/ephemeral.port"`. +/// - Otherwise → the configured URL unchanged, `source = "config"`. +#[must_use] +pub fn resolve_filigree_url(config: &FiligreeConfig, project_root: &Path) -> FiligreeUrlResolution { + let configured_url = config.base_url.clone(); + if !config.enabled { + return FiligreeUrlResolution { + enabled: false, + configured_url, + resolved_url: None, + source: SOURCE_DISABLED, + }; + } + match read_ephemeral_port(project_root) { + Some(port) => { + let resolved = override_port(&configured_url, port); + FiligreeUrlResolution { + enabled: true, + configured_url, + resolved_url: Some(resolved), + source: SOURCE_EPHEMERAL_PORT, + } + } + None => FiligreeUrlResolution { + enabled: true, + resolved_url: Some(configured_url.clone()), + configured_url, + source: SOURCE_CONFIG, + }, + } +} + +/// Read `/.filigree/ephemeral.port` as a TCP port. +/// +/// Mirrors Filigree's `read_port_file`: a plain trimmed integer. 
Any +/// missing/corrupt/out-of-range/zero content folds to `None` (fail-soft). +fn read_ephemeral_port(project_root: &Path) -> Option<u16> { + let path = project_root.join(".filigree").join("ephemeral.port"); + let raw = std::fs::read_to_string(&path).ok()?; + raw.trim().parse::<u16>().ok().filter(|port| *port != 0) +} + +/// Replace the port in a `scheme://host[:port][/path]` URL, preserving the +/// scheme, host, and any trailing path. Returns the input unchanged when it +/// has no recognizable `scheme://` authority. IPv6 literal hosts are out of +/// scope — Filigree binds `127.0.0.1`. +fn override_port(base_url: &str, port: u16) -> String { + let Some((scheme, rest)) = base_url.split_once("://") else { + return base_url.to_owned(); + }; + let (authority, path) = match rest.find('/') { + Some(slash) => (&rest[..slash], &rest[slash..]), + None => (rest, ""), + }; + // Strip an existing `:port` suffix, but only when it is genuinely a numeric + // port (so a bare `host` with no port is preserved intact). 
+ let host = match authority.rsplit_once(':') { + Some((host, maybe_port)) + if !maybe_port.is_empty() && maybe_port.bytes().all(|b| b.is_ascii_digit()) => + { + host + } + _ => authority, + }; + format!("{scheme}://{host}:{port}{path}") +} + +#[cfg(test)] +mod tests { + use super::*; + + fn enabled_config() -> FiligreeConfig { + FiligreeConfig { + enabled: true, + ..FiligreeConfig::default() + } + } + + fn write_port_file(root: &Path, contents: &str) { + let dir = root.join(".filigree"); + std::fs::create_dir_all(&dir).unwrap(); + std::fs::write(dir.join("ephemeral.port"), contents).unwrap(); + } + + #[test] + fn disabled_integration_resolves_nothing() { + let dir = tempfile::tempdir().unwrap(); + let config = FiligreeConfig::default(); // enabled: false + let res = resolve_filigree_url(&config, dir.path()); + assert!(!res.enabled); + assert_eq!(res.resolved_url, None); + assert_eq!(res.source, SOURCE_DISABLED); + assert_eq!(res.configured_url, "http://127.0.0.1:8766"); + } + + #[test] + fn live_ephemeral_port_overrides_the_stale_configured_port() { + // The dogfood bug: configured 8766 is dead; the live dashboard is on + // 8542 per .filigree/ephemeral.port. + let dir = tempfile::tempdir().unwrap(); + write_port_file(dir.path(), "8542\n"); + let res = resolve_filigree_url(&enabled_config(), dir.path()); + assert!(res.enabled); + assert_eq!(res.resolved_url.as_deref(), Some("http://127.0.0.1:8542")); + assert_eq!(res.source, SOURCE_EPHEMERAL_PORT); + // The configured URL is still reported verbatim alongside the resolved one. 
+ assert_eq!(res.configured_url, "http://127.0.0.1:8766"); + } + + #[test] + fn falls_back_to_configured_url_when_no_port_file() { + let dir = tempfile::tempdir().unwrap(); + let res = resolve_filigree_url(&enabled_config(), dir.path()); + assert!(res.enabled); + assert_eq!(res.resolved_url.as_deref(), Some("http://127.0.0.1:8766")); + assert_eq!(res.source, SOURCE_CONFIG); + } + + #[test] + fn corrupt_port_file_folds_to_configured_url() { + let dir = tempfile::tempdir().unwrap(); + write_port_file(dir.path(), "not-a-port"); + let res = resolve_filigree_url(&enabled_config(), dir.path()); + assert_eq!(res.source, SOURCE_CONFIG); + assert_eq!(res.resolved_url.as_deref(), Some("http://127.0.0.1:8766")); + } + + #[test] + fn zero_port_is_rejected_as_corrupt() { + let dir = tempfile::tempdir().unwrap(); + write_port_file(dir.path(), "0"); + let res = resolve_filigree_url(&enabled_config(), dir.path()); + assert_eq!(res.source, SOURCE_CONFIG); + } + + #[test] + fn override_port_preserves_scheme_host_and_path() { + assert_eq!( + override_port("http://127.0.0.1:8766", 8542), + "http://127.0.0.1:8542" + ); + assert_eq!( + override_port("http://localhost", 8542), + "http://localhost:8542" + ); + assert_eq!( + override_port("https://example.test:1/api", 8542), + "https://example.test:8542/api" + ); + } + + #[test] + fn override_port_returns_input_without_scheme() { + assert_eq!(override_port("127.0.0.1:8766", 8542), "127.0.0.1:8766"); + } +} diff --git a/crates/clarion-federation/src/lib.rs b/crates/clarion-federation/src/lib.rs new file mode 100644 index 00000000..43993c83 --- /dev/null +++ b/crates/clarion-federation/src/lib.rs @@ -0,0 +1,6 @@ +//! Shared federation/config helpers used by CLI and MCP surfaces. 
+ +pub mod config; +pub mod filigree; +pub mod filigree_url; +pub mod scan_results; diff --git a/crates/clarion-federation/src/scan_results.rs b/crates/clarion-federation/src/scan_results.rs new file mode 100644 index 00000000..99361dd3 --- /dev/null +++ b/crates/clarion-federation/src/scan_results.rs @@ -0,0 +1,607 @@ +//! Filigree-native scan-results emission (WP9-B, REQ-FINDING-03). +//! +//! Maps Clarion's persisted findings onto Filigree's `POST /api/v1/scan-results` +//! intake schema (ADR-004 + detailed-design §7) and models the response. This +//! module is pure — request building and response parsing only; the HTTP POST +//! lives on [`crate::filigree::FiligreeHttpClient::post_scan_results`]. +//! +//! Emission is enrich-only: a one-way Clarion→Filigree push that adds no +//! Filigree-side routes and never gates Clarion's own semantics. Clarion's +//! richer fields nest under `metadata.clarion.*` so Filigree's silent +//! top-level-key drop (verified against the live intake) cannot lose them. + +use serde::{Deserialize, Serialize}; +use serde_json::{Map, Value, json}; + +use clarion_storage::FindingForEmitRow; + +/// The `scan_source` Clarion stamps on every emitted finding. Filigree's dedup +/// key includes `scan_source`, so this is stable across runs. +pub const CLARION_SCAN_SOURCE: &str = "clarion"; + +/// Map Clarion's internal severity vocabulary (`INFO` | `WARN` | `ERROR` | +/// `CRITICAL` | `NONE`) to Filigree's wire vocabulary (detailed-design §7 +/// table). Anything unrecognised — including `NONE` (facts) and `INFO` — maps +/// to `info`, mirroring the coercion Filigree applies server-side, except done +/// here so the original survives in `metadata.clarion.internal_severity`. +/// +/// This mapping is load-bearing: a live probe confirmed Filigree coerces an +/// unmapped uppercase `WARN` to `info` (with a response warning), so emitting +/// the internal vocabulary verbatim would silently flatten every defect to +/// `info`. 
+#[must_use] +pub fn severity_to_wire(internal: &str) -> &'static str { + match internal { + "CRITICAL" => "critical", + "ERROR" => "high", + "WARN" => "medium", + _ => "info", + } +} + +/// Knobs the emitter sets per `clarion analyze` invocation. `create_observations` +/// is always `false` (Clarion emits findings, not observations). +#[derive(Debug, Clone)] +pub struct EmitOptions { + /// Filigree's `scan_run_id`; Clarion passes its `run_id` here. An unknown + /// id is tolerated by Filigree (it warns and proceeds), so this carries the + /// REQ-FINDING-05 wire shape without a pre-create handshake. + pub scan_run_id: Option<String>, + /// `mark_unseen`: `true` for a normal full run so old-position findings for + /// the same rule/file transition to `unseen_in_latest` (REQ-FINDING-06). + pub mark_unseen: bool, + /// `complete_scan_run`: `true` on the final (here: only) batch. + pub complete_scan_run: bool, + /// Fallback `path` for findings whose anchor entity has no `source_file_path` + /// (synthetic, non-file entities — subsystems, project, guidance). Filigree + /// rejects path-less findings, so when this is set such a finding emits + /// against this stand-in path (the project root, mirroring the + /// `core:project:*` finding anchor) and carries + /// `metadata.clarion.synthetic_anchor=true` so a consumer knows the path is a + /// placeholder for a non-file entity, not the finding's real location. When + /// `None`, path-less findings are skipped (`skipped_no_path`) as before. + pub default_path: Option<String>, +} + +/// The Filigree-native scan-results request body. Serializes to the exact wire +/// shape Filigree's intake accepts; any field outside its enumerated set is +/// silently dropped server-side, so the struct carries only known keys. 
+#[derive(Debug, Clone, PartialEq, Serialize)] +pub struct ScanResultsRequest { + pub scan_source: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub scan_run_id: Option<String>, + pub mark_unseen: bool, + pub create_observations: bool, + pub complete_scan_run: bool, + pub findings: Vec<Value>, +} + +/// A prepared batch plus the counts the emitter records in `stats.json`. +#[derive(Debug, Clone)] +pub struct PreparedBatch { + pub request: ScanResultsRequest, + /// Findings rendered into the request body. + pub emitted: usize, + /// Findings dropped because their anchor entity has no `source_file_path` + /// (Filigree requires `path`; emitting a synthetic one would pollute its + /// file registry). Surfaced so the skip is never silent. + pub skipped_no_path: usize, +} + +/// Build a scan-results batch from persisted findings. Findings whose anchor +/// entity has no source path are skipped and counted, not emitted. +#[must_use] +pub fn prepare_batch(rows: &[FindingForEmitRow], opts: &EmitOptions) -> PreparedBatch { + let mut findings = Vec::with_capacity(rows.len()); + let mut skipped_no_path = 0; + for row in rows { + match wire_finding(row, opts.default_path.as_deref()) { + Some(finding) => findings.push(finding), + None => skipped_no_path += 1, + } + } + let emitted = findings.len(); + PreparedBatch { + request: ScanResultsRequest { + scan_source: CLARION_SCAN_SOURCE.to_owned(), + scan_run_id: opts.scan_run_id.clone(), + mark_unseen: opts.mark_unseen, + create_observations: false, + complete_scan_run: opts.complete_scan_run, + findings, + }, + emitted, + skipped_no_path, + } +} + +/// Render one persisted finding as a Filigree-native wire finding, or `None` +/// when it has no usable `path` (Filigree rejects path-less findings with a +/// `400 VALIDATION`). 
+/// +/// `default_path` is the [`EmitOptions::default_path`] fallback: when the anchor +/// entity has no `source_file_path` (a synthetic, non-file entity) but a fallback +/// is supplied, the finding emits against it and is flagged +/// `metadata.clarion.synthetic_anchor=true`. A synthetic anchor never carries +/// line numbers (the placeholder path has no meaningful position). +fn wire_finding(row: &FindingForEmitRow, default_path: Option<&str>) -> Option<Value> { + let row_path = row + .source_file_path + .as_deref() + .map(str::trim) + .filter(|path| !path.is_empty()); + let (path, synthetic_anchor) = match row_path { + Some(path) => (path, false), + None => ( + default_path + .map(str::trim) + .filter(|path| !path.is_empty())?, + true, + ), + }; + let mut finding = Map::new(); + finding.insert("path".to_owned(), json!(path)); + finding.insert("rule_id".to_owned(), json!(row.rule_id)); + finding.insert("message".to_owned(), json!(row.message)); + finding.insert( + "severity".to_owned(), + json!(severity_to_wire(&row.severity)), + ); + // A synthetic-anchor finding (subsystem/project/guidance) has no real + // file position, so the placeholder path carries no line numbers. + if !synthetic_anchor { + if let Some(line_start) = row.source_line_start { + finding.insert("line_start".to_owned(), json!(line_start)); + } + if let Some(line_end) = row.source_line_end { + finding.insert("line_end".to_owned(), json!(line_end)); + } + } + finding.insert("metadata".to_owned(), wire_metadata(row, synthetic_anchor)); + Some(Value::Object(finding)) +} + +/// Nest Clarion's richer fields under `metadata` (top level) and +/// `metadata.clarion` (Clarion-owned slot), per ADR-004 + detailed-design §7. 
+fn wire_metadata(row: &FindingForEmitRow, synthetic_anchor: bool) -> Value { + let mut meta = Map::new(); + meta.insert("kind".to_owned(), json!(row.kind)); + if let Some(confidence) = row.confidence { + meta.insert("confidence".to_owned(), json!(confidence)); + } + if let Some(basis) = &row.confidence_basis { + meta.insert("confidence_basis".to_owned(), json!(basis)); + } + + let mut clarion = Map::new(); + clarion.insert("entity_id".to_owned(), json!(row.entity_id)); + clarion.insert( + "related_entities".to_owned(), + json_array_or_empty(&row.related_entities_json), + ); + clarion.insert( + "supports".to_owned(), + json_array_or_empty(&row.supports_json), + ); + clarion.insert( + "supported_by".to_owned(), + json_array_or_empty(&row.supported_by_json), + ); + // Lossless round-trip: the wire `severity` is the mapped value, so the + // internal vocabulary is preserved here for read-back. + clarion.insert("internal_severity".to_owned(), json!(row.severity)); + clarion.insert("internal_status".to_owned(), json!("open")); + // Flag the placeholder anchor so a consumer never mistakes the stand-in + // `path` (the project root) for the finding's real file location. + if synthetic_anchor { + clarion.insert("synthetic_anchor".to_owned(), json!(true)); + } + meta.insert("clarion".to_owned(), Value::Object(clarion)); + Value::Object(meta) +} + +/// Parse a stored JSON-array column; fall back to an empty array if the text is +/// malformed or not an array, so one bad row never derails a batch. +fn json_array_or_empty(raw: &str) -> Value { + match serde_json::from_str::<Value>(raw) { + Ok(value @ Value::Array(_)) => value, + _ => Value::Array(Vec::new()), + } +} + +/// Filigree's scan-results response. `#[serde(default)]` keeps the read +/// forward-compatible: Filigree may add fields without breaking Clarion. 
+#[derive(Debug, Clone, Default, PartialEq, Eq, Deserialize)] +#[serde(default)] +pub struct ScanResultsResponse { + pub files_created: u64, + pub files_updated: u64, + pub findings_created: u64, + pub findings_updated: u64, + pub observations_created: u64, + pub observations_failed: u64, + pub new_finding_ids: Vec<String>, + /// Per-finding intake warnings (e.g. coerced severity, unknown + /// `scan_run_id`). REQ-FINDING-03 requires the emitter to parse these, not + /// just count them. + pub warnings: Vec<String>, +} + +/// Parse a scan-results response body. +/// +/// # Errors +/// +/// Returns the underlying [`serde_json::Error`] if the body is not the expected +/// JSON object shape. +pub fn parse_scan_results_response(body: &str) -> Result<ScanResultsResponse, serde_json::Error> { + serde_json::from_str(body) +} + +/// The scan-results intake URL for a Filigree base URL. +#[must_use] +pub fn scan_results_url(base_url: &str) -> String { + format!("{}/api/v1/scan-results", base_url.trim_end_matches('/')) +} + +/// The retention-sweep URL for a Filigree base URL (REQ-FINDING-06, +/// `--prune-unseen`). This is a **loom-generation** route (`/api/loom/…`), +/// unlike the classic `/api/v1/scan-results` emission intake — do not derive it +/// from [`scan_results_url`]. Verified against Filigree's own route handler and +/// API tests. +#[must_use] +pub fn clean_stale_url(base_url: &str) -> String { + format!( + "{}/api/loom/findings/clean-stale", + base_url.trim_end_matches('/') + ) +} + +/// The `POST /api/loom/findings/clean-stale` request body (REQ-FINDING-06). +/// Filigree **soft-archives** `unseen_in_latest` findings older than +/// `older_than_days`, scoped to `scan_source`, moving them to `fixed` status +/// (they auto-reopen if a later scan re-detects them — see Filigree ADR-015). +/// `scan_source` is required server-side as an accident-guard so a caller +/// cannot sweep every tool's findings; Clarion always sends `"clarion"`. 
+#[derive(Debug, Clone, PartialEq, Serialize)] +pub struct CleanStaleRequest { + pub scan_source: String, + pub older_than_days: u32, + pub actor: String, +} + +/// Filigree's clean-stale response. `#[serde(default)]` keeps the read tolerant +/// of added fields / missing keys so Filigree can grow the route. +#[derive(Debug, Clone, Default, PartialEq, Eq, Deserialize)] +#[serde(default)] +pub struct CleanStaleResponse { + pub findings_fixed: u64, + pub scan_source: String, + pub older_than_days: u64, +} + +/// Parse Filigree's clean-stale response body. +/// +/// # Errors +/// +/// Returns the underlying [`serde_json::Error`] if the body is not the expected +/// shape. +pub fn parse_clean_stale_response(body: &str) -> Result<CleanStaleResponse, serde_json::Error> { + serde_json::from_str(body) +} + +#[cfg(test)] +mod tests { + use super::*; + + fn defect_row() -> FindingForEmitRow { + FindingForEmitRow { + id: "core:finding:run-1:circular".to_owned(), + rule_id: "CLA-PY-STRUCTURE-001".to_owned(), + kind: "defect".to_owned(), + severity: "WARN".to_owned(), + confidence: Some(0.95), + confidence_basis: Some("ast_match".to_owned()), + message: "Circular import detected".to_owned(), + entity_id: "python:class:auth.tokens::TokenManager".to_owned(), + related_entities_json: r#"["python:class:auth.sessions::SessionStore"]"#.to_owned(), + supports_json: "[]".to_owned(), + supported_by_json: "[]".to_owned(), + source_file_path: Some("src/auth/tokens.py".to_owned()), + source_line_start: Some(12), + source_line_end: Some(12), + } + } + + #[test] + fn severity_table_matches_detailed_design() { + assert_eq!(severity_to_wire("CRITICAL"), "critical"); + assert_eq!(severity_to_wire("ERROR"), "high"); + assert_eq!(severity_to_wire("WARN"), "medium"); + assert_eq!(severity_to_wire("INFO"), "info"); + assert_eq!(severity_to_wire("NONE"), "info"); + // Unknown values coerce to info, the same as Filigree's server-side rule. 
+ assert_eq!(severity_to_wire("bogus"), "info"); + } + + #[test] + fn wire_finding_carries_mapped_severity_and_nested_clarion_metadata() { + let finding = wire_finding(&defect_row(), None).expect("path present"); + + assert_eq!(finding["path"], json!("src/auth/tokens.py")); + assert_eq!(finding["rule_id"], json!("CLA-PY-STRUCTURE-001")); + assert_eq!(finding["message"], json!("Circular import detected")); + // Internal WARN maps to wire medium... + assert_eq!(finding["severity"], json!("medium")); + assert_eq!(finding["line_start"], json!(12)); + assert_eq!(finding["line_end"], json!(12)); + + let meta = &finding["metadata"]; + assert_eq!(meta["kind"], json!("defect")); + assert_eq!(meta["confidence"], json!(0.95)); + assert_eq!(meta["confidence_basis"], json!("ast_match")); + + let clarion = &meta["clarion"]; + assert_eq!( + clarion["entity_id"], + json!("python:class:auth.tokens::TokenManager") + ); + assert_eq!( + clarion["related_entities"], + json!(["python:class:auth.sessions::SessionStore"]) + ); + assert_eq!(clarion["supports"], json!([])); + assert_eq!(clarion["supported_by"], json!([])); + // ...while the internal value round-trips under clarion.*. 
+ assert_eq!(clarion["internal_severity"], json!("WARN")); + assert_eq!(clarion["internal_status"], json!("open")); + } + + #[test] + fn fact_finding_omits_confidence_basis_when_absent() { + let mut row = defect_row(); + row.kind = "fact".to_owned(); + row.severity = "NONE".to_owned(); + row.confidence = None; + row.confidence_basis = None; + + let finding = wire_finding(&row, None).expect("path present"); + assert_eq!(finding["severity"], json!("info")); + let meta = &finding["metadata"]; + assert_eq!(meta["kind"], json!("fact")); + assert!( + meta.get("confidence").is_none(), + "confidence omitted: {meta}" + ); + assert!( + meta.get("confidence_basis").is_none(), + "confidence_basis omitted: {meta}" + ); + assert_eq!(meta["clarion"]["internal_severity"], json!("NONE")); + } + + #[test] + fn path_less_finding_is_skipped_not_emitted() { + let mut row = defect_row(); + row.source_file_path = None; + assert!(wire_finding(&row, None).is_none()); + + let mut blank = defect_row(); + blank.source_file_path = Some(" ".to_owned()); + assert!( + wire_finding(&blank, None).is_none(), + "blank path is skipped too" + ); + } + + #[test] + fn path_less_finding_uses_default_path_and_flags_synthetic_anchor() { + // A subsystem-anchored finding (no source_file_path) emits against the + // supplied fallback path and is flagged as a synthetic anchor, with no + // line numbers (the placeholder path has no real position). + let mut row = defect_row(); + row.entity_id = "core:subsystem:abcd".to_owned(); + row.source_file_path = None; + let finding = wire_finding(&row, Some("/repo/root")).expect("emits via default path"); + assert_eq!(finding["path"], json!("/repo/root")); + assert_eq!( + finding["metadata"]["clarion"]["synthetic_anchor"], + json!(true) + ); + assert!( + finding.get("line_start").is_none() && finding.get("line_end").is_none(), + "synthetic anchor carries no line position: {finding}" + ); + + // A path-bearing finding ignores the fallback and is not flagged. 
+ let finding = wire_finding(&defect_row(), Some("/repo/root")).expect("path present"); + assert_eq!(finding["path"], json!("src/auth/tokens.py")); + assert!( + finding["metadata"]["clarion"] + .get("synthetic_anchor") + .is_none(), + "real-path finding is not a synthetic anchor: {finding}" + ); + + // A blank fallback is no better than none: still skipped. + let mut row = defect_row(); + row.source_file_path = None; + assert!(wire_finding(&row, Some(" ")).is_none()); + } + + #[test] + fn malformed_related_entities_falls_back_to_empty_array() { + let mut row = defect_row(); + row.related_entities_json = "not json".to_owned(); + let finding = wire_finding(&row, None).expect("path present"); + assert_eq!( + finding["metadata"]["clarion"]["related_entities"], + json!([]) + ); + } + + #[test] + fn prepare_batch_counts_emitted_and_skipped() { + let emitted = defect_row(); + let mut skipped = defect_row(); + skipped.id = "core:finding:run-1:weak-modularity".to_owned(); + skipped.entity_id = "core:subsystem:abcd".to_owned(); + skipped.source_file_path = None; + + let batch = prepare_batch( + &[emitted, skipped], + &EmitOptions { + scan_run_id: Some("run-1".to_owned()), + mark_unseen: true, + complete_scan_run: true, + default_path: None, + }, + ); + + assert_eq!(batch.emitted, 1); + assert_eq!(batch.skipped_no_path, 1); + assert_eq!(batch.request.findings.len(), 1); + assert_eq!(batch.request.scan_source, "clarion"); + assert_eq!(batch.request.scan_run_id.as_deref(), Some("run-1")); + assert!(batch.request.mark_unseen); + assert!(batch.request.complete_scan_run); + assert!(!batch.request.create_observations); + } + + #[test] + fn request_serializes_to_filigree_wire_shape() { + let batch = prepare_batch( + &[defect_row()], + &EmitOptions { + scan_run_id: Some("run-1".to_owned()), + mark_unseen: true, + complete_scan_run: true, + default_path: None, + }, + ); + let value = serde_json::to_value(&batch.request).expect("serialize request"); + + assert_eq!(value["scan_source"], 
json!("clarion")); + assert_eq!(value["scan_run_id"], json!("run-1")); + assert_eq!(value["mark_unseen"], json!(true)); + assert_eq!(value["create_observations"], json!(false)); + assert_eq!(value["complete_scan_run"], json!(true)); + assert_eq!( + value["findings"].as_array().expect("findings array").len(), + 1 + ); + } + + #[test] + fn omitted_scan_run_id_is_absent_from_wire() { + let batch = prepare_batch( + &[defect_row()], + &EmitOptions { + scan_run_id: None, + mark_unseen: true, + complete_scan_run: true, + default_path: None, + }, + ); + let value = serde_json::to_value(&batch.request).expect("serialize request"); + assert!( + value.get("scan_run_id").is_none(), + "scan_run_id omitted when None: {value}" + ); + } + + #[test] + fn parses_live_response_shape() { + // Pinned to the real Filigree response captured from a live probe POST. + let response = parse_scan_results_response( + r#"{ + "files_created": 1, + "files_updated": 0, + "findings_created": 1, + "findings_updated": 0, + "new_finding_ids": ["clarion-sf-2f4cf9ca1b"], + "observations_created": 0, + "observations_failed": 0, + "warnings": ["Unknown severity 'WARN' for finding at probe/sev.py, mapped to 'info'"] + }"#, + ) + .expect("parse live response shape"); + + assert_eq!(response.findings_created, 1); + assert_eq!(response.files_created, 1); + assert_eq!(response.new_finding_ids, vec!["clarion-sf-2f4cf9ca1b"]); + assert_eq!(response.warnings.len(), 1); + assert!(response.warnings[0].contains("Unknown severity")); + } + + #[test] + fn response_parse_tolerates_missing_and_extra_fields() { + // Forward-compat: unknown fields ignored, missing fields default. 
+ let response = parse_scan_results_response( + r#"{"findings_created": 2, "warnings": [], "some_future_field": 99}"#, + ) + .expect("parse forward-compatible response"); + assert_eq!(response.findings_created, 2); + assert!(response.warnings.is_empty()); + assert!(response.new_finding_ids.is_empty()); + } + + #[test] + fn builds_scan_results_url() { + assert_eq!( + scan_results_url("http://127.0.0.1:8542/"), + "http://127.0.0.1:8542/api/v1/scan-results" + ); + assert_eq!( + scan_results_url("http://127.0.0.1:8542"), + "http://127.0.0.1:8542/api/v1/scan-results" + ); + } + + #[test] + fn clean_stale_url_targets_the_loom_route() { + // Prune is a loom-generation route, distinct from the classic + // /api/v1 emission intake. + assert_eq!( + clean_stale_url("http://127.0.0.1:8542/"), + "http://127.0.0.1:8542/api/loom/findings/clean-stale" + ); + assert_eq!( + clean_stale_url("http://127.0.0.1:8542"), + "http://127.0.0.1:8542/api/loom/findings/clean-stale" + ); + } + + #[test] + fn clean_stale_request_serializes_to_filigree_wire_shape() { + let request = CleanStaleRequest { + scan_source: CLARION_SCAN_SOURCE.to_owned(), + older_than_days: 30, + actor: "clarion-mcp".to_owned(), + }; + let value = serde_json::to_value(&request).expect("serialize clean-stale request"); + assert_eq!(value["scan_source"], json!("clarion")); + assert_eq!(value["older_than_days"], json!(30)); + assert_eq!(value["actor"], json!("clarion-mcp")); + } + + #[test] + fn parses_clean_stale_response_shape() { + // Pinned to Filigree's clean-stale handler response. 
+ let response = parse_clean_stale_response( + r#"{"findings_fixed": 4, "scan_source": "clarion", "older_than_days": 30}"#, + ) + .expect("parse clean-stale response"); + assert_eq!(response.findings_fixed, 4); + assert_eq!(response.scan_source, "clarion"); + assert_eq!(response.older_than_days, 30); + } + + #[test] + fn clean_stale_response_tolerates_missing_and_extra_fields() { + let response = parse_clean_stale_response(r#"{"findings_fixed": 1, "future_field": true}"#) + .expect("parse forward-compatible clean-stale response"); + assert_eq!(response.findings_fixed, 1); + assert_eq!(response.older_than_days, 0); + } +} diff --git a/crates/clarion-mcp/Cargo.toml b/crates/clarion-mcp/Cargo.toml index ab6d3ea8..b6f1e1a1 100644 --- a/crates/clarion-mcp/Cargo.toml +++ b/crates/clarion-mcp/Cargo.toml @@ -10,9 +10,11 @@ rust-version.workspace = true workspace = true [dependencies] +async-trait.workspace = true blake3.workspace = true -clarion-core = { path = "../clarion-core", version = "1.1.0" } -clarion-storage = { path = "../clarion-storage", version = "1.1.0" } +clarion-core = { path = "../clarion-core", version = "1.2.0" } +clarion-federation = { path = "../clarion-federation", version = "1.2.0" } +clarion-storage = { path = "../clarion-storage", version = "1.2.0" } reqwest.workspace = true rusqlite.workspace = true serde.workspace = true diff --git a/crates/clarion-mcp/assets/skills/clarion-workflow/SKILL.md b/crates/clarion-mcp/assets/skills/clarion-workflow/SKILL.md index dd54daad..db6925b6 100644 --- a/crates/clarion-mcp/assets/skills/clarion-workflow/SKILL.md +++ b/crates/clarion-mcp/assets/skills/clarion-workflow/SKILL.md @@ -141,18 +141,28 @@ analyze-time precompute): `find_circular_imports` and `find_coupling_hotspots` are edge-derived, so they take a `confidence` tier (default `resolved`, a ceiling) and echo it. 
The -categorisation shortcuts read plugin-emitted tags; the **Python plugin emits no -categorisation tags today**, so `find_entry_points`/`find_http_routes`/ -`find_data_models`/`find_tests`/`find_deprecations`/`find_todos`/`what_tests_this` -return honest-empty with a missing-signal note — an empty result means "the -signal is absent", not "there is nothing here". Likewise `high_churn` and +categorisation shortcuts read plugin-emitted tags. The Python plugin emits +conservative tags for common conventions (`entry-point`, `http-route`, `test`, +`data-model`, `cli-command`, `exported-api`), so root/tag shortcuts and +`find_dead_code` light up on freshly analyzed Python projects where those +signals are present. `find_deprecations` / `find_todos` still return +honest-empty unless a plugin emits those tags. Likewise `high_churn` and `recently_changed` are honest-empty until churn/change signals are populated (use `index_diff` for repo-level freshness). -> Not in this catalogue: `search_semantic` and `find_dead_code` (need embedding -> / whole-graph-reachability infrastructure — a separate wave), guidance -> *authoring* (`propose_guidance`/`promote_guidance` — `guidance_for` is read -> only), and `emit_observation` (no observation-write transport ships yet). +`search_semantic` is also in the catalogue. It is opt-in under +`semantic_search:`; when enabled, `clarion analyze` populates the git-ignored +`.clarion/embeddings.db` sidecar and the query path filters stale vectors by +content hash. + +> Not in this catalogue: `emit_observation` as a general-purpose write surface. + +**Guidance authoring has an operator boundary.** Operators can manage sheets via +`clarion guidance create/edit/show/list/delete/promote` (plus `export`/`import` +for team sharing). Agents may call `propose_guidance` to create a Filigree +observation, but that proposal is inert until an operator promotes it through +`promote_guidance` or the CLI. 
Promoted sheets reach you through `guidance_for` +and are composed into `summary` prompts with a real guidance fingerprint. ## Workflow: orient, then navigate diff --git a/crates/clarion-mcp/src/catalogue/inspection.rs b/crates/clarion-mcp/src/catalogue/inspection.rs index bc0c3f47..7c9a2311 100644 --- a/crates/clarion-mcp/src/catalogue/inspection.rs +++ b/crates/clarion-mcp/src/catalogue/inspection.rs @@ -10,14 +10,16 @@ use std::collections::HashSet; use serde_json::{Value, json}; use clarion_core::McpErrorCode; -use clarion_storage::{entity_by_id, get_taint_facts, sei_for_locator, subsystem_of_entity}; +use clarion_storage::{ + MatchFacts, RuleVerdict, entity_by_id, get_taint_facts, rule_match, sei_for_locator, +}; use crate::ParamError; use crate::ServerState; use crate::catalogue::{Page, missing_signal, paginate}; use crate::{ - entity_json, flatten_storage_envelope_result, required_str, success_envelope, - tool_error_envelope, + entity_json, flatten_storage_envelope_result, parse_to_unix_seconds, required_str, + success_envelope, tool_error_envelope, }; /// Bound on guidance sheets scanned per `guidance_for` call. Guidance is @@ -25,9 +27,6 @@ use crate::{ /// pathological project. const GUIDANCE_SCAN_CAP: usize = 2000; -/// Bound on findings scanned per `findings_for` call before in-memory filtering. -const FINDINGS_SCAN_CAP: usize = 5000; - /// Default / max page size for `findings_for`. const FINDINGS_PAGE_DEFAULT: usize = 50; const FINDINGS_PAGE_MAX: usize = 200; @@ -63,7 +62,7 @@ impl ServerState { )); }; - let facts = EntityFacts::load(conn, &entity, &project_root)?; + let facts = MatchFacts::from_entity_row(conn, &entity, &project_root)?; let guides_targets = guides_edge_sources(conn, &entity.id)?; let mut wardline_group_skipped = false; @@ -92,8 +91,16 @@ impl ServerState { scanned += 1; let sheet = GuidanceRow::from_row(row)?; - // Expiry: lexical ISO-8601 compare against the server clock. 
- if sheet.expires.as_deref().is_some_and(|exp| exp < now.as_str()) { + // Expiry: parse both `expires` and the server clock to Unix + // seconds (accepting `unix:` and RFC3339) and compare + // numerically. Skip only when both parse and the sheet's + // expiry precedes `now`. Fail open: a missing or unparseable + // `expires` (or an unparseable clock) never hides a sheet. + if let Some(exp) = sheet.expires.as_deref() + && let Some(exp_secs) = parse_to_unix_seconds(exp) + && let Some(now_secs) = parse_to_unix_seconds(&now) + && exp_secs < now_secs + { continue; } @@ -189,35 +196,59 @@ impl ServerState { )); }; + let kind = filter.kind.as_deref(); + let severity = filter.severity.as_deref(); + let status = filter.status.as_deref(); + let total: usize = conn.query_row( + "SELECT COUNT(*) \ + FROM findings \ + WHERE entity_id = ?1 \ + AND (?2 IS NULL OR kind = ?2) \ + AND (?3 IS NULL OR severity = ?3) \ + AND (?4 IS NULL OR status = ?4)", + rusqlite::params![entity.id, kind, severity, status], + |row| { + let count: i64 = row.get(0)?; + Ok(usize::try_from(count).unwrap_or(usize::MAX)) + }, + )?; let mut stmt = conn.prepare( "SELECT id, tool, rule_id, kind, severity, status, message, \ related_entities, confidence, created_at \ - FROM findings WHERE entity_id = ?1 \ - ORDER BY created_at DESC, id LIMIT ?2", + FROM findings \ + WHERE entity_id = ?1 \ + AND (?2 IS NULL OR kind = ?2) \ + AND (?3 IS NULL OR severity = ?3) \ + AND (?4 IS NULL OR status = ?4) \ + ORDER BY created_at DESC, id \ + LIMIT ?5 OFFSET ?6", )?; - let cap = i64::try_from(FINDINGS_SCAN_CAP).unwrap_or(i64::MAX); - let mut rows = stmt.query(rusqlite::params![entity.id, cap])?; - let mut all: Vec = Vec::new(); - let mut scan_truncated = false; + let limit = i64::try_from(page.limit).unwrap_or(i64::MAX); + let offset = i64::try_from(page.offset).unwrap_or(i64::MAX); + let mut rows = stmt.query(rusqlite::params![ + entity.id, kind, severity, status, limit, offset + ])?; + let mut page_rows: Vec = 
Vec::new(); while let Some(row) = rows.next()? { - if all.len() >= FINDINGS_SCAN_CAP { - scan_truncated = true; - break; - } - all.push(FindingRow::from_row(row)?); + page_rows.push(FindingRow::from_row(row)?); } - let filtered: Vec = - all.into_iter().filter(|f| filter.matches(f)).collect(); - let (slice, meta) = paginate(&filtered, page); - let findings: Vec = slice.iter().map(FindingRow::to_json).collect(); + let returned = page_rows.len(); + let findings: Vec = page_rows.iter().map(FindingRow::to_json).collect(); + let meta = json!({ + "total": total, + "offset": page.offset, + "limit": page.limit, + "returned": returned, + "truncated": page.offset.saturating_add(returned) < total, + }); Ok(success_envelope(json!({ "entity": entity_json(conn, &entity), "findings": findings, "filter": filter.to_json(), "page": meta, - "scan_truncated": scan_truncated, + "scan_truncated": false, }))) }) .await; @@ -280,49 +311,6 @@ impl ServerState { } } -/// Entity facts a guidance `match_rules` evaluation needs. -struct EntityFacts { - kind: String, - rel_path: Option, - tags: HashSet, - subsystem_id: Option, - entity_id: String, -} - -impl EntityFacts { - fn load( - conn: &rusqlite::Connection, - entity: &clarion_storage::EntityRow, - project_root: &std::path::Path, - ) -> clarion_storage::Result { - let rel_path = entity.source_file_path.as_ref().map(|path| { - std::path::Path::new(path) - .strip_prefix(project_root) - .ok() - .and_then(|rel| rel.to_str()) - .unwrap_or(path) - .to_owned() - }); - - let mut tags = HashSet::new(); - let mut stmt = conn.prepare("SELECT tag FROM entity_tags WHERE entity_id = ?1")?; - let mut rows = stmt.query(rusqlite::params![entity.id])?; - while let Some(row) = rows.next()? 
{ - tags.insert(row.get::<_, String>(0)?); - } - - let subsystem_id = subsystem_of_entity(conn, &entity.id)?.map(|found| found.subsystem_id); - - Ok(Self { - kind: entity.kind.clone(), - rel_path, - tags, - subsystem_id, - entity_id: entity.id.clone(), - }) - } -} - /// guidance sheet ids that explicitly `guides` the given entity. fn guides_edge_sources( conn: &rusqlite::Connection, @@ -400,53 +388,6 @@ impl ComposedSheet { } } -/// The verdict of evaluating one guidance match-rule against an entity. -enum RuleVerdict { - Matched(&'static str), - NoMatch, - /// The rule cannot be evaluated at this surface (e.g. `wardline_group`, - /// which would require parsing the opaque Wardline blob). - Unevaluable, -} - -fn rule_match(rule: &Value, facts: &EntityFacts) -> RuleVerdict { - let Some(rule_type) = rule.get("type").and_then(Value::as_str) else { - return RuleVerdict::NoMatch; - }; - match rule_type { - "path" => match ( - rule.get("pattern").and_then(Value::as_str), - facts.rel_path.as_deref(), - ) { - (Some(pattern), Some(path)) if super::glob_match(pattern, path) => { - RuleVerdict::Matched("path") - } - _ => RuleVerdict::NoMatch, - }, - "tag" => match rule.get("value").and_then(Value::as_str) { - Some(value) if facts.tags.contains(value) => RuleVerdict::Matched("tag"), - _ => RuleVerdict::NoMatch, - }, - "kind" => match rule.get("value").and_then(Value::as_str) { - Some(value) if value == facts.kind => RuleVerdict::Matched("kind"), - _ => RuleVerdict::NoMatch, - }, - "subsystem" => match ( - rule.get("id").and_then(Value::as_str), - facts.subsystem_id.as_deref(), - ) { - (Some(id), Some(sub)) if id == sub => RuleVerdict::Matched("subsystem"), - _ => RuleVerdict::NoMatch, - }, - "entity" => match rule.get("id").and_then(Value::as_str) { - Some(id) if id == facts.entity_id => RuleVerdict::Matched("entity"), - _ => RuleVerdict::NoMatch, - }, - "wardline_group" => RuleVerdict::Unevaluable, - _ => RuleVerdict::NoMatch, - } -} - /// Optional `findings_for` filter 
(`kind` / `severity` / `status`). struct FindingFilter { kind: Option, @@ -480,15 +421,6 @@ impl FindingFilter { }) } - fn matches(&self, finding: &FindingRow) -> bool { - self.kind.as_ref().is_none_or(|k| *k == finding.kind) - && self - .severity - .as_ref() - .is_none_or(|s| *s == finding.severity) - && self.status.as_ref().is_none_or(|s| *s == finding.status) - } - fn to_json(&self) -> Value { json!({ "kind": self.kind, diff --git a/crates/clarion-mcp/src/catalogue/mod.rs b/crates/clarion-mcp/src/catalogue/mod.rs index ea0debd9..5405e97a 100644 --- a/crates/clarion-mcp/src/catalogue/mod.rs +++ b/crates/clarion-mcp/src/catalogue/mod.rs @@ -140,50 +140,11 @@ pub(crate) fn missing_signal(signal: &str, reason: &str) -> Value { }) } -/// Glob-match `path` against a `**`/`*`/`?` `pattern`, treating `/` as the -/// path separator. `**` matches zero or more whole segments; `*` matches any -/// run of non-`/` characters within a single segment; `?` matches one such -/// character. Used by `scope` path-globs and by guidance `path` match-rules. -pub(crate) fn glob_match(pattern: &str, path: &str) -> bool { - let pat: Vec<&str> = pattern.split('/').collect(); - let seg: Vec<&str> = path.split('/').collect(); - glob_segments(&pat, &seg) -} - -fn glob_segments(pat: &[&str], seg: &[&str]) -> bool { - match pat.first() { - None => seg.is_empty(), - Some(&"**") => { - // `**` consumes zero or more whole segments; try each split point. - (0..=seg.len()).any(|i| glob_segments(&pat[1..], &seg[i..])) - } - Some(head) => match seg.first() { - Some(name) if segment_match(head.as_bytes(), name.as_bytes()) => { - glob_segments(&pat[1..], &seg[1..]) - } - _ => false, - }, - } -} - -/// Within-segment wildcard match: `*` matches any run, `?` matches one char. -fn segment_match(pat: &[u8], name: &[u8]) -> bool { - match pat.first() { - None => name.is_empty(), - Some(b'*') => { - // `*` matches zero or more chars within the segment. 
- (0..=name.len()).any(|i| segment_match(&pat[1..], &name[i..])) - } - Some(b'?') => match name.first() { - Some(_) => segment_match(&pat[1..], &name[1..]), - None => false, - }, - Some(&head) => match name.first() { - Some(&c) if c == head => segment_match(&pat[1..], &name[1..]), - _ => false, - }, - } -} +/// Glob-match `path` against a `**`/`*`/`?` `pattern`. Re-exported from +/// `clarion-storage` so the read (`scope` / guidance `match_rules`) and write +/// (CLI guidance `--for-entity`) surfaces share one matcher — see +/// `clarion_storage::glob`. +pub(crate) use clarion_storage::glob_match; /// Bound on entity ids materialised when resolving an entity-descendant scope. const SCOPE_DESCENDANT_CAP: usize = 50_000; diff --git a/crates/clarion-mcp/src/catalogue/semantic.rs b/crates/clarion-mcp/src/catalogue/semantic.rs index cf2bd0e8..af7ead47 100644 --- a/crates/clarion-mcp/src/catalogue/semantic.rs +++ b/crates/clarion-mcp/src/catalogue/semantic.rs @@ -62,36 +62,28 @@ impl ServerState { let model_id = state.provider.model_id().to_owned(); let provider = state.provider.clone(); - // Embed the query off the async runtime (the provider call is blocking). + // Embed the query using the async provider. 
let embed_query = query.clone(); - let query_vector = - match tokio::task::spawn_blocking(move || provider.embed(&[embed_query])).await { - Ok(Ok(mut vectors)) => match vectors.pop() { - Some(vector) => vector, - None => { - return Ok(tool_error_envelope( - McpErrorCode::LlmProviderError, - "embedding provider returned no vector for the query", - true, - )); - } - }, - Ok(Err(err)) => { - let retryable = err.retryable(); + let query_vector = match provider.embed(&[embed_query]).await { + Ok(mut vectors) => match vectors.pop() { + Some(vector) => vector, + None => { return Ok(tool_error_envelope( McpErrorCode::LlmProviderError, - &format!("query embedding failed: {err}"), - retryable, - )); - } - Err(err) => { - return Ok(tool_error_envelope( - McpErrorCode::Internal, - &format!("embedding task failed: {err}"), + "embedding provider returned no vector for the query", true, )); } - }; + }, + Err(err) => { + let retryable = err.retryable(); + return Ok(tool_error_envelope( + McpErrorCode::LlmProviderError, + &format!("query embedding failed: {err}"), + retryable, + )); + } + }; let project_root = self.project_root.clone(); let sidecar_path = embeddings_db_path(&project_root); diff --git a/crates/clarion-mcp/src/catalogue/shortcuts.rs b/crates/clarion-mcp/src/catalogue/shortcuts.rs index ecf90aff..1339b768 100644 --- a/crates/clarion-mcp/src/catalogue/shortcuts.rs +++ b/crates/clarion-mcp/src/catalogue/shortcuts.rs @@ -6,11 +6,11 @@ //! and `find_coupling_hotspots`. No analyze-time precompute (ADR-030): each is a //! cheap read over `edges`. Edge-derived, so results declare a confidence tier //! (ADR-028), default `>= resolved`. -//! - **Honest-empty categorisation/churn shortcuts** (Task 4) — added alongside, -//! each reading an existing signal (categorisation tag / git churn) and returning -//! an honest empty result with a missing-signal note where the signal is absent. +//! - **Categorisation/churn shortcuts** (Task 4) — added alongside, each reading +//! 
an existing signal (categorisation tag / git churn) and returning an honest +//! empty result with a missing-signal note where the signal is absent. -use std::collections::{HashMap, HashSet}; +use std::collections::{BTreeSet, HashMap, HashSet}; use serde_json::{Value, json}; @@ -29,11 +29,13 @@ use crate::{ const EDGE_SCAN_CAP: usize = 500_000; /// Scan bound on entities materialised for the dead-code candidate set. const ENTITY_SCAN_CAP: usize = 500_000; +const EDGE_SCAN_ORDER_BY: &str = "ORDER BY kind, from_id, to_id, confidence, \ + COALESCE(source_byte_start, -1), COALESCE(source_byte_end, -1)"; /// Categorisation tags whose union is the reachability root set for -/// `find_dead_code` — entities "called from outside" the codebase. None are -/// emitted by the active plugins today (the tag-emission pipeline is tracked -/// follow-up work), so the empty-root guard fires in practice. +/// `find_dead_code` — entities "called from outside" the codebase. Tag-emitting +/// plugins populate these; the empty-root guard protects indexes with no root +/// tags from a flood of false positives. const DEAD_CODE_ROOT_TAGS: &[&str] = &[ "entry-point", "http-route", @@ -54,6 +56,15 @@ const DEAD_CODE_BARRIER_TAGS: &[&str] = &["dynamic-dispatch", "reflection"]; /// are invisible to static analysis. const DEAD_CODE_EXCLUDED_TAGS: &[&str] = &["framework-handler", "plugin-hook"]; +/// Runtime import predicate used by graph shortcuts. Missing or malformed +/// properties fail toward inclusion; explicit `type_only=true` or +/// `scope="function"` marks an import as non-module-runtime evidence. +const RUNTIME_IMPORT_EDGE_SQL: &str = "\ + (properties IS NULL \ + OR json_valid(properties) = 0 \ + OR (COALESCE(json_extract(properties, '$.type_only'), 0) != 1 \ + AND COALESCE(json_extract(properties, '$.scope'), 'module') = 'module'))"; + /// Rule id for an emitted dead-code candidate (ADR-017 `CLA-FACT-*` namespace). 
const DEAD_CODE_RULE_ID: &str = "CLA-FACT-DEAD-CODE-CANDIDATE"; /// Heuristic confidence for a dead-code candidate — never presented as certain. @@ -102,31 +113,20 @@ impl ServerState { let (in_scope, scope_truncated) = filter.in_scope_ids(conn, &project_root)?; // Build the import adjacency, restricted to in-scope endpoints. - let in_clause = confidence_in_clause(confidence); - let sql = format!( - "SELECT from_id, to_id FROM edges \ - WHERE kind = 'imports' AND confidence IN ({in_clause}) LIMIT ?1" - ); - let cap = i64::try_from(EDGE_SCAN_CAP.saturating_add(1)).unwrap_or(i64::MAX); - let mut stmt = conn.prepare(&sql)?; - let mut rows = stmt.query(rusqlite::params![cap])?; - let mut adjacency: HashMap> = HashMap::new(); - let mut edge_count = 0usize; - let mut scan_truncated = false; - while let Some(row) = rows.next()? { - if edge_count >= EDGE_SCAN_CAP { - scan_truncated = true; - break; - } - edge_count += 1; - let from: String = row.get(0)?; - let to: String = row.get(1)?; - let keep = in_scope - .as_ref() - .is_none_or(|ids| ids.contains(&from) && ids.contains(&to)); - if keep { - adjacency.entry(from).or_default().push(to); - } + let (mut adjacency, scan_truncated) = + import_adjacency_for_cycles(conn, confidence, EDGE_SCAN_CAP)?; + if let Some(in_scope) = &in_scope { + adjacency = adjacency + .into_iter() + .filter_map(|(from, tos)| { + if !in_scope.contains(&from) { + return None; + } + let tos: Vec = + tos.into_iter().filter(|to| in_scope.contains(to)).collect(); + (!tos.is_empty()).then_some((from, tos)) + }) + .collect(); } let cycles = strongly_connected_cycles(&adjacency); @@ -196,7 +196,8 @@ impl ServerState { // emits_finding) all carry confidence='resolved', so including // them would make the ranking dominated by containment / // membership fan-out, not actionable coupling. 
- let kinds = "kind IN ('calls', 'imports')"; + let kinds = + format!("(kind = 'calls' OR (kind = 'imports' AND {RUNTIME_IMPORT_EDGE_SQL}))"); // out-degree (distinct callees / targets) let out_sql = format!( "SELECT from_id, COUNT(DISTINCT to_id) FROM edges \ @@ -300,7 +301,7 @@ impl ServerState { let filter = scope.resolve(conn)?; let (in_scope, scope_truncated) = filter.in_scope_ids(conn, &project_root)?; - // Roots = "called from outside" categorisations. Absent today. + // Roots = "called from outside" categorisations. let roots = ids_with_any_tag(conn, DEAD_CODE_ROOT_TAGS)?; if roots.is_empty() { return Ok(success_envelope(json!({ @@ -313,10 +314,9 @@ impl ServerState { "scan_truncated": false, "signal": missing_signal( "entity_tags", - "reachability roots are not emitted by the active plugins \ - (entry-point / http-route / test / data-model / cli-command / \ - exported-api categorisation tags are absent), so dead code cannot be \ - determined — this is NOT a guarantee there is no dead code", + "this index has no reachability root tags (entry-point / http-route / \ + test / data-model / cli-command / exported-api), so dead code cannot \ + be determined — this is NOT a guarantee there is no dead code", ), }))); } @@ -402,8 +402,8 @@ impl ServerState { self.categorisation_shortcut( arguments, "entry-point", - "no entity is tagged as an entry point; entry-point categorisation is not emitted by \ - the active plugins (honest-empty, not a guaranteed absence of entry points)", + "no entity is tagged as an entry point in this index (honest-empty, not a guaranteed \ + absence of entry points)", ) .await } @@ -417,8 +417,7 @@ impl ServerState { self.categorisation_shortcut( arguments, "http-route", - "no entity is tagged as an HTTP route; route categorisation is not emitted by the \ - active plugins", + "no entity is tagged as an HTTP route in this index", ) .await } @@ -432,8 +431,7 @@ impl ServerState { self.categorisation_shortcut( arguments, "data-model", - "no 
entity is tagged as a data model; data-model categorisation is not emitted by the \ - active plugins", + "no entity is tagged as a data model in this index", ) .await } @@ -447,8 +445,7 @@ impl ServerState { self.categorisation_shortcut( arguments, "test", - "no entity is tagged as a test; test categorisation is not emitted by the active \ - plugins", + "no entity is tagged as a test in this index", ) .await } @@ -462,8 +459,7 @@ impl ServerState { self.categorisation_shortcut( arguments, "deprecated", - "no entity is tagged as deprecated; deprecation categorisation is not emitted by the \ - active plugins", + "no entity is tagged as deprecated in this index", ) .await } @@ -477,8 +473,7 @@ impl ServerState { self.categorisation_shortcut( arguments, "todo", - "no entity is tagged with a TODO/FIXME marker; TODO extraction is not emitted by the \ - active plugins", + "no entity is tagged with a TODO/FIXME marker in this index", ) .await } @@ -562,8 +557,8 @@ impl ServerState { "signal".to_owned(), missing_signal( "entity_tags", - "no test-tagged caller found; test categorisation is not emitted by \ - the active plugins, so this is not a guarantee the entity is untested", + "no test-tagged caller found in this index, so this is not a guarantee \ + the entity is untested", ), ); } @@ -738,15 +733,31 @@ fn all_entity_ids(conn: &rusqlite::Connection) -> clarion_storage::Result<(Vec clarion_storage::Result<(HashMap>, bool)> { - let cap = i64::try_from(EDGE_SCAN_CAP.saturating_add(1)).unwrap_or(i64::MAX); - let mut stmt = conn - .prepare("SELECT from_id, to_id FROM edges WHERE kind IN ('calls', 'imports') LIMIT ?1")?; + call_import_adjacency_with_cap(conn, EDGE_SCAN_CAP) +} + +fn import_adjacency_for_cycles( + conn: &rusqlite::Connection, + confidence: EdgeConfidence, + scan_cap: usize, +) -> clarion_storage::Result<(HashMap>, bool)> { + let in_clause = confidence_in_clause(confidence); + let sql = format!( + "SELECT from_id, to_id FROM edges \ + WHERE kind = 'imports' \ + 
AND confidence IN ({in_clause}) \ + AND {RUNTIME_IMPORT_EDGE_SQL} \ + {EDGE_SCAN_ORDER_BY} \ + LIMIT ?1" + ); + let cap = i64::try_from(scan_cap.saturating_add(1)).unwrap_or(i64::MAX); + let mut stmt = conn.prepare(&sql)?; let mut rows = stmt.query(rusqlite::params![cap])?; let mut adjacency: HashMap> = HashMap::new(); let mut edge_count = 0usize; let mut truncated = false; while let Some(row) = rows.next()? { - if edge_count >= EDGE_SCAN_CAP { + if edge_count >= scan_cap { truncated = true; break; } @@ -758,6 +769,67 @@ fn call_import_adjacency( Ok((adjacency, truncated)) } +fn call_import_adjacency_with_cap( + conn: &rusqlite::Connection, + scan_cap: usize, +) -> clarion_storage::Result<(HashMap>, bool)> { + let cap = i64::try_from(scan_cap.saturating_add(1)).unwrap_or(i64::MAX); + let mut stmt = conn.prepare( + "SELECT kind, from_id, to_id, confidence, properties \ + FROM edges \ + WHERE (kind = 'calls' OR (kind = 'imports' AND \ + (properties IS NULL \ + OR json_valid(properties) = 0 \ + OR (COALESCE(json_extract(properties, '$.type_only'), 0) != 1 \ + AND COALESCE(json_extract(properties, '$.scope'), 'module') = 'module')))) \ + ORDER BY kind, from_id, to_id, confidence, \ + COALESCE(source_byte_start, -1), COALESCE(source_byte_end, -1) \ + LIMIT ?1", + )?; + let mut rows = stmt.query(rusqlite::params![cap])?; + let mut adjacency: HashMap> = HashMap::new(); + let mut edge_count = 0usize; + let mut truncated = false; + while let Some(row) = rows.next()? 
{ + if edge_count >= scan_cap { + truncated = true; + break; + } + edge_count += 1; + let kind: String = row.get(0)?; + let from: String = row.get(1)?; + let to: String = row.get(2)?; + let confidence: String = row.get(3)?; + let properties: Option = row.get(4)?; + let targets = reachability_targets(&kind, &to, &confidence, properties.as_deref()); + adjacency.entry(from).or_default().extend(targets); + } + Ok((adjacency, truncated)) +} + +fn reachability_targets( + kind: &str, + to_id: &str, + confidence: &str, + properties_json: Option<&str>, +) -> Vec { + let mut targets = BTreeSet::from([to_id.to_owned()]); + if kind == "calls" && confidence == "ambiguous" { + targets.extend(candidate_ids(properties_json)); + } + targets.into_iter().collect() +} + +fn candidate_ids(properties_json: Option<&str>) -> BTreeSet { + properties_json + .and_then(|raw| serde_json::from_str::(raw).ok()) + .and_then(|value| value.get("candidates").and_then(|c| c.as_array()).cloned()) + .into_iter() + .flatten() + .filter_map(|value| value.as_str().map(ToOwned::to_owned)) + .collect() +} + /// Forward-reachable closure of `seed` over `adjacency` (iterative BFS/DFS). 
fn forward_reachable( adjacency: &HashMap>, @@ -860,6 +932,48 @@ fn strongly_connected_cycles(adjacency: &HashMap>) -> Vec rusqlite::Connection { + let conn = rusqlite::Connection::open_in_memory().expect("open in-memory db"); + conn.execute_batch( + "CREATE TABLE edges ( + kind TEXT NOT NULL, + from_id TEXT NOT NULL, + to_id TEXT NOT NULL, + properties TEXT, + source_file_id TEXT, + source_byte_start INTEGER, + source_byte_end INTEGER, + confidence TEXT NOT NULL DEFAULT 'resolved' + );", + ) + .expect("create edges table"); + conn + } + + fn insert_edge( + conn: &rusqlite::Connection, + kind: &str, + from_id: &str, + to_id: &str, + properties: Option<&str>, + source_byte_start: i64, + ) { + conn.execute( + "INSERT INTO edges ( + kind, from_id, to_id, confidence, properties, source_byte_start, source_byte_end + ) VALUES (?1, ?2, ?3, 'resolved', ?4, ?5, ?6)", + rusqlite::params![ + kind, + from_id, + to_id, + properties, + source_byte_start, + source_byte_start + 1 + ], + ) + .expect("insert edge"); + } + fn graph(edges: &[(&str, &str)]) -> HashMap> { let mut adjacency: HashMap> = HashMap::new(); for (from, to) in edges { @@ -871,6 +985,86 @@ mod tests { adjacency } + #[test] + fn import_cycle_scan_truncates_in_deterministic_order() { + let conn = edge_scan_conn(); + insert_edge( + &conn, + "imports", + "python:module:z", + "python:module:a", + None, + 30, + ); + insert_edge( + &conn, + "imports", + "python:module:a", + "python:module:c", + None, + 20, + ); + insert_edge( + &conn, + "imports", + "python:module:a", + "python:module:b", + None, + 10, + ); + + let (adjacency, truncated) = + import_adjacency_for_cycles(&conn, EdgeConfidence::Resolved, 2).unwrap(); + + assert!(truncated); + assert_eq!( + adjacency.get("python:module:a").unwrap(), + &vec!["python:module:b".to_owned(), "python:module:c".to_owned()] + ); + assert!(!adjacency.contains_key("python:module:z")); + } + + #[test] + fn dead_code_edge_scan_truncates_in_deterministic_order() { + let conn = 
edge_scan_conn(); + insert_edge( + &conn, + "calls", + "python:function:z", + "python:function:a", + None, + 30, + ); + insert_edge( + &conn, + "calls", + "python:function:a", + "python:function:c", + None, + 20, + ); + insert_edge( + &conn, + "calls", + "python:function:a", + "python:function:b", + None, + 10, + ); + + let (adjacency, truncated) = call_import_adjacency_with_cap(&conn, 2).unwrap(); + + assert!(truncated); + assert_eq!( + adjacency.get("python:function:a").unwrap(), + &vec![ + "python:function:b".to_owned(), + "python:function:c".to_owned() + ] + ); + assert!(!adjacency.contains_key("python:function:z")); + } + #[test] fn detects_a_two_node_cycle() { let g = graph(&[("a", "b"), ("b", "a"), ("b", "c")]); diff --git a/crates/clarion-mcp/src/config.rs b/crates/clarion-mcp/src/config.rs index 5377d159..1d7d4549 100644 --- a/crates/clarion-mcp/src/config.rs +++ b/crates/clarion-mcp/src/config.rs @@ -1,1075 +1 @@ -use std::path::Path; -use std::{fs, net::SocketAddr}; - -use serde::Deserialize; -use thiserror::Error; - -#[derive(Debug, Clone, PartialEq, Deserialize, Default)] -#[serde(default)] -pub struct McpConfig { - #[serde(alias = "llm_policy")] - pub llm: LlmConfig, - pub semantic_search: SemanticSearchConfig, - pub integrations: IntegrationsConfig, - pub serve: ServeConfig, -} - -impl McpConfig { - pub fn from_path(path: &Path) -> Result { - let raw = fs::read_to_string(path).map_err(|source| ConfigError::Io { - path: path.display().to_string(), - source, - })?; - Self::from_yaml_str(&raw) - } - - pub fn from_yaml_str(raw: &str) -> Result { - if raw.trim().is_empty() { - return Ok(Self::default()); - } - reject_llm_policy_alias_collision(raw)?; - let config: Self = - serde_norway::from_str(raw).map_err(|err| ConfigError::Yaml(err.to_string()))?; - config.validate()?; - Ok(config) - } - - fn validate(&self) -> Result<(), ConfigError> { - if self.llm.provider == LlmProviderKind::Anthropic - || self.llm.anthropic_api_key_env.is_some() - { - return 
Err(ConfigError::DeprecatedProvider { - code: "CLA-CONFIG-DEPRECATED-PROVIDER", - }); - } - if self.integrations.filigree.enabled && self.integrations.filigree.actor.trim().is_empty() - { - return Err(ConfigError::InvalidFiligreeActor { - code: "CLA-CONFIG-FILIGREE-ACTOR-BLANK", - }); - } - self.serve.http.validate_loopback_trust()?; - Ok(()) - } -} - -#[derive(Debug, Clone, PartialEq, Deserialize)] -#[serde(default)] -pub struct LlmConfig { - pub enabled: bool, - pub provider: LlmProviderKind, - pub allow_live_provider: bool, - pub session_token_ceiling: u64, - pub model_id: String, - pub openrouter: OpenRouterConfig, - pub codex_cli: CodexCliConfig, - pub claude_cli: ClaudeCliConfig, - pub recording_fixture_path: Option, - pub max_inferred_edges_per_caller: u32, - pub cache_max_age_days: u32, - pub anthropic_api_key_env: Option, -} - -impl Default for LlmConfig { - fn default() -> Self { - Self { - enabled: false, - provider: LlmProviderKind::OpenRouter, - allow_live_provider: false, - session_token_ceiling: 1_000_000, - model_id: "anthropic/claude-sonnet-4.6".to_owned(), - openrouter: OpenRouterConfig::default(), - codex_cli: CodexCliConfig::default(), - claude_cli: ClaudeCliConfig::default(), - recording_fixture_path: None, - max_inferred_edges_per_caller: 8, - cache_max_age_days: 180, - anthropic_api_key_env: None, - } - } -} - -/// Semantic-search (embeddings) policy for `search_semantic` (`WS5b` / ADR-040). -/// **Opt-in, off by default** — mirrors [`LlmConfig`]; Loom is local-first, so -/// nothing here makes a hosted embedding service required. When `enabled` is -/// false the `search_semantic` tool degrades honestly to "not enabled". -#[derive(Debug, Clone, PartialEq, Deserialize)] -#[serde(default)] -pub struct SemanticSearchConfig { - pub enabled: bool, - /// Explicit opt-in to the live API provider (in addition to `enabled`). - pub allow_live_provider: bool, - /// Embedding model id; embeddings are cache-keyed by this. 
- pub model_id: String, - /// Vector dimensionality (must match the model). - pub dimensions: usize, - /// `OpenAI`-compatible base URL (`/embeddings` is appended). - pub endpoint_url: String, - /// Env var holding the API key for the live provider. - pub api_key_env: String, - pub timeout_seconds: u64, - /// Per-session embedding token ceiling for cost governance. - pub session_token_ceiling: u64, -} - -impl Default for SemanticSearchConfig { - fn default() -> Self { - Self { - enabled: false, - allow_live_provider: false, - model_id: "text-embedding-3-small".to_owned(), - dimensions: 1536, - endpoint_url: "https://api.openai.com/v1".to_owned(), - api_key_env: "OPENAI_API_KEY".to_owned(), - timeout_seconds: 60, - session_token_ceiling: 5_000_000, - } - } -} - -#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)] -#[serde(rename_all = "snake_case")] -pub enum LlmProviderKind { - #[serde(rename = "openrouter", alias = "open_router")] - OpenRouter, - #[serde(rename = "codex_cli", alias = "codex")] - CodexCli, - #[serde(rename = "claude_cli", alias = "claude_code")] - ClaudeCli, - Anthropic, - Recording, -} - -#[derive(Debug, Clone, PartialEq, Deserialize)] -#[serde(default)] -pub struct OpenRouterConfig { - pub endpoint_url: String, - pub api_key_env: String, - pub attribution: OpenRouterAttributionConfig, - pub timeout_seconds: u64, -} - -impl Default for OpenRouterConfig { - fn default() -> Self { - Self { - endpoint_url: "https://openrouter.ai/api/v1".to_owned(), - api_key_env: "OPENROUTER_API_KEY".to_owned(), - attribution: OpenRouterAttributionConfig::default(), - timeout_seconds: 300, - } - } -} - -#[derive(Debug, Clone, PartialEq, Deserialize)] -#[serde(default)] -pub struct OpenRouterAttributionConfig { - pub referer: String, - pub title: String, -} - -impl Default for OpenRouterAttributionConfig { - fn default() -> Self { - Self { - referer: "https://github.com/tachyon-beep/clarion".to_owned(), - title: "Clarion".to_owned(), - } - } -} - 
-#[derive(Debug, Clone, PartialEq, Deserialize)] -#[serde(default)] -pub struct CodexCliConfig { - pub executable: String, - pub model: Option, - pub profile: Option, - pub sandbox: CodexSandboxMode, - pub timeout_seconds: u64, -} - -impl Default for CodexCliConfig { - fn default() -> Self { - Self { - executable: "codex".to_owned(), - model: None, - profile: None, - sandbox: CodexSandboxMode::ReadOnly, - timeout_seconds: 300, - } - } -} - -#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)] -#[serde(rename_all = "kebab-case")] -pub enum CodexSandboxMode { - ReadOnly, - WorkspaceWrite, - DangerFullAccess, -} - -impl CodexSandboxMode { - #[must_use] - pub fn as_str(self) -> &'static str { - match self { - Self::ReadOnly => "read-only", - Self::WorkspaceWrite => "workspace-write", - Self::DangerFullAccess => "danger-full-access", - } - } -} - -#[derive(Debug, Clone, PartialEq, Deserialize)] -#[serde(default)] -pub struct ClaudeCliConfig { - pub executable: String, - pub model: Option, - pub permission_mode: ClaudePermissionMode, - pub tools: Vec, - pub timeout_seconds: u64, - pub max_turns: u32, - pub no_session_persistence: bool, - pub exclude_dynamic_system_prompt_sections: bool, -} - -impl Default for ClaudeCliConfig { - fn default() -> Self { - Self { - executable: "claude".to_owned(), - model: None, - permission_mode: ClaudePermissionMode::Plan, - tools: Vec::new(), - timeout_seconds: 300, - max_turns: 2, - no_session_persistence: true, - exclude_dynamic_system_prompt_sections: true, - } - } -} - -#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)] -pub enum ClaudePermissionMode { - #[serde(rename = "plan")] - Plan, - #[serde(rename = "default")] - Default, - #[serde(rename = "acceptEdits")] - AcceptEdits, - #[serde(rename = "bypassPermissions")] - BypassPermissions, -} - -impl ClaudePermissionMode { - #[must_use] - pub fn as_str(self) -> &'static str { - match self { - Self::Plan => "plan", - Self::Default => "default", - Self::AcceptEdits => 
"acceptEdits", - Self::BypassPermissions => "bypassPermissions", - } - } -} - -#[derive(Debug, Clone, PartialEq, Default, Deserialize)] -#[serde(default)] -pub struct IntegrationsConfig { - pub filigree: FiligreeConfig, -} - -#[derive(Debug, Clone, PartialEq, Default, Deserialize)] -#[serde(default)] -pub struct ServeConfig { - pub http: HttpReadConfig, -} - -#[derive(Debug, Clone, PartialEq, Deserialize)] -#[serde(default)] -pub struct HttpReadConfig { - pub enabled: bool, - #[serde(deserialize_with = "deserialize_socket_addr")] - pub bind: SocketAddr, - pub allow_non_loopback: bool, - /// Name of the env var holding the inbound bearer token. When the env - /// var is set, every `/api/v1/files`-family request must carry - /// `Authorization: Bearer `; the capabilities probe is - /// always unauthenticated. When the env var is unset on a loopback - /// bind, the surface stays unauthenticated (the v0.1 trust model). - /// When the env var is unset on a non-loopback bind, `clarion serve` - /// refuses to start (`CLA-CONFIG-HTTP-NO-AUTH`). Default - /// `CLARION_LOOM_TOKEN` matches Filigree's pinned client default. - pub token_env: String, - /// Optional env var holding the Loom component identity HMAC secret. - /// When configured, `clarion serve` refuses to start unless the env var - /// exists and protected HTTP read routes require - /// `X-Loom-Component: clarion:`. - pub identity_token_env: Option, - /// Enable the Wardline taint-store WRITE API (POST /api/wardline/taint-facts). - /// Default false — `serve` is read-only unless explicitly opted in (ADR-036). - /// When true, `serve` spawns an optional ADR-011 writer-actor. 
- #[serde(default)] - pub wardline_taint_write: bool, -} - -impl Default for HttpReadConfig { - fn default() -> Self { - Self { - enabled: false, - bind: SocketAddr::from(([127, 0, 0, 1], 9111)), - allow_non_loopback: false, - token_env: "CLARION_LOOM_TOKEN".to_owned(), - identity_token_env: None, - wardline_taint_write: false, - } - } -} - -impl HttpReadConfig { - pub fn validate_loopback_trust(&self) -> Result<(), ConfigError> { - if self.enabled && !self.allow_non_loopback && !self.is_loopback_bind() { - return Err(ConfigError::NonLoopbackHttpBind { - code: "CLA-CONFIG-HTTP-NON-LOOPBACK", - bind: self.bind, - }); - } - Ok(()) - } - - /// Refuse to start a non-loopback HTTP read API when the inbound bearer - /// token env var is unset. Loopback binds with the env var unset stay - /// unauthenticated (v0.1 trust matrix); the failure case is the explicit - /// `allow_non_loopback: true` opt-in plus an unset `token_env`. - pub fn validate_auth_trust<F>(&self, env_lookup: F) -> Result<(), ConfigError> - where - F: Fn(&str) -> Option<String>, - { - if !self.enabled { - return Ok(()); - } - let has_identity_secret = match self.identity_token_env.as_deref() { - Some(env_var) => { - let has_secret = env_lookup(env_var) - .as_deref() - .is_some_and(|value| !value.trim().is_empty()); - if !has_secret { - return Err(ConfigError::MissingHttpIdentitySecret { - code: "CLA-CONFIG-HTTP-IDENTITY-MISSING", - token_env: env_var.to_owned(), - }); - } - true - } - None => false, - }; - if self.is_loopback_bind() { - return Ok(()); - } - if has_identity_secret { - return Ok(()); - } - let has_token = env_lookup(&self.token_env) - .as_deref() - .is_some_and(|value| !value.trim().is_empty()); - if has_token { - return Ok(()); - } - Err(ConfigError::NonLoopbackHttpNoAuth { - code: "CLA-CONFIG-HTTP-NO-AUTH", - bind: self.bind, - token_env: self.token_env.clone(), - }) - } - - #[must_use] - pub fn is_loopback_bind(&self) -> bool { - self.bind.ip().is_loopback() - } -} - -fn 
deserialize_socket_addr<'de, D>(deserializer: D) -> Result<SocketAddr, D::Error> -where - D: serde::Deserializer<'de>, -{ - let raw = String::deserialize(deserializer)?; - raw.parse() - .map_err(|err| serde::de::Error::custom(format!("invalid serve.http.bind {raw:?}: {err}"))) -} - -#[derive(Debug, Clone, PartialEq, Deserialize)] -#[serde(default)] -pub struct FiligreeConfig { - pub enabled: bool, - pub base_url: String, - pub actor: String, - pub token_env: String, - pub timeout_seconds: u64, - /// Whether `clarion analyze` POSTs its findings to Filigree's - /// `POST /api/v1/scan-results` intake on completion (WP9-B, - /// REQ-FINDING-03). Emission is a one-way Clarion→Filigree data egress, so - /// it is its own explicit opt-in: it requires both `enabled` *and* this - /// flag, and **both default `false`**. Enabling the integration for the - /// read side (`issues_for` reverse-lookup) therefore does not silently - /// start outbound emission — the operator opts into the write direction - /// separately by setting `emit_findings: true`. - pub emit_findings: bool, - /// Age threshold (days) for `clarion analyze --prune-unseen` (REQ-FINDING-06): - /// findings Filigree has marked `unseen_in_latest` and that are older than - /// this are soft-archived (`fixed`) by the retention sweep. Default 30. - /// Only consulted when `--prune-unseen` is passed; the sweep itself is - /// opt-in per invocation, not on by default. 
- pub prune_unseen_days: u32, -} - -impl Default for FiligreeConfig { - fn default() -> Self { - Self { - enabled: false, - base_url: "http://127.0.0.1:8766".to_owned(), - actor: "clarion-mcp".to_owned(), - token_env: "FILIGREE_API_TOKEN".to_owned(), - timeout_seconds: 5, - emit_findings: false, - prune_unseen_days: 30, - } - } -} - -#[derive(Debug, Clone, PartialEq, Eq)] -pub enum ProviderSelection { - Disabled, - Recording, - OpenRouter { api_key_env: String }, - CodexCli, - ClaudeCli, -} - -pub fn select_provider_with_env<F>( - config: &McpConfig, - env_lookup: F, -) -> Result<ProviderSelection, ConfigError> -where - F: Fn(&str) -> Option<String>, -{ - if !config.llm.enabled { - return Ok(ProviderSelection::Disabled); - } - - match config.llm.provider { - LlmProviderKind::Recording => Ok(ProviderSelection::Recording), - LlmProviderKind::Anthropic => Err(ConfigError::DeprecatedProvider { - code: "CLA-CONFIG-DEPRECATED-PROVIDER", - }), - LlmProviderKind::OpenRouter => { - let live_env_opt_in = env_lookup("CLARION_LLM_LIVE").as_deref() == Some("1"); - if !config.llm.allow_live_provider && !live_env_opt_in { - return Ok(ProviderSelection::Disabled); - } - - let env_var = config.llm.openrouter.api_key_env.clone(); - let has_key = env_lookup(&env_var) - .as_deref() - .is_some_and(|value| !value.trim().is_empty()); - if !has_key { - return Err(ConfigError::MissingOpenRouterApiKey { env_var }); - } - - Ok(ProviderSelection::OpenRouter { - api_key_env: env_var, - }) - } - LlmProviderKind::CodexCli => { - let live_env_opt_in = env_lookup("CLARION_LLM_LIVE").as_deref() == Some("1"); - if !config.llm.allow_live_provider && !live_env_opt_in { - return Ok(ProviderSelection::Disabled); - } - Ok(ProviderSelection::CodexCli) - } - LlmProviderKind::ClaudeCli => { - let live_env_opt_in = env_lookup("CLARION_LLM_LIVE").as_deref() == Some("1"); - if !config.llm.allow_live_provider && !live_env_opt_in { - return Ok(ProviderSelection::Disabled); - } - Ok(ProviderSelection::ClaudeCli) - } - } -} - -#[derive(Debug, Error)] -pub 
enum ConfigError { - #[error("read MCP config {path}: {source}")] - Io { - path: String, - #[source] - source: std::io::Error, - }, - - #[error("invalid MCP config: {0}")] - Yaml(String), - - #[error("live OpenRouter provider selected but API key env var {env_var} is missing")] - MissingOpenRouterApiKey { env_var: String }, - - #[error( - "{code}: llm.provider=anthropic is deprecated; use llm_policy.provider: openrouter with llm_policy.openrouter.api_key_env and llm_policy.model_id" - )] - DeprecatedProvider { code: &'static str }, - - #[error("{code}: integrations.filigree.actor must not be blank when Filigree is enabled")] - InvalidFiligreeActor { code: &'static str }, - - #[error( - "{code}: serve.http.bind {bind} exposes the unauthenticated non-loopback Clarion HTTP read API; \ - bind to loopback (127.0.0.1 or ::1) or set serve.http.allow_non_loopback: true only on a trusted network" - )] - NonLoopbackHttpBind { - code: &'static str, - bind: SocketAddr, - }, - - #[error( - "{code}: serve.http.bind {bind} is non-loopback and serve.http.allow_non_loopback is true, \ - but the inbound auth env var ${token_env} is unset; refusing to start an unauthenticated \ - HTTP read API on a routable interface. Set ${token_env} to a non-empty bearer token, \ - or bind to loopback." - )] - NonLoopbackHttpNoAuth { - code: &'static str, - bind: SocketAddr, - token_env: String, - }, - - #[error( - "{code}: serve.http.identity_token_env names ${token_env}, but that env var is unset; \ - refusing to start an HTTP read API with incomplete Loom component identity configuration." - )] - MissingHttpIdentitySecret { - code: &'static str, - token_env: String, - }, - - #[error( - "{code}: clarion.yaml contains both `llm` and `llm_policy` top-level keys; \ - `llm_policy` is a serde alias for `llm` and serde silently discards one. \ - Pick one and remove the other." - )] - AmbiguousLlmKey { code: &'static str }, -} - -/// Reject configs that name both `llm` and `llm_policy` at the top level. 
-/// They alias the same field; serde-norway silently picks one and discards -/// the other, which is the classic copy-paste-migration pitfall. Detecting -/// the collision pre-parse turns a silent override into a typed error. -fn reject_llm_policy_alias_collision(raw: &str) -> Result<(), ConfigError> { - let value: serde_norway::Value = match serde_norway::from_str(raw) { - Ok(value) => value, - // If the YAML doesn't even parse as a generic Value, let the typed - // parse below produce the canonical Yaml error. - Err(_) => return Ok(()), - }; - let Some(mapping) = value.as_mapping() else { - return Ok(()); - }; - let has_llm = mapping.contains_key("llm"); - let has_llm_policy = mapping.contains_key("llm_policy"); - if has_llm && has_llm_policy { - return Err(ConfigError::AmbiguousLlmKey { - code: "CLA-CONFIG-AMBIGUOUS-LLM-KEY", - }); - } - Ok(()) -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn parses_mcp_llm_and_filigree_config() { - let cfg = McpConfig::from_yaml_str( - r#" -llm: - enabled: true - provider: openrouter - session_token_ceiling: 250000 - model_id: anthropic/claude-sonnet-4.6 - openrouter: - endpoint_url: http://localhost:4000/api/v1 - api_key_env: TEST_OPENROUTER_KEY - attribution: - referer: https://example.invalid/clarion - title: Clarion Test - max_inferred_edges_per_caller: 3 - cache_max_age_days: 7 -integrations: - filigree: - enabled: true - base_url: "http://127.0.0.1:9999" - actor: "clarion-test" - token_env: TEST_FILIGREE_TOKEN - timeout_seconds: 2 -"#, - ) - .expect("parse config"); - - assert!(cfg.llm.enabled); - assert_eq!(cfg.llm.provider, LlmProviderKind::OpenRouter); - assert_eq!(cfg.llm.session_token_ceiling, 250_000); - assert_eq!(cfg.llm.model_id, "anthropic/claude-sonnet-4.6"); - assert_eq!( - cfg.llm.openrouter.endpoint_url, - "http://localhost:4000/api/v1" - ); - assert_eq!(cfg.llm.openrouter.api_key_env, "TEST_OPENROUTER_KEY"); - assert_eq!( - cfg.llm.openrouter.attribution.referer, - 
"https://example.invalid/clarion" - ); - assert_eq!(cfg.llm.openrouter.attribution.title, "Clarion Test"); - assert_eq!(cfg.llm.openrouter.timeout_seconds, 300); // default — not set in YAML - assert_eq!(cfg.llm.max_inferred_edges_per_caller, 3); - assert_eq!(cfg.llm.cache_max_age_days, 7); - assert!(cfg.integrations.filigree.enabled); - assert_eq!(cfg.integrations.filigree.base_url, "http://127.0.0.1:9999"); - assert_eq!(cfg.integrations.filigree.actor, "clarion-test"); - assert_eq!(cfg.integrations.filigree.token_env, "TEST_FILIGREE_TOKEN"); - assert_eq!(cfg.integrations.filigree.timeout_seconds, 2); - } - - #[test] - fn filigree_emission_is_opt_in_independent_of_enabled() { - // clarion-a26de2f368: outbound finding emission is a one-way egress and - // must not piggyback on enabling Filigree for read enrichment. Both - // knobs default false so flipping `enabled` for `issues_for` never - // silently starts POSTing findings. - let defaults = FiligreeConfig::default(); - assert!(!defaults.enabled); - assert!( - !defaults.emit_findings, - "emit_findings must default false (explicit write opt-in)" - ); - - // Turning on the read side alone leaves emission off. 
- let read_only = McpConfig::from_yaml_str( - r" -integrations: - filigree: - enabled: true -", - ) - .expect("parse config"); - assert!(read_only.integrations.filigree.enabled); - assert!( - !read_only.integrations.filigree.emit_findings, - "enabling Filigree for reads must not turn on outbound emission" - ); - } - - #[test] - fn accepts_llm_policy_alias_for_operator_config() { - let cfg = McpConfig::from_yaml_str( - r" -llm_policy: - enabled: true - provider: openrouter - model_id: openai/gpt-4o-mini -", - ) - .expect("parse config"); - - assert!(cfg.llm.enabled); - assert_eq!(cfg.llm.provider, LlmProviderKind::OpenRouter); - assert_eq!(cfg.llm.model_id, "openai/gpt-4o-mini"); - } - - #[test] - fn rejects_both_llm_and_llm_policy_keys_present_together() { - // Realistic migration-doc copy-paste case: operator copies the new - // `llm_policy:` block but forgets to delete the old `llm:` block. - // Serde-norway would silently pick one and discard the other. - let err = McpConfig::from_yaml_str( - r" -llm: - enabled: false - provider: recording -llm_policy: - enabled: true - provider: openrouter - model_id: openai/gpt-4o-mini -", - ) - .expect_err("ambiguous llm key must be rejected"); - - match err { - ConfigError::AmbiguousLlmKey { code } => { - assert_eq!(code, "CLA-CONFIG-AMBIGUOUS-LLM-KEY"); - } - other => panic!("expected AmbiguousLlmKey error, got: {other:?}"), - } - } - - #[test] - fn api_key_alone_does_not_select_live_provider() { - let cfg = McpConfig { - llm: LlmConfig { - enabled: true, - provider: LlmProviderKind::OpenRouter, - ..LlmConfig::default() - }, - semantic_search: SemanticSearchConfig::default(), - integrations: IntegrationsConfig::default(), - serve: ServeConfig::default(), - }; - - let selected = select_provider_with_env(&cfg, |name| { - (name == "OPENROUTER_API_KEY").then(|| "secret".to_owned()) - }) - .expect("provider selection"); - - assert_eq!(selected, ProviderSelection::Disabled); - } - - #[test] - fn 
live_provider_requires_config_or_env_opt_in_and_api_key() { - let cfg = McpConfig { - llm: LlmConfig { - enabled: true, - provider: LlmProviderKind::OpenRouter, - allow_live_provider: true, - ..LlmConfig::default() - }, - semantic_search: SemanticSearchConfig::default(), - integrations: IntegrationsConfig::default(), - serve: ServeConfig::default(), - }; - - let missing = select_provider_with_env(&cfg, |_| None).expect_err("missing key"); - assert!(matches!( - missing, - ConfigError::MissingOpenRouterApiKey { ref env_var } - if env_var == "OPENROUTER_API_KEY" - )); - - let selected = select_provider_with_env(&cfg, |name| { - (name == "OPENROUTER_API_KEY").then(|| "secret".to_owned()) - }) - .expect("provider selection"); - assert_eq!( - selected, - ProviderSelection::OpenRouter { - api_key_env: "OPENROUTER_API_KEY".to_owned() - } - ); - } - - #[test] - fn codex_cli_provider_requires_live_opt_in_but_no_api_key() { - let cfg = McpConfig::from_yaml_str( - r" -llm_policy: - enabled: true - provider: codex_cli - allow_live_provider: true - model_id: codex-cli-default - codex_cli: - executable: /tmp/fake-codex - model: gpt-5.5 - profile: clarion - sandbox: read-only - timeout_seconds: 30 -", - ) - .expect("parse Codex CLI provider config"); - - assert_eq!(cfg.llm.provider, LlmProviderKind::CodexCli); - assert_eq!(cfg.llm.model_id, "codex-cli-default"); - assert_eq!(cfg.llm.codex_cli.executable, "/tmp/fake-codex"); - assert_eq!(cfg.llm.codex_cli.model.as_deref(), Some("gpt-5.5")); - assert_eq!(cfg.llm.codex_cli.profile.as_deref(), Some("clarion")); - assert_eq!(cfg.llm.codex_cli.sandbox, CodexSandboxMode::ReadOnly); - assert_eq!(cfg.llm.codex_cli.timeout_seconds, 30); - - let selected = select_provider_with_env(&cfg, |_| None).expect("provider selection"); - assert_eq!(selected, ProviderSelection::CodexCli); - } - - #[test] - fn codex_cli_provider_stays_disabled_without_live_opt_in() { - let cfg = McpConfig { - llm: LlmConfig { - enabled: true, - provider: 
LlmProviderKind::CodexCli, - ..LlmConfig::default() - }, - semantic_search: SemanticSearchConfig::default(), - integrations: IntegrationsConfig::default(), - serve: ServeConfig::default(), - }; - - let selected = select_provider_with_env(&cfg, |_| None).expect("provider selection"); - assert_eq!(selected, ProviderSelection::Disabled); - - let env_selected = select_provider_with_env(&cfg, |name| { - (name == "CLARION_LLM_LIVE").then(|| "1".to_owned()) - }) - .expect("provider selection via env opt-in"); - assert_eq!(env_selected, ProviderSelection::CodexCli); - } - - #[test] - fn claude_cli_provider_requires_live_opt_in_but_no_api_key() { - let cfg = McpConfig::from_yaml_str( - r#" -llm_policy: - enabled: true - provider: claude_cli - allow_live_provider: true - model_id: claude-code-default - claude_cli: - executable: /tmp/fake-claude - model: claude-sonnet-4-6 - permission_mode: plan - tools: ["Read", "Glob", "Grep"] - timeout_seconds: 45 - max_turns: 2 - no_session_persistence: true -"#, - ) - .expect("parse Claude CLI provider config"); - - assert_eq!(cfg.llm.provider, LlmProviderKind::ClaudeCli); - assert_eq!(cfg.llm.model_id, "claude-code-default"); - assert_eq!(cfg.llm.claude_cli.executable, "/tmp/fake-claude"); - assert_eq!( - cfg.llm.claude_cli.model.as_deref(), - Some("claude-sonnet-4-6") - ); - assert_eq!( - cfg.llm.claude_cli.permission_mode, - ClaudePermissionMode::Plan - ); - assert_eq!(cfg.llm.claude_cli.tools, vec!["Read", "Glob", "Grep"]); - assert_eq!(cfg.llm.claude_cli.timeout_seconds, 45); - assert_eq!(cfg.llm.claude_cli.max_turns, 2); - assert!(cfg.llm.claude_cli.no_session_persistence); - - let selected = select_provider_with_env(&cfg, |_| None).expect("provider selection"); - assert_eq!(selected, ProviderSelection::ClaudeCli); - } - - #[test] - fn claude_cli_provider_stays_disabled_without_live_opt_in() { - let cfg = McpConfig { - llm: LlmConfig { - enabled: true, - provider: LlmProviderKind::ClaudeCli, - ..LlmConfig::default() - }, - 
semantic_search: SemanticSearchConfig::default(), - integrations: IntegrationsConfig::default(), - serve: ServeConfig::default(), - }; - - let selected = select_provider_with_env(&cfg, |_| None).expect("provider selection"); - assert_eq!(selected, ProviderSelection::Disabled); - - let env_selected = select_provider_with_env(&cfg, |name| { - (name == "CLARION_LLM_LIVE").then(|| "1".to_owned()) - }) - .expect("provider selection via env opt-in"); - assert_eq!(env_selected, ProviderSelection::ClaudeCli); - } - - #[test] - fn http_bind_is_parsed_when_config_loads() { - let cfg = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "127.0.0.1:0" -"#, - ) - .expect("parse HTTP bind"); - - assert_eq!(cfg.serve.http.bind, SocketAddr::from(([127, 0, 0, 1], 0))); - } - - #[test] - fn http_allow_non_loopback_defaults_false() { - assert!(!McpConfig::default().serve.http.allow_non_loopback); - } - - #[test] - fn http_allow_non_loopback_is_parsed_when_config_loads() { - let cfg = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "127.0.0.1:0" - allow_non_loopback: true -"#, - ) - .expect("parse HTTP allow_non_loopback"); - - assert!(cfg.serve.http.allow_non_loopback); - } - - #[test] - fn http_identity_token_env_is_parsed_when_config_loads() { - let cfg = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "127.0.0.1:0" - identity_token_env: CLARION_TEST_IDENTITY -"#, - ) - .expect("parse HTTP identity_token_env"); - - assert_eq!( - cfg.serve.http.identity_token_env.as_deref(), - Some("CLARION_TEST_IDENTITY") - ); - } - - #[test] - fn http_wardline_taint_write_defaults_false() { - assert!(!McpConfig::default().serve.http.wardline_taint_write); - } - - #[test] - fn http_wardline_taint_write_is_parsed_when_config_loads() { - let cfg = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "127.0.0.1:0" - wardline_taint_write: true -"#, - ) - .expect("parse HTTP wardline_taint_write"); - - 
assert!(cfg.serve.http.wardline_taint_write); - } - - #[test] - fn enabled_non_loopback_http_bind_requires_allow_non_loopback() { - let err = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "0.0.0.0:0" -"#, - ) - .expect_err("enabled wildcard HTTP bind should require explicit opt-in"); - - let message = err.to_string(); - assert!( - message.contains("unauthenticated non-loopback"), - "error should explain the unauthenticated non-loopback risk: {message}" - ); - assert!( - message.contains("allow_non_loopback"), - "error should name the explicit opt-in: {message}" - ); - } - - #[test] - fn enabled_lan_http_bind_requires_allow_non_loopback() { - let err = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "192.168.1.10:0" -"#, - ) - .expect_err("enabled LAN HTTP bind should require explicit opt-in"); - - assert!(matches!(err, ConfigError::NonLoopbackHttpBind { .. })); - } - - #[test] - fn enabled_ipv6_loopback_http_bind_is_allowed_by_default() { - let cfg = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "[::1]:0" -"#, - ) - .expect("IPv6 loopback HTTP bind should not require non-loopback opt-in"); - - assert!(!cfg.serve.http.allow_non_loopback); - assert!(cfg.serve.http.is_loopback_bind()); - } - - #[test] - fn enabled_non_loopback_http_bind_allows_explicit_opt_in() { - let cfg = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "0.0.0.0:0" - allow_non_loopback: true -"#, - ) - .expect("explicit opt-in should allow non-loopback HTTP bind"); - - assert!(cfg.serve.http.allow_non_loopback); - } - - #[test] - fn invalid_http_bind_fails_config_load() { - let err = McpConfig::from_yaml_str( - r#" -serve: - http: - enabled: true - bind: "not-a-socket" -"#, - ) - .expect_err("invalid bind should fail"); - - assert!( - err.to_string().contains("invalid serve.http.bind"), - "unexpected error: {err}" - ); - } - - #[test] - fn old_anthropic_provider_shape_reports_deprecated_provider() { - 
let err = McpConfig::from_yaml_str( - r" -llm: - enabled: true - provider: anthropic - anthropic_api_key_env: ANTHROPIC_API_KEY -", - ) - .expect_err("old provider shape should be rejected"); - - assert!(matches!(err, ConfigError::DeprecatedProvider { .. })); - assert!(err.to_string().contains("CLA-CONFIG-DEPRECATED-PROVIDER")); - assert!(err.to_string().contains("provider: openrouter")); - } - - #[test] - fn enabled_filigree_integration_rejects_blank_actor() { - let err = McpConfig::from_yaml_str( - r#" -integrations: - filigree: - enabled: true - actor: " " -"#, - ) - .expect_err("blank Filigree actor should be rejected"); - - assert!(err.to_string().contains("CLA-CONFIG-FILIGREE-ACTOR-BLANK")); - } -} +pub use clarion_federation::config::*; diff --git a/crates/clarion-mcp/src/filigree.rs b/crates/clarion-mcp/src/filigree.rs index b011ae5a..234d9309 100644 --- a/crates/clarion-mcp/src/filigree.rs +++ b/crates/clarion-mcp/src/filigree.rs @@ -1,1018 +1 @@ -//! Filigree HTTP contract helpers for Clarion MCP. - -use std::time::Duration; - -use serde::{Deserialize, Serialize}; -use thiserror::Error; - -use crate::config::FiligreeConfig; -use crate::scan_results::{ - CleanStaleRequest, CleanStaleResponse, ScanResultsRequest, ScanResultsResponse, - clean_stale_url, parse_clean_stale_response, parse_scan_results_response, scan_results_url, -}; - -#[derive(Debug, Clone, PartialEq, Eq, Deserialize)] -pub struct EntityAssociationsResponse { - pub associations: Vec, -} - -/// The subset of a Filigree issue Clarion surfaces alongside an -/// entity-association match: enough to render the match without an agent -/// having to call back into Filigree. Sourced from `GET /api/loom/issues/{id}`. -/// Unknown fields in the response are ignored, so Filigree can grow the route -/// without breaking this read. 
-#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize)] -pub struct IssueDetail { - pub title: String, - pub status: String, - pub priority: i64, -} - -#[derive(Debug, Clone, PartialEq, Eq, Deserialize)] -pub struct EntityAssociation { - pub issue_id: String, - pub clarion_entity_id: String, - pub content_hash_at_attach: String, - pub attached_at: String, - pub attached_by: String, -} - -/// One Wardline finding as Clarion surfaces it — the subset of Filigree's -/// `ScanFindingLoom` (`GET /api/loom/findings`) used for read-time -/// reconciliation. Unknown fields are ignored so Filigree can grow the row. -#[derive(Debug, Clone, PartialEq, Deserialize, Serialize)] -pub struct WardlineFinding { - pub rule_id: String, - pub message: String, - #[serde(default)] - pub severity: Option, - #[serde(default)] - pub status: Option, - #[serde(default)] - pub line_start: Option, - #[serde(default)] - pub line_end: Option, - #[serde(default)] - pub fingerprint: Option, - #[serde(default)] - pub file_id: Option, - /// The finding's `metadata` object; `metadata.wardline.qualname` is the - /// reconciliation key. Defaults to JSON null when absent. - #[serde(default)] - pub metadata: serde_json::Value, -} - -/// Envelope returned by `GET /api/loom/findings` — the paged list of -/// [`WardlineFinding`] rows Clarion reconciles against. -#[derive(Debug, Clone, PartialEq, Deserialize)] -pub struct WardlineFindingsResponse { - #[serde(default)] - pub items: Vec, - /// True when more findings pages follow. Clarion does not page the findings - /// list (the offset param is unpinned in the federation contract); when this - /// is true the first page is an incomplete view, so the caller fails closed - /// to `unavailable` rather than silently undercounting the file's findings. - #[serde(default)] - pub has_more: bool, -} - -/// One row of `GET /api/loom/files` — only the fields needed to map a path to -/// Filigree's `file_id`. 
-#[derive(Debug, Clone, PartialEq, Deserialize)] -pub struct LoomFileRecord { - pub file_id: String, - pub path: String, -} - -/// Envelope returned by `GET /api/loom/files` — the paged list of -/// [`LoomFileRecord`] rows Clarion uses to map a path to a `file_id`. -#[derive(Debug, Clone, PartialEq, Deserialize)] -pub struct LoomFilesResponse { - #[serde(default)] - pub items: Vec, - /// True when more pages follow. When the exact-path match is absent and - /// `has_more` is true, the result is indeterminate — the file may be on a - /// later page — so callers must degrade to `unavailable` rather than - /// concluding `no_matches`. - #[serde(default)] - pub has_more: bool, -} - -pub fn parse_wardline_findings_response( - body: &str, -) -> Result { - serde_json::from_str(body).map_err(FiligreeContractError::from) -} - -pub fn parse_loom_files_response(body: &str) -> Result { - serde_json::from_str(body).map_err(FiligreeContractError::from) -} - -#[derive(Debug, Error)] -pub enum FiligreeContractError { - #[error("invalid Filigree response: {0}")] - InvalidResponse(#[from] serde_json::Error), -} - -#[derive(Debug, Error)] -pub enum FiligreeClientError { - #[error("build Filigree HTTP client: {0}")] - Build(#[source] reqwest::Error), - - #[error("request Filigree entity associations: {0}")] - Request(#[source] reqwest::Error), - - #[error("Filigree returned HTTP {status}: {body}")] - HttpStatus { status: u16, body: String }, - - #[error("POST Filigree scan-results: {0}")] - ScanResultsRequest(#[source] reqwest::Error), - - #[error("invalid Filigree scan-results response: {0}")] - InvalidScanResultsResponse(#[source] serde_json::Error), - - #[error("POST Filigree clean-stale: {0}")] - CleanStaleRequest(#[source] reqwest::Error), - - #[error("invalid Filigree clean-stale response: {0}")] - InvalidCleanStaleResponse(#[source] serde_json::Error), - - #[error(transparent)] - Contract(#[from] FiligreeContractError), -} - -pub trait FiligreeLookup: Send + Sync { - fn 
associations_for( - &self, - entity_id: &str, - ) -> Result; - - /// Fetch an issue's title/status/priority to enrich an association match. - /// Returns `Ok(None)` when the issue (or the detail route itself) is - /// unavailable — a `404` — so callers degrade to issue-id-only rather than - /// failing the whole `issues_for` call, per the enrich-only federation - /// axiom. The default reports the route as unavailable; the HTTP client - /// overrides it. A transport / non-404 HTTP failure is surfaced as `Err` - /// so the caller can stop hammering a down endpoint. - fn issue_detail(&self, _issue_id: &str) -> Result, FiligreeClientError> { - Ok(None) - } - - /// Wardline findings for a source file, for read-time reconciliation - /// (Flow B). Two-hop: resolve `path` -> Filigree `file_id`, then fetch that - /// file's `scan_source=wardline` findings. Returns an empty list when no - /// Wardline-touched file exists at `path`. Default impl returns empty (no - /// Filigree); the HTTP client overrides it. Transport / non-success HTTP is - /// surfaced as `Err` so the caller degrades the section to `unavailable`. 
- fn wardline_findings_for_path( - &self, - _path: &str, - ) -> Result, FiligreeClientError> { - Ok(Vec::new()) - } -} - -#[derive(Debug, Clone)] -pub struct FiligreeHttpClient { - base_url: String, - actor: String, - token: Option, - client: reqwest::blocking::Client, -} - -impl FiligreeHttpClient { - pub fn from_config( - config: &FiligreeConfig, - env_lookup: F, - ) -> Result, FiligreeClientError> - where - F: Fn(&str) -> Option, - { - if !config.enabled { - return Ok(None); - } - let client = reqwest::blocking::Client::builder() - .timeout(Duration::from_secs(config.timeout_seconds.max(1))) - .build() - .map_err(FiligreeClientError::Build)?; - let token = env_lookup(&config.token_env).filter(|value| !value.trim().is_empty()); - Ok(Some(Self { - base_url: config.base_url.clone(), - actor: config.actor.clone(), - token, - client, - })) - } - - /// POST a scan-results batch to Filigree's native intake (WP9-B, - /// REQ-FINDING-03). One-way Clarion→Filigree push; the caller is expected to - /// inspect [`ScanResultsResponse::warnings`] (severity coercion, unknown - /// `scan_run_id`, etc.) rather than just the counts. - /// - /// # Errors - /// - /// Returns [`FiligreeClientError::ScanResultsRequest`] on transport failure, - /// [`FiligreeClientError::HttpStatus`] on a non-success response (e.g. a - /// `400 VALIDATION` for a malformed batch), or - /// [`FiligreeClientError::InvalidScanResultsResponse`] when the body is not - /// the expected shape. 
- pub fn post_scan_results( - &self, - request: &ScanResultsRequest, - ) -> Result { - let mut http_request = self - .client - .post(scan_results_url(&self.base_url)) - .header("accept", "application/json") - .json(request); - if !self.actor.trim().is_empty() { - http_request = http_request.header("x-filigree-actor", self.actor.as_str()); - } - if let Some(token) = &self.token { - http_request = http_request.bearer_auth(token); - } - let response = http_request - .send() - .map_err(FiligreeClientError::ScanResultsRequest)?; - let status = response.status(); - let body = response - .text() - .map_err(FiligreeClientError::ScanResultsRequest)?; - if !status.is_success() { - return Err(FiligreeClientError::HttpStatus { - status: status.as_u16(), - body, - }); - } - parse_scan_results_response(&body).map_err(FiligreeClientError::InvalidScanResultsResponse) - } - - /// POST a retention sweep to Filigree's `clean-stale` route (REQ-FINDING-06, - /// `--prune-unseen`). One-way Clarion→Filigree call; Filigree soft-archives - /// its own `unseen_in_latest` findings for the given `scan_source`. The - /// `scan_source` scoping is enforced server-side, so this can only sweep - /// Clarion's findings. - /// - /// # Errors - /// - /// Returns [`FiligreeClientError::CleanStaleRequest`] on transport failure, - /// [`FiligreeClientError::HttpStatus`] on a non-success response, or - /// [`FiligreeClientError::InvalidCleanStaleResponse`] when the body is not - /// the expected shape. 
- pub fn post_clean_stale( - &self, - request: &CleanStaleRequest, - ) -> Result { - let mut http_request = self - .client - .post(clean_stale_url(&self.base_url)) - .header("accept", "application/json") - .json(request); - if !self.actor.trim().is_empty() { - http_request = http_request.header("x-filigree-actor", self.actor.as_str()); - } - if let Some(token) = &self.token { - http_request = http_request.bearer_auth(token); - } - let response = http_request - .send() - .map_err(FiligreeClientError::CleanStaleRequest)?; - let status = response.status(); - let body = response - .text() - .map_err(FiligreeClientError::CleanStaleRequest)?; - if !status.is_success() { - return Err(FiligreeClientError::HttpStatus { - status: status.as_u16(), - body, - }); - } - parse_clean_stale_response(&body).map_err(FiligreeClientError::InvalidCleanStaleResponse) - } - - /// GET `url` with the standard actor + bearer headers, returning the raw - /// (unread) response. Shared by [`get_json`](Self::get_json) and - /// [`get_json_or_none`](Self::get_json_or_none); the latter inspects the - /// status before reading the body so a `404` can short-circuit. - fn send_get(&self, url: &str) -> Result { - let mut request = self.client.get(url).header("accept", "application/json"); - if !self.actor.trim().is_empty() { - request = request.header("x-filigree-actor", self.actor.as_str()); - } - if let Some(token) = &self.token { - request = request.bearer_auth(token); - } - request.send().map_err(FiligreeClientError::Request) - } - - /// GET `url` with the standard actor + bearer headers and parse the body as - /// `T`. A non-success status is surfaced as `HttpStatus` so the caller can - /// stop hammering a down endpoint. 
- fn get_json( - &self, - url: &str, - ) -> Result { - let response = self.send_get(url)?; - let status = response.status(); - let body = response.text().map_err(FiligreeClientError::Request)?; - if !status.is_success() { - return Err(FiligreeClientError::HttpStatus { - status: status.as_u16(), - body, - }); - } - serde_json::from_str(&body) - .map_err(|e| FiligreeClientError::Contract(FiligreeContractError::from(e))) - } - - /// Like [`get_json`](Self::get_json) but maps a `404` to `Ok(None)` — the - /// enrich-only degrade signal for "the resource (or the route itself) is - /// absent", not an error. The body is not read on a `404`. Any other - /// non-success status is still surfaced as `HttpStatus`. - fn get_json_or_none( - &self, - url: &str, - ) -> Result, FiligreeClientError> { - let response = self.send_get(url)?; - let status = response.status(); - if status == reqwest::StatusCode::NOT_FOUND { - return Ok(None); - } - let body = response.text().map_err(FiligreeClientError::Request)?; - if !status.is_success() { - return Err(FiligreeClientError::HttpStatus { - status: status.as_u16(), - body, - }); - } - serde_json::from_str(&body) - .map(Some) - .map_err(|e| FiligreeClientError::Contract(FiligreeContractError::from(e))) - } -} - -impl FiligreeLookup for FiligreeHttpClient { - fn associations_for( - &self, - entity_id: &str, - ) -> Result { - self.get_json(&entity_associations_url(&self.base_url, entity_id)) - } - - fn issue_detail(&self, issue_id: &str) -> Result, FiligreeClientError> { - // A 404 means the issue (or the whole detail route) is absent — the - // enrich-only degrade signal, not an error — so use the `_or_none` form. - self.get_json_or_none(&issue_detail_url(&self.base_url, issue_id)) - } - - fn wardline_findings_for_path( - &self, - path: &str, - ) -> Result, FiligreeClientError> { - // Hop 1: path -> Filigree file_id. path_prefix is a prefix filter, so - // take only the row whose path is byte-exact. 
- let files: LoomFilesResponse = - self.get_json(&loom_files_url(&self.base_url, "wardline", path))?; - let exact = files.items.into_iter().find(|f| f.path == path); - let Some(file_id) = exact.map(|f| f.file_id) else { - // No exact match on this page. If has_more is true the result is - // indeterminate — the file may be on a later page — so degrade to - // unavailable rather than falsely concluding no_matches. - if files.has_more { - return Err(FiligreeClientError::HttpStatus { - status: 0, - body: - "loom/files truncated before exact path match; cannot conclude no findings" - .to_owned(), - }); - } - return Ok(Vec::new()); - }; - // Hop 2: file_id -> wardline findings. As with hop-1, Clarion reads only - // the first page; if it is truncated (`has_more`) the findings view is - // incomplete, so fail closed to `unavailable` rather than returning a - // silent undercount. - let findings: WardlineFindingsResponse = - self.get_json(&loom_findings_url(&self.base_url, "wardline", &file_id))?; - if findings.has_more { - return Err(FiligreeClientError::HttpStatus { - status: 0, - body: "loom/findings truncated; cannot enumerate all findings for file".to_owned(), - }); - } - Ok(findings.items) - } -} - -pub fn parse_entity_associations_response( - body: &str, -) -> Result { - serde_json::from_str(body).map_err(FiligreeContractError::from) -} - -pub fn parse_issue_detail_response(body: &str) -> Result { - serde_json::from_str(body).map_err(FiligreeContractError::from) -} - -pub fn issue_detail_url(base_url: &str, issue_id: &str) -> String { - format!( - "{}/api/loom/issues/{}", - base_url.trim_end_matches('/'), - percent_encode_query_value(issue_id) - ) -} - -pub fn entity_associations_url(base_url: &str, entity_id: &str) -> String { - format!( - "{}/api/entity-associations?entity_id={}", - base_url.trim_end_matches('/'), - percent_encode_query_value(entity_id) - ) -} - -pub fn loom_files_url(base_url: &str, scan_source: &str, path_prefix: &str) -> String { - format!( - 
"{}/api/loom/files?scan_source={}&path_prefix={}", - base_url.trim_end_matches('/'), - percent_encode_query_value(scan_source), - percent_encode_query_value(path_prefix) - ) -} - -pub fn loom_findings_url(base_url: &str, scan_source: &str, file_id: &str) -> String { - format!( - "{}/api/loom/findings?scan_source={}&file_id={}", - base_url.trim_end_matches('/'), - percent_encode_query_value(scan_source), - percent_encode_query_value(file_id) - ) -} - -fn percent_encode_query_value(raw: &str) -> String { - let mut encoded = String::new(); - for byte in raw.bytes() { - match byte { - b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => { - encoded.push(char::from(byte)); - } - _ => { - encoded.push('%'); - encoded.push(hex_digit(byte >> 4)); - encoded.push(hex_digit(byte & 0x0f)); - } - } - } - encoded -} - -fn hex_digit(value: u8) -> char { - match value { - 0..=9 => char::from(b'0' + value), - 10..=15 => char::from(b'A' + (value - 10)), - _ => unreachable!("nibble is always <= 15"), - } -} - -#[cfg(test)] -mod tests { - use super::*; - use std::io::{Read, Write}; - use std::net::TcpListener; - - #[test] - fn parses_reverse_entity_association_response_shape() { - let parsed = parse_entity_associations_response( - r#"{ - "associations": [ - { - "issue_id": "filigree-1234567890", - "clarion_entity_id": "python:function:demo.hello", - "content_hash_at_attach": "hash-a", - "attached_at": "2026-05-17T00:00:00.000Z", - "attached_by": "codex" - } - ] - }"#, - ) - .expect("parse Filigree reverse route response"); - - assert_eq!(parsed.associations.len(), 1); - let row = &parsed.associations[0]; - assert_eq!(row.issue_id, "filigree-1234567890"); - assert_eq!(row.clarion_entity_id, "python:function:demo.hello"); - assert_eq!(row.content_hash_at_attach, "hash-a"); - assert_eq!(row.attached_at, "2026-05-17T00:00:00.000Z"); - assert_eq!(row.attached_by, "codex"); - } - - #[test] - fn builds_reverse_route_url_with_encoded_entity_id() { - let url = 
entity_associations_url("http://127.0.0.1:8766/", "python:function:demo.hello"); - - assert_eq!( - url, - "http://127.0.0.1:8766/api/entity-associations?entity_id=python%3Afunction%3Ademo.hello" - ); - } - - #[test] - fn http_client_hits_reverse_route_with_actor_and_bearer_headers() { - let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); - let addr = listener.local_addr().expect("local addr"); - let handle = std::thread::spawn(move || { - let (mut stream, _) = listener.accept().expect("accept request"); - let mut request = [0_u8; 4096]; - let read = stream.read(&mut request).expect("read request"); - let request = String::from_utf8_lossy(&request[..read]); - assert!(request.contains( - "GET /api/entity-associations?entity_id=python%3Afunction%3Ademo.hello HTTP/1.1" - )); - assert!(request.contains("x-filigree-actor: clarion-test")); - assert!(request.contains("authorization: Bearer secret-token")); - - let body = r#"{"associations":[{"issue_id":"filigree-1234567890","clarion_entity_id":"python:function:demo.hello","content_hash_at_attach":"hash-a","attached_at":"2026-05-17T00:00:00.000Z","attached_by":"codex"}]}"#; - write!( - stream, - "HTTP/1.1 200 OK\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .expect("write response"); - }); - let config = FiligreeConfig { - enabled: true, - base_url: format!("http://{addr}"), - actor: "clarion-test".to_owned(), - token_env: "TEST_FILIGREE_TOKEN".to_owned(), - timeout_seconds: 1, - emit_findings: true, - prune_unseen_days: 30, - }; - let client = FiligreeHttpClient::from_config(&config, |name| { - (name == "TEST_FILIGREE_TOKEN").then(|| "secret-token".to_owned()) - }) - .expect("build client") - .expect("enabled client"); - - let response = client - .associations_for("python:function:demo.hello") - .expect("fetch associations"); - - assert_eq!(response.associations[0].issue_id, "filigree-1234567890"); - handle.join().expect("server thread"); - } - - 
#[test] - fn parses_issue_detail_response_shape() { - let parsed = parse_issue_detail_response( - r#"{ - "issue_id": "clarion-51a2868c86", - "title": "issues_for: enrich matches", - "status": "proposed", - "status_category": "open", - "priority": 3, - "type": "feature" - }"#, - ) - .expect("parse issue detail"); - assert_eq!(parsed.title, "issues_for: enrich matches"); - assert_eq!(parsed.status, "proposed"); - assert_eq!(parsed.priority, 3); - } - - #[test] - fn builds_issue_detail_url_with_encoded_id() { - let url = issue_detail_url("http://127.0.0.1:8542/", "clarion-51a2868c86"); - assert_eq!( - url, - "http://127.0.0.1:8542/api/loom/issues/clarion-51a2868c86" - ); - } - - #[test] - fn issue_detail_http_client_parses_200() { - let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); - let addr = listener.local_addr().expect("local addr"); - let handle = std::thread::spawn(move || { - let (mut stream, _) = listener.accept().expect("accept request"); - let mut request = [0_u8; 4096]; - let read = stream.read(&mut request).expect("read request"); - let request = String::from_utf8_lossy(&request[..read]); - assert!(request.contains("GET /api/loom/issues/clarion-51a2868c86 HTTP/1.1")); - - let body = r#"{"issue_id":"clarion-51a2868c86","title":"enrich","status":"proposed","priority":3}"#; - write!( - stream, - "HTTP/1.1 200 OK\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .expect("write response"); - }); - let client = detail_test_client(addr); - let detail = client - .issue_detail("clarion-51a2868c86") - .expect("issue detail request") - .expect("issue present"); - assert_eq!(detail.title, "enrich"); - assert_eq!(detail.status, "proposed"); - assert_eq!(detail.priority, 3); - handle.join().expect("server thread"); - } - - #[test] - fn issue_detail_http_client_maps_404_to_none() { - let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); - let addr = 
listener.local_addr().expect("local addr"); - let handle = std::thread::spawn(move || { - let (mut stream, _) = listener.accept().expect("accept request"); - let mut request = [0_u8; 4096]; - let _ = stream.read(&mut request).expect("read request"); - let body = r#"{"error":"Not Found","code":"NOT_FOUND"}"#; - write!( - stream, - "HTTP/1.1 404 Not Found\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .expect("write response"); - }); - let client = detail_test_client(addr); - let detail = client - .issue_detail("clarion-missing") - .expect("404 is Ok(None), not an error"); - assert!(detail.is_none(), "404 degrades to None: {detail:?}"); - handle.join().expect("server thread"); - } - - #[test] - fn post_scan_results_sends_batch_and_parses_response() { - let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); - let addr = listener.local_addr().expect("local addr"); - let handle = std::thread::spawn(move || { - let (mut stream, _) = listener.accept().expect("accept request"); - let mut request = [0_u8; 8192]; - let read = stream.read(&mut request).expect("read request"); - let request = String::from_utf8_lossy(&request[..read]); - assert!( - request.contains("POST /api/v1/scan-results HTTP/1.1"), - "request line: {request}" - ); - assert!(request.contains("x-filigree-actor: clarion-test")); - assert!(request.contains("authorization: Bearer secret-token")); - // The wire body carries the mapped severity, not the internal one. 
- assert!( - request.contains("\"scan_source\":\"clarion\""), - "body: {request}" - ); - assert!( - request.contains("\"severity\":\"medium\""), - "body: {request}" - ); - assert!( - request.contains("\"internal_severity\":\"WARN\""), - "body: {request}" - ); - - let body = r#"{"files_created":1,"files_updated":0,"findings_created":1,"findings_updated":0,"new_finding_ids":["clarion-sf-abc"],"observations_created":0,"observations_failed":0,"warnings":["Scan run run-1 status not updated to 'completed': not found"]}"#; - write!( - stream, - "HTTP/1.1 200 OK\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .expect("write response"); - }); - let config = FiligreeConfig { - enabled: true, - base_url: format!("http://{addr}"), - actor: "clarion-test".to_owned(), - token_env: "TEST_FILIGREE_TOKEN".to_owned(), - timeout_seconds: 1, - emit_findings: true, - prune_unseen_days: 30, - }; - let client = FiligreeHttpClient::from_config(&config, |name| { - (name == "TEST_FILIGREE_TOKEN").then(|| "secret-token".to_owned()) - }) - .expect("build client") - .expect("enabled client"); - - let row = clarion_storage::FindingForEmitRow { - id: "core:finding:run-1:circular".to_owned(), - rule_id: "CLA-PY-STRUCTURE-001".to_owned(), - kind: "defect".to_owned(), - severity: "WARN".to_owned(), - confidence: Some(0.9), - confidence_basis: None, - message: "Circular import".to_owned(), - entity_id: "python:class:auth.tokens::TokenManager".to_owned(), - related_entities_json: "[]".to_owned(), - supports_json: "[]".to_owned(), - supported_by_json: "[]".to_owned(), - source_file_path: Some("src/auth/tokens.py".to_owned()), - source_line_start: Some(12), - source_line_end: Some(12), - }; - let batch = crate::scan_results::prepare_batch( - &[row], - &crate::scan_results::EmitOptions { - scan_run_id: Some("run-1".to_owned()), - mark_unseen: true, - complete_scan_run: true, - default_path: None, - }, - ); - - let response = client - 
.post_scan_results(&batch.request) - .expect("post scan results"); - assert_eq!(response.findings_created, 1); - assert_eq!(response.new_finding_ids, vec!["clarion-sf-abc"]); - assert_eq!(response.warnings.len(), 1); - handle.join().expect("server thread"); - } - - #[test] - fn post_scan_results_surfaces_validation_error_as_http_status() { - let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); - let addr = listener.local_addr().expect("local addr"); - let handle = std::thread::spawn(move || { - let (mut stream, _) = listener.accept().expect("accept request"); - let mut request = [0_u8; 8192]; - let _ = stream.read(&mut request).expect("read request"); - let body = - r#"{"error":"findings[0] is missing required key 'path'","code":"VALIDATION"}"#; - write!( - stream, - "HTTP/1.1 400 Bad Request\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .expect("write response"); - }); - let client = detail_test_client(addr); - let batch = crate::scan_results::prepare_batch( - &[], - &crate::scan_results::EmitOptions { - scan_run_id: None, - mark_unseen: true, - complete_scan_run: true, - default_path: None, - }, - ); - let err = client - .post_scan_results(&batch.request) - .expect_err("400 surfaces as error"); - match err { - FiligreeClientError::HttpStatus { status, .. 
} => assert_eq!(status, 400), - other => panic!("expected HttpStatus, got {other:?}"), - } - handle.join().expect("server thread"); - } - - #[test] - fn parses_loom_findings_list_envelope() { - let resp = parse_wardline_findings_response( - r#"{"items":[ - {"finding_id":"f-1","file_id":"file-9","severity":"high","status":"open", - "scan_source":"wardline","rule_id":"WLN-TAINT-001","message":"tainted sink", - "suggestion":"","scan_run_id":"r-1","line_start":12,"line_end":12, - "fingerprint":"fp-abc","issue_id":null,"seen_count":1, - "metadata":{"wardline":{"qualname":"demo.Foo.bar","kind":"DEFECT"}}, - "data_warnings":[]} - ],"has_more":false}"#, - ) - .expect("parse findings list"); - assert_eq!(resp.items.len(), 1); - let f = &resp.items[0]; - assert_eq!(f.rule_id, "WLN-TAINT-001"); - assert_eq!(f.fingerprint.as_deref(), Some("fp-abc")); - assert_eq!(f.line_start, Some(12)); - assert_eq!( - f.metadata - .get("wardline") - .and_then(|w| w.get("qualname")) - .and_then(|q| q.as_str()), - Some("demo.Foo.bar") - ); - } - - #[test] - fn parses_loom_files_list_envelope() { - let resp = parse_loom_files_response( - r#"{"items":[ - {"file_id":"file-9","path":"src/demo.py","language":"python","file_type":"source"}, - {"file_id":"file-10","path":"src/demo_helpers.py","language":"python","file_type":"source"} - ],"has_more":false}"#, - ) - .expect("parse files list"); - assert_eq!(resp.items.len(), 2); - assert_eq!(resp.items[0].file_id, "file-9"); - assert_eq!(resp.items[0].path, "src/demo.py"); - } - - #[test] - fn builds_loom_url_builders_with_encoding() { - assert_eq!( - loom_files_url("http://127.0.0.1:8542/", "wardline", "src/demo.py"), - "http://127.0.0.1:8542/api/loom/files?scan_source=wardline&path_prefix=src%2Fdemo.py" - ); - assert_eq!( - loom_findings_url("http://127.0.0.1:8542/", "wardline", "file-9"), - "http://127.0.0.1:8542/api/loom/findings?scan_source=wardline&file_id=file-9" - ); - } - - #[test] - fn 
wardline_findings_for_path_does_two_hops_and_exact_path_filter() { - let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); - let addr = listener.local_addr().expect("local addr"); - let handle = std::thread::spawn(move || { - // Hop 1: GET /api/loom/files — path_prefix matches two files; the - // exact-path filter must pick file-9, not the helpers file. - let (mut s1, _) = listener.accept().expect("accept files"); - let mut buf = [0_u8; 4096]; - let n = s1.read(&mut buf).expect("read files req"); - let req = String::from_utf8_lossy(&buf[..n]); - assert!(req.contains( - "GET /api/loom/files?scan_source=wardline&path_prefix=src%2Fdemo.py HTTP/1.1" - )); - let body = r#"{"items":[{"file_id":"file-9","path":"src/demo.py","language":"python","file_type":"source"},{"file_id":"file-10","path":"src/demo.py.bak","language":"python","file_type":"source"}],"has_more":false}"#; - // connection: close forces reqwest to open a fresh TCP connection for - // hop 2, so the listener's second accept() receives it (the blocking - // client would otherwise pool/reuse hop-1's socket and hop-2's - // accept() would hang). - write!( - s1, - "HTTP/1.1 200 OK\r\nconnection: close\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .unwrap(); - - // Hop 2: GET /api/loom/findings for file-9. 
- let (mut s2, _) = listener.accept().expect("accept findings"); - let n = s2.read(&mut buf).expect("read findings req"); - let req = String::from_utf8_lossy(&buf[..n]); - assert!( - req.contains("GET /api/loom/findings?scan_source=wardline&file_id=file-9 HTTP/1.1") - ); - let body = r#"{"items":[{"finding_id":"f-1","file_id":"file-9","severity":"high","status":"open","scan_source":"wardline","rule_id":"WLN-TAINT-001","message":"sink","suggestion":"","scan_run_id":"r-1","line_start":12,"line_end":12,"fingerprint":"fp","issue_id":null,"seen_count":1,"metadata":{"wardline":{"qualname":"demo.Foo.bar"}},"data_warnings":[]}],"has_more":false}"#; - write!( - s2, - "HTTP/1.1 200 OK\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .unwrap(); - }); - // Not detail_test_client(addr): the two-hop test does two sequential TCP - // accepts, so use a more generous timeout to avoid CI scheduling jitter - // between hops. - let config = FiligreeConfig { - enabled: true, - base_url: format!("http://{addr}"), - actor: "clarion-test".to_owned(), - token_env: "TEST_FILIGREE_TOKEN".to_owned(), - timeout_seconds: 5, - emit_findings: true, - prune_unseen_days: 30, - }; - let client = FiligreeHttpClient::from_config(&config, |_| None) - .expect("build client") - .expect("enabled client"); - let findings = client - .wardline_findings_for_path("src/demo.py") - .expect("two-hop fetch"); - assert_eq!(findings.len(), 1); - assert_eq!(findings[0].rule_id, "WLN-TAINT-001"); - handle.join().expect("server thread"); - } - - /// FIX 3: when hop-1 returns items that don't include the exact path AND - /// `has_more` is true, `wardline_findings_for_path` must return `Err` rather - /// than `Ok(empty)` — a truncated page is indeterminate, not "no file found". 
- #[test] - fn wardline_findings_for_path_errors_when_hop1_truncated_before_exact_match() { - let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); - let addr = listener.local_addr().expect("local addr"); - let handle = std::thread::spawn(move || { - // Hop 1: page does NOT contain src/demo.py but has_more is true — - // the exact path may be on a later page. - let (mut s1, _) = listener.accept().expect("accept files"); - let mut buf = [0_u8; 4096]; - let n = s1.read(&mut buf).expect("read files req"); - let req = String::from_utf8_lossy(&buf[..n]); - assert!(req.contains( - "GET /api/loom/files?scan_source=wardline&path_prefix=src%2Fdemo.py HTTP/1.1" - )); - // Return a page that omits the target path with has_more:true. - let body = r#"{"items":[{"file_id":"file-1","path":"src/demo_other.py","language":"python","file_type":"source"}],"has_more":true}"#; - write!( - s1, - "HTTP/1.1 200 OK\r\nconnection: close\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .unwrap(); - // No hop-2 — the function must error before making a second request. 
- }); - let config = FiligreeConfig { - enabled: true, - base_url: format!("http://{addr}"), - actor: "clarion-test".to_owned(), - token_env: "TEST_FILIGREE_TOKEN".to_owned(), - timeout_seconds: 5, - emit_findings: true, - prune_unseen_days: 30, - }; - let client = FiligreeHttpClient::from_config(&config, |_| None) - .expect("build client") - .expect("enabled client"); - let result = client.wardline_findings_for_path("src/demo.py"); - handle.join().expect("server thread"); - assert!( - result.is_err(), - "truncated hop-1 without exact match must be Err, not Ok: {result:?}" - ); - } - - /// Hop-2 counterpart to the hop-1 truncation test: when the findings page - /// for the resolved `file_id` reports `has_more: true`, the first page is an - /// incomplete view, so `wardline_findings_for_path` must return `Err` - /// (degrades to `unavailable`) rather than `Ok(partial)` — no silent - /// undercount. - #[test] - fn wardline_findings_for_path_errors_when_hop2_truncated() { - let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); - let addr = listener.local_addr().expect("local addr"); - let handle = std::thread::spawn(move || { - // Hop 1: exact path resolves to file-9 on a complete page. - let (mut s1, _) = listener.accept().expect("accept files"); - let mut buf = [0_u8; 4096]; - let n = s1.read(&mut buf).expect("read files req"); - let req = String::from_utf8_lossy(&buf[..n]); - assert!(req.contains( - "GET /api/loom/files?scan_source=wardline&path_prefix=src%2Fdemo.py HTTP/1.1" - )); - let body = r#"{"items":[{"file_id":"file-9","path":"src/demo.py","language":"python","file_type":"source"}],"has_more":false}"#; - write!( - s1, - "HTTP/1.1 200 OK\r\nconnection: close\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .unwrap(); - - // Hop 2: findings page for file-9 is truncated (has_more:true). 
- let (mut s2, _) = listener.accept().expect("accept findings"); - let n = s2.read(&mut buf).expect("read findings req"); - let req = String::from_utf8_lossy(&buf[..n]); - assert!( - req.contains("GET /api/loom/findings?scan_source=wardline&file_id=file-9 HTTP/1.1") - ); - let body = r#"{"items":[{"finding_id":"f-1","file_id":"file-9","severity":"high","status":"open","scan_source":"wardline","rule_id":"WLN-TAINT-001","message":"sink","suggestion":"","scan_run_id":"r-1","line_start":12,"line_end":12,"fingerprint":"fp","issue_id":null,"seen_count":1,"metadata":{"wardline":{"qualname":"demo.Foo.bar"}},"data_warnings":[]}],"has_more":true}"#; - write!( - s2, - "HTTP/1.1 200 OK\r\ncontent-length: {}\r\n\r\n{}", - body.len(), - body - ) - .unwrap(); - }); - let config = FiligreeConfig { - enabled: true, - base_url: format!("http://{addr}"), - actor: "clarion-test".to_owned(), - token_env: "TEST_FILIGREE_TOKEN".to_owned(), - timeout_seconds: 5, - emit_findings: true, - prune_unseen_days: 30, - }; - let client = FiligreeHttpClient::from_config(&config, |_| None) - .expect("build client") - .expect("enabled client"); - let result = client.wardline_findings_for_path("src/demo.py"); - handle.join().expect("server thread"); - assert!( - result.is_err(), - "truncated hop-2 findings page must be Err, not Ok(partial): {result:?}" - ); - } - - fn detail_test_client(addr: std::net::SocketAddr) -> FiligreeHttpClient { - let config = FiligreeConfig { - enabled: true, - base_url: format!("http://{addr}"), - actor: "clarion-test".to_owned(), - token_env: "TEST_FILIGREE_TOKEN".to_owned(), - timeout_seconds: 1, - emit_findings: true, - prune_unseen_days: 30, - }; - FiligreeHttpClient::from_config(&config, |_| None) - .expect("build client") - .expect("enabled client") - } -} +pub use clarion_federation::filigree::*; diff --git a/crates/clarion-mcp/src/filigree_url.rs b/crates/clarion-mcp/src/filigree_url.rs index a8e90cbb..5fd6e18b 100644 --- a/crates/clarion-mcp/src/filigree_url.rs +++ 
b/crates/clarion-mcp/src/filigree_url.rs @@ -1,212 +1 @@ -//! Resolve the live Filigree API base URL. -//! -//! Mirrors Filigree's ethereal endpoint-discovery convention: the dashboard -//! publishes its live port to `/.filigree/ephemeral.port` (a plain -//! integer, written atomically, present only while the dashboard runs) and -//! serves the read API on that port. The port is chosen deterministically but -//! unpredictably (`8400 + sha256(path) % 1000` with fallback), so it must be -//! *read*, never computed. This mirrors the Filigree sources: -//! - `filigree/src/filigree/ephemeral.py::{write,read}_port_file` -//! - `filigree/src/filigree/scanner_callback.py::resolve_scanner_api_url_with_source` -//! -//! Federation discipline (`docs/suite/loom.md` §5): this is enrich-only -//! connection discovery. Clarion stays solo-useful — when no live port file is -//! present (or Filigree is disabled) Clarion falls back to its *own* configured -//! `base_url`, never to a Filigree-internal default (copying Filigree's -//! `DEFAULT_PORT` would be a silent cross-product coupling). Reading the port -//! file is fail-soft: any missing/corrupt/out-of-range content degrades to the -//! configured URL. -//! -//! Scope: ethereal mode only. Filigree's `server` mode resolves through a -//! home-directory global (`~/.config/filigree/server.json`); that path is not -//! exercised here and is left as a known gap (clarion-318f1254eb tracks the -//! issues_for-side resolution diagnostics that build on this resolver). - -use std::path::Path; - -use serde::Serialize; - -use crate::config::FiligreeConfig; - -/// Wire-facing `source` labels for a resolved Filigree URL. Reported verbatim -/// by `project_status` (and, per clarion-318f1254eb, `issues_for`) so an agent -/// can tell *where* the URL came from without shelling out to probe ports. -pub const SOURCE_DISABLED: &str = "disabled"; -/// The live ethereal port published by Filigree's running dashboard. 
-pub const SOURCE_EPHEMERAL_PORT: &str = ".filigree/ephemeral.port"; -/// Clarion's own configured `integrations.filigree.base_url`. -pub const SOURCE_CONFIG: &str = "config"; - -/// The outcome of resolving where Clarion should reach Filigree's read API. -#[derive(Debug, Clone, PartialEq, Eq, Serialize)] -pub struct FiligreeUrlResolution { - /// Whether the Filigree integration is enabled in config at all. - pub enabled: bool, - /// The statically configured base URL (`integrations.filigree.base_url`). - pub configured_url: String, - /// The URL Clarion will actually call. `None` only when disabled. - pub resolved_url: Option, - /// Which input produced [`Self::resolved_url`]; one of the `SOURCE_*` labels. - pub source: &'static str, -} - -/// Resolve the Filigree read-API base URL, preferring the live ethereal port. -/// -/// - Disabled → no resolved URL, `source = "disabled"`. -/// - A valid `/.filigree/ephemeral.port` → the configured URL -/// with its port overridden by the live port, `source = ".filigree/ephemeral.port"`. -/// - Otherwise → the configured URL unchanged, `source = "config"`. -#[must_use] -pub fn resolve_filigree_url(config: &FiligreeConfig, project_root: &Path) -> FiligreeUrlResolution { - let configured_url = config.base_url.clone(); - if !config.enabled { - return FiligreeUrlResolution { - enabled: false, - configured_url, - resolved_url: None, - source: SOURCE_DISABLED, - }; - } - match read_ephemeral_port(project_root) { - Some(port) => { - let resolved = override_port(&configured_url, port); - FiligreeUrlResolution { - enabled: true, - configured_url, - resolved_url: Some(resolved), - source: SOURCE_EPHEMERAL_PORT, - } - } - None => FiligreeUrlResolution { - enabled: true, - resolved_url: Some(configured_url.clone()), - configured_url, - source: SOURCE_CONFIG, - }, - } -} - -/// Read `/.filigree/ephemeral.port` as a TCP port. -/// -/// Mirrors Filigree's `read_port_file`: a plain trimmed integer. 
Any -/// missing/corrupt/out-of-range/zero content folds to `None` (fail-soft). -fn read_ephemeral_port(project_root: &Path) -> Option { - let path = project_root.join(".filigree").join("ephemeral.port"); - let raw = std::fs::read_to_string(&path).ok()?; - raw.trim().parse::().ok().filter(|port| *port != 0) -} - -/// Replace the port in a `scheme://host[:port][/path]` URL, preserving the -/// scheme, host, and any trailing path. Returns the input unchanged when it -/// has no recognizable `scheme://` authority. IPv6 literal hosts are out of -/// scope — Filigree binds `127.0.0.1`. -fn override_port(base_url: &str, port: u16) -> String { - let Some((scheme, rest)) = base_url.split_once("://") else { - return base_url.to_owned(); - }; - let (authority, path) = match rest.find('/') { - Some(slash) => (&rest[..slash], &rest[slash..]), - None => (rest, ""), - }; - // Strip an existing `:port` suffix, but only when it is genuinely a numeric - // port (so a bare `host` with no port is preserved intact). 
- let host = match authority.rsplit_once(':') { - Some((host, maybe_port)) - if !maybe_port.is_empty() && maybe_port.bytes().all(|b| b.is_ascii_digit()) => - { - host - } - _ => authority, - }; - format!("{scheme}://{host}:{port}{path}") -} - -#[cfg(test)] -mod tests { - use super::*; - - fn enabled_config() -> FiligreeConfig { - FiligreeConfig { - enabled: true, - ..FiligreeConfig::default() - } - } - - fn write_port_file(root: &Path, contents: &str) { - let dir = root.join(".filigree"); - std::fs::create_dir_all(&dir).unwrap(); - std::fs::write(dir.join("ephemeral.port"), contents).unwrap(); - } - - #[test] - fn disabled_integration_resolves_nothing() { - let dir = tempfile::tempdir().unwrap(); - let config = FiligreeConfig::default(); // enabled: false - let res = resolve_filigree_url(&config, dir.path()); - assert!(!res.enabled); - assert_eq!(res.resolved_url, None); - assert_eq!(res.source, SOURCE_DISABLED); - assert_eq!(res.configured_url, "http://127.0.0.1:8766"); - } - - #[test] - fn live_ephemeral_port_overrides_the_stale_configured_port() { - // The dogfood bug: configured 8766 is dead; the live dashboard is on - // 8542 per .filigree/ephemeral.port. - let dir = tempfile::tempdir().unwrap(); - write_port_file(dir.path(), "8542\n"); - let res = resolve_filigree_url(&enabled_config(), dir.path()); - assert!(res.enabled); - assert_eq!(res.resolved_url.as_deref(), Some("http://127.0.0.1:8542")); - assert_eq!(res.source, SOURCE_EPHEMERAL_PORT); - // The configured URL is still reported verbatim alongside the resolved one. 
- assert_eq!(res.configured_url, "http://127.0.0.1:8766"); - } - - #[test] - fn falls_back_to_configured_url_when_no_port_file() { - let dir = tempfile::tempdir().unwrap(); - let res = resolve_filigree_url(&enabled_config(), dir.path()); - assert!(res.enabled); - assert_eq!(res.resolved_url.as_deref(), Some("http://127.0.0.1:8766")); - assert_eq!(res.source, SOURCE_CONFIG); - } - - #[test] - fn corrupt_port_file_folds_to_configured_url() { - let dir = tempfile::tempdir().unwrap(); - write_port_file(dir.path(), "not-a-port"); - let res = resolve_filigree_url(&enabled_config(), dir.path()); - assert_eq!(res.source, SOURCE_CONFIG); - assert_eq!(res.resolved_url.as_deref(), Some("http://127.0.0.1:8766")); - } - - #[test] - fn zero_port_is_rejected_as_corrupt() { - let dir = tempfile::tempdir().unwrap(); - write_port_file(dir.path(), "0"); - let res = resolve_filigree_url(&enabled_config(), dir.path()); - assert_eq!(res.source, SOURCE_CONFIG); - } - - #[test] - fn override_port_preserves_scheme_host_and_path() { - assert_eq!( - override_port("http://127.0.0.1:8766", 8542), - "http://127.0.0.1:8542" - ); - assert_eq!( - override_port("http://localhost", 8542), - "http://localhost:8542" - ); - assert_eq!( - override_port("https://example.test:1/api", 8542), - "https://example.test:8542/api" - ); - } - - #[test] - fn override_port_returns_input_without_scheme() { - assert_eq!(override_port("127.0.0.1:8766", 8542), "127.0.0.1:8766"); - } -} +pub use clarion_federation::filigree_url::*; diff --git a/crates/clarion-mcp/src/lib.rs b/crates/clarion-mcp/src/lib.rs index e82421d6..4abbf909 100644 --- a/crates/clarion-mcp/src/lib.rs +++ b/crates/clarion-mcp/src/lib.rs @@ -19,6 +19,7 @@ use clarion_core::{ EdgeConfidence, EmbeddingProvider, LlmProvider, LlmProviderError, LlmRequest, LlmResponse, McpErrorCode, }; +use rusqlite::{Connection, OpenFlags}; use serde::{Deserialize, Serialize}; use serde_json::{Value, json}; use thiserror::Error; @@ -37,7 +38,13 @@ use clarion_storage::{ 
}; use crate::config::{LlmConfig, SemanticSearchConfig}; -use crate::filigree::{EntityAssociation, EntityAssociationsResponse, FiligreeLookup, IssueDetail}; +use crate::filigree::{ + EntityAssociation, EntityAssociationsResponse, FiligreeLookup, IssueDetail, + ObservationCreateRequest, +}; +use clarion_storage::{ + GuidanceProposal, GuidanceSheetInput, invalidate_summaries_for_sheet, upsert_guidance_sheet, +}; /// MCP protocol revision supported by the B.6 stdio server. pub const MCP_PROTOCOL_VERSION: &str = "2025-11-25"; @@ -69,12 +76,12 @@ instead of re-reading or grepping the tree. Entity IDs are `{{plugin}}:{{kind}}:{{qualified_name}}` (e.g. \ `python:function:pkg.mod.func`); subsystems are `core:subsystem:{{hash}}`. You \ -almost never type IDs — get one from `find_entity` or `entity_at`, then copy it \ +almost never type IDs — get one from `entity_find` or `entity_at`, then copy it \ verbatim into the next tool. -Tools: {tool_names}. `callers_of` / `neighborhood` / `execution_paths_from` \ +Tools: {tool_names}. `entity_callers_list` / `entity_neighborhood_get` / `entity_execution_path_list` \ take a `confidence` tier (resolved | ambiguous | inferred; default resolved). \ -`project_status` reports index freshness, counts, LLM policy, and the resolved \ +`project_status_get` reports index freshness, counts, LLM policy, and the resolved \ Filigree endpoint. For the full workflow see the clarion-workflow skill (installed by \ @@ -86,6 +93,57 @@ project counts and index freshness are in the `clarion://context` resource." 
type InferredInflight = Arc>>>; +pub const RENAME_MAP: &[(&str, &str)] = &[ + ("entity_at", "entity_at"), + ("find_entity", "entity_find"), + ("callers_of", "entity_callers_list"), + ("execution_paths_from", "entity_execution_path_list"), + ("summary", "entity_summary_get"), + ("issues_for", "entity_issue_list"), + ("neighborhood", "entity_neighborhood_get"), + ("subsystem_members", "subsystem_member_list"), + ("subsystem_of", "entity_subsystem_get"), + ("project_status", "project_status_get"), + ("summary_preview_cost", "entity_summary_preview_cost_get"), + ("source_for_entity", "entity_source_get"), + ("call_sites", "entity_call_site_list"), + ("orientation_pack", "entity_orientation_pack_get"), + ("analyze_start", "analyze_start"), + ("analyze_status", "analyze_status_get"), + ("analyze_cancel", "analyze_cancel"), + ("index_diff", "index_diff_get"), + ("guidance_for", "entity_guidance_list"), + ("propose_guidance", "propose_guidance"), + ("promote_guidance", "promote_guidance"), + ("findings_for", "entity_finding_list"), + ("wardline_for", "entity_wardline_get"), + ("find_by_tag", "entity_tag_list"), + ("find_by_kind", "entity_kind_list"), + ("find_by_wardline", "entity_wardline_list"), + ("find_circular_imports", "module_circular_import_list"), + ("find_coupling_hotspots", "entity_coupling_hotspot_list"), + ("find_entry_points", "entity_entry_point_list"), + ("find_http_routes", "entity_http_route_list"), + ("find_data_models", "entity_data_model_list"), + ("find_tests", "entity_test_list"), + ("find_deprecations", "entity_deprecation_list"), + ("find_todos", "entity_todo_list"), + ("what_tests_this", "entity_test_caller_list"), + ("high_churn", "entity_high_churn_list"), + ("recently_changed", "entity_recent_change_list"), + ("find_dead_code", "entity_dead_list"), + ("search_semantic", "entity_semantic_search_list"), +]; + +pub fn rename_old_to_new(name: &str) -> &str { + for &(old, new) in RENAME_MAP { + if name == old { + return new; + } + } + name +} + 
#[derive(Debug, Clone, PartialEq, Eq, Serialize)] pub struct ToolDefinition { pub name: &'static str, @@ -113,7 +171,7 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "find_entity", + name: "entity_find", description: "Search Clarion entities by id, name, short name, and summary text stored on entity rows. Results are paginated and ranked by FTS match where possible. This does not traverse the graph and does not search on-demand summary_cache entries. Pass an optional `kind` (e.g. \"subsystem\", \"function\", \"class\", \"module\") to return only entities of that kind — the way to locate a subsystem without visually filtering results.", input_schema: json!({ "type": "object", @@ -128,12 +186,12 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "callers_of", + name: "entity_callers_list", description: "Return entities that call the given entity. Default confidence is resolved, so ambiguous static candidates and LLM-inferred edges are excluded unless explicitly requested. Ambiguous edges expand all candidates; inferred edges may trigger bounded LLM dispatch. The result carries scope_excludes naming static blind spots not searched (e.g. attribute-receiver-calls) so an empty callers list is never read as a guaranteed true negative.", input_schema: id_confidence_schema(), }, ToolDefinition { - name: "execution_paths_from", + name: "entity_execution_path_list", description: "Return bounded calls-only execution paths starting at an entity. Default confidence is resolved. max_depth defaults to 3. Results are compact: a deduplicated nodes table plus paths as arrays of node ids (under a root), ranked longest-first. Traversal stops at the server edge cap and the response is capped at a maximum number of ranked paths; truncated/truncation_reason report edge-cap or path-cap when either trims. The result carries scope_excludes naming static blind spots not searched (e.g. 
attribute-receiver-calls).", input_schema: json!({ "type": "object", @@ -147,12 +205,12 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "summary", + name: "entity_summary_get", description: "Return an on-demand cached summary for one entity. In v0.1 this is leaf scope only: module summaries describe the module docstring and top-level members, not an aggregation of contained function/class summaries. If the LLM returns non-JSON the response degrades to a deterministic structural summary (kind: structural-fallback) built from the entity source, and that fallback is cached so a retry is a free cache hit rather than a re-billed failure.", input_schema: id_schema(), }, ToolDefinition { - name: "issues_for", + name: "entity_issue_list", description: "Return Filigree issues attached to this Clarion entity, optionally including issues attached to contained entities. Filigree is an enrichment source; if unavailable, the tool returns an unavailable envelope instead of failing Clarion. The result carries a result_kind (matched | no_matches | unavailable) so a reachable-but-empty Filigree is distinct from an unreachable one, and a filigree_endpoint block (configured vs resolved URL + resolution_source) so you can see which endpoint — e.g. a live ethereal port — the answer came from. Each matched/drifted entry carries an `issue` object with the issue's title, status, and priority (fetched once per distinct issue, no N+1); `issue` is null when the issue-detail route is unavailable, so the match still resolves without a second hop into Filigree. 
Includes a `wardline_findings` section (enrich-only) reconciling Wardline findings to the entity by qualname; `result_kind` is matched|no_matches|unavailable.", input_schema: json!({ "type": "object", @@ -165,22 +223,22 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "neighborhood", + name: "entity_neighborhood_get", description: "Return the one-hop Clarion neighborhood around an entity: callers, callees, container, contained entities, references, and imports (imports_in = who imports this module, imports_out = what it imports; module-to-module). Default confidence is resolved; ambiguous and inferred calls are opt-in. References and imports are not execution flow. When the entity is a module, references_in/references_out are rolled up over the symbols it contains (references_rolled_up=true) — each neighbor carries a `via` naming the contained symbol the edge touches, so \"who imports this module/contract\" is answered at module altitude rather than reading empty. On references_in each rolled-up neighbor also carries `importer_module` — the importing symbol's containing module — so reverse-import names importing modules, not just symbols. The result carries scope_excludes naming blind spots not searched (e.g. attribute-receiver-calls) so empty sections are never read as guaranteed true negatives.", input_schema: id_confidence_schema(), }, ToolDefinition { - name: "subsystem_members", + name: "subsystem_member_list", description: "List module entities assigned to a subsystem entity.", input_schema: id_schema(), }, ToolDefinition { - name: "subsystem_of", + name: "entity_subsystem_get", description: "Return the subsystem an entity belongs to — the reverse of subsystem_members. Accepts any entity id: a module resolves directly, while a function/class resolves through its nearest containing module. 
Returns the subsystem id/name and the module the membership was resolved through, or a no-subsystem result when the entity has no subsystem-assigned module ancestor.", input_schema: id_schema(), }, ToolDefinition { - name: "project_status", + name: "project_status_get", description: "Return deterministic Clarion diagnostics: repo root, db path, latest run (id/status/started/completed), entity/subsystem/edge/finding/briefing-blocked counts, index staleness, per-plugin entity counts from the current index, LLM policy (provider/live/cache), and the resolved Filigree endpoint (configured vs resolved URL + resolution source). Answers \"is the graph fresh, plugin-less, LLM-live, Filigree-reachable?\" without shelling out. No LLM call.", input_schema: json!({ "type": "object", @@ -189,12 +247,12 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "summary_preview_cost", + name: "entity_summary_preview_cost_get", description: "Preview what calling summary(id) would cost BEFORE spending. Reports cache_status (hit | expired | miss), the cached row's real tokens/cost/age on a hit, an input-token estimate on a miss, the configured model, the LLM policy (provider/live/allow_live_provider/cache horizon), and live_spend_would_occur — true only when no fresh cache row exists AND a live provider is wired. A disabled/unconfigured LLM is reported distinctly from a cache miss. Never invokes the LLM provider.", input_schema: id_schema(), }, ToolDefinition { - name: "source_for_entity", + name: "entity_source_get", description: "Return the exact indexed source span for one entity (its source_line_start..source_line_end, which includes any decorators/signature/docstring the plugin captured) plus a bounded window of surrounding context, as line-numbered lines each flagged in_entity true/false. No LLM call. Lets an agent read and trust the entity without shelling out. 
source_status reports `ok`, or — instead of a misleading stale snippet — `missing` (file gone), `no_range`/`no_source_path` (entity has no anchor), `binary` (non-UTF-8), or `drifted` (the file no longer matches the indexed content_hash; rerun `clarion analyze`). context_lines defaults to 10.", input_schema: json!({ "type": "object", @@ -207,7 +265,7 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "call_sites", + name: "entity_call_site_list", description: "Show the actual source sites behind calls/references edges, so an agent can see WHY Clarion believes an edge exists rather than trusting it blind. role=caller (default) returns this entity's outgoing sites (what it calls/references); role=callee returns incoming sites (who calls/references it). Each site carries the file path, 1-based line, byte column, the source line text, edge kind, confidence, and a resolution of resolved | ambiguous (with candidate ids) | unresolved (a static call Clarion could not bind, kept separate so it is never mixed with resolved evidence). Filter by edge kind (`calls`/`references`) and by a best-effort production/test path heuristic (`all`/`production`/`test`; path partitioning is not indexed — the heuristic matches conventional test paths). Output is bounded; truncated flags when the site cap trims. No LLM call.", input_schema: json!({ "type": "object", @@ -223,7 +281,7 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "orientation_pack", + name: "entity_orientation_pack_get", description: "Assemble one deterministic orientation packet for a code location — the replacement for hand-composing find_entity + entity_at + source reads + neighborhood + issues_for + freshness on every question. Resolve EITHER by `entity` id OR by `file`+`line` (exactly one form). 
The packet bundles: the primary entity, the entity_context evidence (match_reason / containing stack / decl-body-decorator ranges — so a decorator-line query is explained, not guessed), a compact source-span summary, one-hop neighbors (callers, callees, container, contained, references, imports — for a module, references_in/out are rolled up over contained symbols with references_rolled_up=true), compact resolved execution paths, related Filigree issues, index/Filigree/LLM health, warnings, and suggested next reads. No LLM summary is invoked. Every list is bounded; an `omitted` block reports per-section truncation counts and `degraded` sections name surfaces that were unavailable (e.g. Filigree down) so an empty section is never read as a guaranteed negative. Includes a `wardline_findings` section (enrich-only) reconciling Wardline findings to the entity by qualname; `result_kind` is matched|no_matches|unavailable.", input_schema: json!({ "type": "object", @@ -245,7 +303,7 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "analyze_status", + name: "analyze_status_get", description: "Report the live status of an analyze run started via analyze_start. status is one of queued (spawned, not yet recording) | running | completed | failed | cancelled | skipped_no_plugins. While running it exposes phase (discovering / analyzing / clustering), current_plugin, processed_files / total_files, current_file, the latest heartbeat_at, elapsed_seconds, and progress_observed (false when the heartbeat has gone stale — the run may be wedged). On a terminal status it carries the recorded run stats. Reads structured progress, never logs.", input_schema: json!({ "type": "object", @@ -269,7 +327,7 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "index_diff", + name: "index_diff_get", description: "Report what changed since the last analyze and whether this checkout is newer than the graph — so an agent need not hand-roll git + mtime freshness checks. 
Compares: analyzed_at (last completed run) vs current git HEAD (with head_newer_than_analyze derived from HEAD's committer date vs run completion, true even when source mtimes are ambiguous); indexed source files modified or now-missing since analyze; dirty working-tree files flagged when they touch an indexed path; and per-run aggregate plugin skip/drop counters. Git is read at query time, read-only, and fail-soft: a missing git binary or non-repo dir degrades to git.available=false with a reason rather than failing. analyzed_commit is null by design (Clarion persists no analyze-time SHA). overall is fresh | drift | unknown | never_analyzed; lists are bounded with an `omitted` block. entity-level add/remove/change diff is unavailable in v0.1 (only the current graph is retained). No LLM call.", input_schema: json!({ "type": "object", @@ -280,7 +338,7 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "guidance_for", + name: "entity_guidance_list", description: "Return the guidance sheets applicable to one entity, composed at query time and ranked by scope_rank (project → subsystem → package → module → class → function), ties broken by authored_at then id. Read-only: this surfaces composed institutional knowledge; authoring (propose/promote) is a separate lifecycle. A sheet applies via an explicit `guides` edge OR a `match_rules` entry resolved against the entity (path glob / tag / kind / subsystem / entity). `wardline_group` rules are not evaluated here (the Wardline blob is opaque) and are reported in `notes`, never guessed. Expired sheets are excluded. Each sheet carries its `sei`. Bounded (limit/offset, page.total/truncated). Honest-empty when no sheet applies. No LLM call.", input_schema: json!({ "type": "object", @@ -294,7 +352,41 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "findings_for", + name: "propose_guidance", + description: "Propose a guidance sheet for operator review by creating a Filigree observation. 
This is deliberately inert: it does not write a Clarion guidance entity and cannot enter summaries until `promote_guidance` or `clarion guidance promote` consumes the observation.", + input_schema: json!({ + "type": "object", + "properties": { + "entity_id": {"type": "string", "minLength": 1}, + "content": {"type": "string", "minLength": 1}, + "scope_level": { + "type": "string", + "enum": ["project", "subsystem", "package", "module", "class", "function"], + "default": "function" + }, + "match_rules": {"type": "array", "items": {"type": "object"}}, + "name": {"type": "string", "minLength": 1}, + "pinned": {"type": "boolean", "default": false}, + "expires": {"type": "string", "minLength": 1} + }, + "required": ["entity_id", "content"], + "additionalProperties": false + }), + }, + ToolDefinition { + name: "promote_guidance", + description: "Promote a reviewed Filigree observation produced by `propose_guidance` into a local Clarion guidance sheet. This operator action is the anti-poisoning boundary: only promoted observations become prompt-composed guidance.", + input_schema: json!({ + "type": "object", + "properties": { + "observation_id": {"type": "string", "minLength": 1} + }, + "required": ["observation_id"], + "additionalProperties": false + }), + }, + ToolDefinition { + name: "entity_finding_list", description: "Return findings anchored to one entity, optionally filtered by `filter.kind` (defect/fact/classification/metric/suggestion), `filter.severity` (INFO/WARN/ERROR/CRITICAL/NONE), and `filter.status` (open/acknowledged/suppressed/promoted_to_issue). The queried entity carries its `sei`; each finding's `related_entities` are raw locator ids (references, not the primary return). Bounded (limit/offset, page.total/truncated). An entity with no findings returns an empty list, not an error. 
No LLM call.", input_schema: json!({ "type": "object", @@ -317,27 +409,27 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "wardline_for", + name: "entity_wardline_get", description: "Return the Wardline metadata recorded for one entity (declared tier, groups, boundary contracts) returned VERBATIM — the `wardline_json` blob is opaque to Clarion. result_kind is `present` when a taint fact exists, else `no_facts` with a missing-signal note: facts are populated via Filigree Flow-B (POST /api/wardline/taint-facts), so a locally-empty result is honest, not an error. The entity carries its `sei`. No LLM call.", input_schema: id_schema(), }, ToolDefinition { - name: "find_by_tag", - description: "Return entities carrying a plugin-emitted categorisation `tag`, within an optional `scope` (an entity id → its descendants, OR a path glob like \"src/auth/**\"; omitted → whole project). Bounded (limit/offset, page.total/truncated; scope_truncated/scan_truncated flag cap hits). Entities carry their `sei`. Honest-empty with a missing-signal note when no entity carries the tag — the Python plugin emits no categorisation tags today. No LLM call.", + name: "entity_tag_list", + description: "Return entities carrying a plugin-emitted categorisation `tag`, within an optional `scope` (an entity id → its descendants, OR a path glob like \"src/auth/**\"; omitted → whole project). Bounded (limit/offset, page.total/truncated; scope_truncated/scan_truncated flag cap hits). Entities carry their `sei`. Honest-empty with a missing-signal note when no entity in the current index carries the tag. No LLM call.", input_schema: scope_facet_schema(&[("tag", true)]), }, ToolDefinition { - name: "find_by_kind", + name: "entity_kind_list", description: "Return entities of a plugin-declared `kind` (e.g. \"function\", \"class\", \"module\"), within an optional `scope` (entity id → descendants, OR path glob; omitted → whole project). Bounded (limit/offset, page.total/truncated). 
Entities carry their `sei`. An unknown kind matches no rows. No LLM call.", input_schema: scope_facet_schema(&[("kind", true)]), }, ToolDefinition { - name: "find_by_wardline", + name: "entity_wardline_list", description: "Return entities carrying a Wardline taint fact, optionally filtered by `tier` and/or `group`, within an optional `scope` (entity id → descendants, OR path glob; omitted → whole project). The Wardline blob is opaque to Clarion: tier/group filtering is best-effort against a top-level field on the blob and honest-empty when absent. Each entity carries its `wardline` blob verbatim plus its `sei`. Bounded (limit/offset, page.total/truncated). Facts are populated via Filigree Flow-B. No LLM call.", input_schema: scope_facet_schema(&[("tier", false), ("group", false)]), }, ToolDefinition { - name: "find_circular_imports", + name: "module_circular_import_list", description: "Return import cycles in the module import graph (`imports` edges) — each a strongly-connected component of size > 1 (or a self-import), members sorted. On-demand graph query (no analyze-time precompute). Edge-derived: default `confidence` is resolved (the tier is a ceiling — resolved → resolved only, inferred → all) and is echoed in the result. Optional `scope` (entity id → descendants, OR path glob) restricts to cycles whose members are all in scope. Bounded (limit/offset, page.total/truncated). Each member carries its `sei`. No LLM call.", input_schema: json!({ "type": "object", @@ -351,7 +443,7 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "find_coupling_hotspots", + name: "entity_coupling_hotspot_list", description: "Return entities ranked by coupling (distinct fan-in + fan-out over the edge graph), most-coupled first. On-demand graph query (no analyze-time precompute). Edge-derived: default `confidence` is resolved (a ceiling) and is echoed. Optional `scope` (entity id → descendants, OR path glob; omitted → whole project). 
Bounded (limit default 20, max 200; page.total/truncated). Each entity carries its `sei`. No LLM call.", input_schema: json!({ "type": "object", @@ -365,37 +457,37 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "find_entry_points", - description: "Return entities tagged as entry points, within an optional `scope` (entity id → descendants, OR path glob). Reads the `entry-point` categorisation tag. HONEST-EMPTY: the active plugins emit no entry-point categorisation today, so an empty result means the signal is absent (a missing-signal note says so), NOT that there are no entry points. Bounded; SEI-carrying. No LLM call.", + name: "entity_entry_point_list", + description: "Return entities tagged as entry points, within an optional `scope` (entity id → descendants, OR path glob). Reads the `entry-point` categorisation tag. HONEST-EMPTY when no entity in the current index carries the tag, so an empty result means the signal is absent, NOT that there are no entry points. Bounded; SEI-carrying. No LLM call.", input_schema: scope_page_schema(false), }, ToolDefinition { - name: "find_http_routes", + name: "entity_http_route_list", description: "Return entities tagged as HTTP routes, within an optional `scope`. Reads the `http-route` categorisation tag. HONEST-EMPTY when route categorisation is not emitted (missing-signal note). Bounded; SEI-carrying. No LLM call.", input_schema: scope_page_schema(false), }, ToolDefinition { - name: "find_data_models", + name: "entity_data_model_list", description: "Return entities tagged as data models, within an optional `scope`. Reads the `data-model` categorisation tag. HONEST-EMPTY when data-model categorisation is not emitted (missing-signal note). Bounded; SEI-carrying. No LLM call.", input_schema: scope_page_schema(false), }, ToolDefinition { - name: "find_tests", + name: "entity_test_list", description: "Return entities tagged as tests, within an optional `scope`. Reads the `test` categorisation tag. 
HONEST-EMPTY when test categorisation is not emitted (missing-signal note). Bounded; SEI-carrying. No LLM call.", input_schema: scope_page_schema(false), }, ToolDefinition { - name: "find_deprecations", + name: "entity_deprecation_list", description: "Return entities tagged deprecated, within an optional `scope`. Reads the `deprecated` categorisation tag. HONEST-EMPTY when deprecation categorisation is not emitted (missing-signal note). Bounded; SEI-carrying. No LLM call.", input_schema: scope_page_schema(false), }, ToolDefinition { - name: "find_todos", + name: "entity_todo_list", description: "Return entities carrying a TODO/FIXME marker, within an optional `scope`. Reads the `todo` categorisation tag. HONEST-EMPTY when TODO extraction is not emitted (missing-signal note). Bounded; SEI-carrying. No LLM call.", input_schema: scope_page_schema(false), }, ToolDefinition { - name: "what_tests_this", + name: "entity_test_caller_list", description: "Return the test entities that exercise an entity — its callers carrying the `test` categorisation tag. HONEST-EMPTY when test categorisation is not emitted, so an empty result is NOT a guarantee the entity is untested (a missing-signal note says so). Bounded; tests carry their `sei`. No LLM call.", input_schema: json!({ "type": "object", @@ -409,22 +501,22 @@ pub fn list_tools() -> Vec { }), }, ToolDefinition { - name: "high_churn", + name: "entity_high_churn_list", description: "Return entities ranked by git churn (`git_churn_count`) descending, within an optional `scope`. The analyze pipeline does not populate churn in v1.0, so this is HONEST-EMPTY in practice (missing-signal note); the query is real and lights up if churn is ever populated. Bounded; SEI-carrying. No LLM call.", input_schema: scope_page_schema(false), }, ToolDefinition { - name: "recently_changed", + name: "entity_recent_change_list", description: "Return entities changed since a timestamp (`since?`), within an optional `scope`. 
Clarion does not index a per-entity git change timestamp in v1.0, so this is an HONEST NO-OP: it returns an empty set with a missing-signal note pointing at `index_diff` for repo-level freshness (HEAD vs last analyze). Never fabricates a change set. No LLM call.", input_schema: scope_page_schema(true), }, ToolDefinition { - name: "find_dead_code", - description: "Return entities NOT reachable from the root set (entry points ∪ exported API ∪ tests ∪ HTTP routes ∪ CLI commands ∪ data models) over the call+import graph, within an optional `scope`. On-demand graph query (no analyze-time precompute). CONSERVATIVE (fails toward `live`): reachability counts ALL edge confidence tiers (resolved ∪ ambiguous ∪ inferred), dynamic-dispatch/reflection barrier tags force their entities live, and framework-magic kinds are excluded from candidacy — so it under-reports rather than over-reports. No `confidence` argument (a ceiling would only make more code look dead). HONEST SIGNAL-UNAVAILABLE: the active plugins emit no root categorisation today, so the root set is empty and the tool returns zero candidates with a missing-signal note (NOT a flood of false positives, and NOT a guarantee there is no dead code). Heuristic results (CLA-FACT-DEAD-CODE-CANDIDATE, confidence < 1) — never certain. Bounded; SEI-carrying. No LLM call.", + name: "entity_dead_list", + description: "Return entities NOT reachable from the root set (entry points ∪ exported API ∪ tests ∪ HTTP routes ∪ CLI commands ∪ data models) over the call+import graph, within an optional `scope`. On-demand graph query (no analyze-time precompute). CONSERVATIVE (fails toward `live`): reachability counts ALL edge confidence tiers (resolved ∪ ambiguous ∪ inferred), dynamic-dispatch/reflection barrier tags force their entities live, and framework-magic kinds are excluded from candidacy — so it under-reports rather than over-reports. No `confidence` argument (a ceiling would only make more code look dead). 
HONEST SIGNAL-UNAVAILABLE: if the current index has no root categorisation tags, the tool returns zero candidates with a missing-signal note (NOT a flood of false positives, and NOT a guarantee there is no dead code). Heuristic results (CLA-FACT-DEAD-CODE-CANDIDATE, confidence < 1) — never certain. Bounded; SEI-carrying. No LLM call.", input_schema: scope_page_schema(false), }, ToolDefinition { - name: "search_semantic", + name: "entity_semantic_search_list", description: "Rank entities by semantic (embedding cosine) similarity to a `query` string, within an optional `scope`. OPT-IN: semantic search is OFF by default; when disabled or no embedding provider is configured the tool returns result_kind=`not_enabled` with a missing-signal note (never a faked or empty-as-complete result). When enabled it embeds the query and runs a bounded exact cosine scan over the git-ignored `.clarion/embeddings.db` sidecar (built at analyze time), considering only embeddings whose content_hash matches the entity's current hash (stale vectors never surface). Bounded (limit default 20, max 100; page.total/truncated). 
Each result carries its `sei` and a `score`.", input_schema: json!({ "type": "object", @@ -747,7 +839,8 @@ impl ServerState { let Some(name) = params.get("name").and_then(Value::as_str) else { return error_response(id, -32602, "invalid tools/call params: missing name"); }; - if !list_tools().iter().any(|tool| tool.name == name) { + let canonical_name = rename_old_to_new(name); + if !list_tools().iter().any(|tool| tool.name == canonical_name) { return error_response(id, -32601, &format!("unknown tool: {name}")); } let arguments = params.get("arguments").unwrap_or(&Value::Null); @@ -759,152 +852,165 @@ impl ServerState { ); }; - let envelope = match name { + let envelope = match canonical_name { "entity_at" => match self.tool_entity_at(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "find_entity" => match self.tool_find_entity(arguments).await { + "entity_find" => match self.tool_find_entity(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "callers_of" => match self.tool_callers_of(arguments).await { + "entity_callers_list" => match self.tool_callers_of(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "execution_paths_from" => match self.tool_execution_paths_from(arguments).await { + "entity_execution_path_list" => match self.tool_execution_paths_from(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "neighborhood" => match self.tool_neighborhood(arguments).await { + "entity_neighborhood_get" => match self.tool_neighborhood(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "summary" => match self.tool_summary(arguments).await { + "entity_summary_get" => match self.tool_summary(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "issues_for" => match self.tool_issues_for(arguments).await { + 
"entity_issue_list" => match self.tool_issues_for(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "subsystem_members" => match self.tool_subsystem_members(arguments).await { + "subsystem_member_list" => match self.tool_subsystem_members(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "subsystem_of" => match self.tool_subsystem_of(arguments).await { + "entity_subsystem_get" => match self.tool_subsystem_of(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "project_status" => match self.tool_project_status(arguments).await { + "project_status_get" => match self.tool_project_status(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "summary_preview_cost" => match self.tool_summary_preview_cost(arguments).await { + "entity_summary_preview_cost_get" => { + match self.tool_summary_preview_cost(arguments).await { + Ok(value) => value, + Err(response) => return response.to_json_rpc(id), + } + } + "entity_source_get" => match self.tool_source_for_entity(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "source_for_entity" => match self.tool_source_for_entity(arguments).await { + "entity_call_site_list" => match self.tool_call_sites(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "call_sites" => match self.tool_call_sites(arguments).await { + "entity_orientation_pack_get" => match self.tool_orientation_pack(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "orientation_pack" => match self.tool_orientation_pack(arguments).await { + "analyze_start" => match self.tool_analyze_start(arguments).await { Ok(value) => value, Err(response) => return response.to_json_rpc(id), }, - "analyze_start" => match self.tool_analyze_start(arguments).await { + "analyze_status_get" => match 
self.tool_analyze_status(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "analyze_status" => match self.tool_analyze_status(arguments).await {
+            "analyze_cancel" => match self.tool_analyze_cancel(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "analyze_cancel" => match self.tool_analyze_cancel(arguments).await {
+            "index_diff_get" => match self.tool_index_diff(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "index_diff" => match self.tool_index_diff(arguments).await {
+            "entity_guidance_list" => match self.tool_guidance_for(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "guidance_for" => match self.tool_guidance_for(arguments).await {
+            "propose_guidance" => match self.tool_propose_guidance(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "findings_for" => match self.tool_findings_for(arguments).await {
+            "promote_guidance" => match self.tool_promote_guidance(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "wardline_for" => match self.tool_wardline_for(arguments).await {
+            "entity_finding_list" => match self.tool_findings_for(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_by_tag" => match self.tool_find_by_tag(arguments).await {
+            "entity_wardline_get" => match self.tool_wardline_for(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_by_kind" => match self.tool_find_by_kind(arguments).await {
+            "entity_tag_list" => match self.tool_find_by_tag(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_by_wardline" => match self.tool_find_by_wardline(arguments).await {
+            "entity_kind_list" => match self.tool_find_by_kind(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_circular_imports" => match self.tool_find_circular_imports(arguments).await {
+            "entity_wardline_list" => match self.tool_find_by_wardline(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_coupling_hotspots" => match self.tool_find_coupling_hotspots(arguments).await {
+            "module_circular_import_list" => match self.tool_find_circular_imports(arguments).await
+            {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_entry_points" => match self.tool_find_entry_points(arguments).await {
+            "entity_coupling_hotspot_list" => {
+                match self.tool_find_coupling_hotspots(arguments).await {
+                    Ok(value) => value,
+                    Err(response) => return response.to_json_rpc(id),
+                }
+            }
+            "entity_entry_point_list" => match self.tool_find_entry_points(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_http_routes" => match self.tool_find_http_routes(arguments).await {
+            "entity_http_route_list" => match self.tool_find_http_routes(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_data_models" => match self.tool_find_data_models(arguments).await {
+            "entity_data_model_list" => match self.tool_find_data_models(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_tests" => match self.tool_find_tests(arguments).await {
+            "entity_test_list" => match self.tool_find_tests(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_deprecations" => match self.tool_find_deprecations(arguments).await {
+            "entity_deprecation_list" => match self.tool_find_deprecations(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_todos" => match self.tool_find_todos(arguments).await {
+            "entity_todo_list" => match self.tool_find_todos(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "what_tests_this" => match self.tool_what_tests_this(arguments).await {
+            "entity_test_caller_list" => match self.tool_what_tests_this(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "high_churn" => match self.tool_high_churn(arguments).await {
+            "entity_high_churn_list" => match self.tool_high_churn(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "recently_changed" => match self.tool_recently_changed(arguments).await {
+            "entity_recent_change_list" => match self.tool_recently_changed(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "find_dead_code" => match self.tool_find_dead_code(arguments).await {
+            "entity_dead_list" => match self.tool_find_dead_code(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
-            "search_semantic" => match self.tool_search_semantic(arguments).await {
+            "entity_semantic_search_list" => match self.tool_search_semantic(arguments).await {
                 Ok(value) => value,
                 Err(response) => return response.to_json_rpc(id),
             },
@@ -965,18 +1071,265 @@ impl ServerState {
             }
         }
     }
+
+    async fn tool_propose_guidance(
+        &self,
+        arguments: &serde_json::Map,
+    ) -> std::result::Result {
+        let entity_id = required_str(arguments, "entity_id")?.to_owned();
+        let content = required_str(arguments, "content")?.to_owned();
+        let scope_level = arguments
+            .get("scope_level")
+            .and_then(Value::as_str)
+            .unwrap_or("function")
+            .to_owned();
+        let pinned = optional_bool(arguments, "pinned")?.unwrap_or(false);
+        let name = arguments
+            .get("name")
+            .and_then(Value::as_str)
+            .map(ToOwned::to_owned);
+        let expires = arguments
+            .get("expires")
+            .and_then(Value::as_str)
+            .map(ToOwned::to_owned);
+        let match_rules = arguments
+            .get("match_rules")
+            .and_then(Value::as_array)
+            .cloned()
+            .unwrap_or_else(|| vec![json!({"type": "entity", "id": entity_id})]);
+
+        let project_root = self.project_root.clone();
+        let entity_lookup_id = entity_id.clone();
+        let entity = match self
+            .readers
+            .with_reader(move |conn| entity_by_id(conn, &entity_lookup_id))
+            .await
+        {
+            Ok(Some(entity)) => entity,
+            Ok(None) => {
+                return Ok(tool_error_envelope(
+                    McpErrorCode::EntityNotFound,
+                    &format!("entity {entity_id} was not found"),
+                    false,
+                ));
+            }
+            Err(err) => {
+                return Ok(tool_error_envelope(
+                    McpErrorCode::StorageError,
+                    &format!("read entity for guidance proposal: {err}"),
+                    storage_retryable(&err),
+                ));
+            }
+        };
+
+        let proposal = GuidanceProposal {
+            entity_id: entity_id.clone(),
+            content,
+            scope_level,
+            match_rules,
+            name,
+            pinned,
+            expires,
+        };
+        let detail = match proposal.to_observation_detail() {
+            Ok(detail) => detail,
+            Err(err) => {
+                return Ok(tool_error_envelope(
+                    McpErrorCode::StorageError,
+                    &format!("build guidance proposal: {err}"),
+                    false,
+                ));
+            }
+        };
+
+        let Some(client) = self.filigree_client.clone() else {
+            return Ok(tool_error_envelope(
+                McpErrorCode::IoError,
+                "Filigree integration is not configured; cannot create guidance proposal observation",
+                true,
+            ));
+        };
+        let file_path = entity.source_file_path.as_deref().map(|path| {
+            std::path::Path::new(path)
+                .strip_prefix(&project_root)
+                .ok()
+                .and_then(|rel| rel.to_str())
+                .unwrap_or(path)
+                .to_owned()
+        });
+        let request = ObservationCreateRequest {
+            summary: format!("Clarion guidance proposal for {entity_id}"),
+            detail,
+            file_path,
+            line: entity.source_line_start,
+            priority: 2,
+            actor: "clarion".to_owned(),
+        };
+
+        let response =
+            match tokio::task::spawn_blocking(move || client.create_observation(request)).await {
+                Ok(Ok(response)) => response,
+                Ok(Err(err)) => {
+                    return Ok(tool_error_envelope(
+                        McpErrorCode::IoError,
+                        &format!("create Filigree guidance proposal observation: {err}"),
+                        true,
+                    ));
+                }
+                Err(err) => {
+                    return Ok(tool_error_envelope(
+                        McpErrorCode::Internal,
+                        &format!("create Filigree guidance proposal task failed: {err}"),
+                        true,
+                    ));
+                }
+            };
+
+        Ok(success_envelope(json!({
+            "observation_id": response.observation_id,
+            "promoted": false,
+        })))
+    }
+
+    async fn tool_promote_guidance(
+        &self,
+        arguments: &serde_json::Map,
+    ) -> std::result::Result {
+        let observation_id = required_str(arguments, "observation_id")?.to_owned();
+        let Some(client) = self.filigree_client.clone() else {
+            return Ok(tool_error_envelope(
+                McpErrorCode::IoError,
+                "Filigree integration is not configured; cannot read guidance proposal observation",
+                true,
+            ));
+        };
+        let lookup_client = client.clone();
+        let lookup_id = observation_id.clone();
+        let observation =
+            match tokio::task::spawn_blocking(move || lookup_client.observation_by_id(&lookup_id))
+                .await
+            {
+                Ok(Ok(Some(observation))) => observation,
+                Ok(Ok(None)) => {
+                    return Ok(tool_error_envelope(
+                        McpErrorCode::NotFound,
+                        &format!("observation {observation_id} was not found"),
+                        false,
+                    ));
+                }
+                Ok(Err(err)) => {
+                    return Ok(tool_error_envelope(
+                        McpErrorCode::IoError,
+                        &format!("read Filigree observation {observation_id}: {err}"),
+                        true,
+                    ));
+                }
+                Err(err) => {
+                    return Ok(tool_error_envelope(
+                        McpErrorCode::Internal,
+                        &format!("read Filigree observation task failed: {err}"),
+                        true,
+                    ));
+                }
+            };
+
+        let proposal = match GuidanceProposal::from_observation_detail(&observation.detail) {
+            Ok(proposal) => proposal,
+            Err(err) => {
+                return Ok(tool_error_envelope(
+                    McpErrorCode::StorageError,
+                    &format!("observation {observation_id} is not a guidance proposal: {err}"),
+                    false,
+                ));
+            }
+        };
+        let authored_at = guidance_authored_at_from_clock(&(self.clock)());
+        let promoted = match proposal.to_promoted_sheet(&authored_at) {
+            Ok(promoted) => promoted,
+            Err(err) => {
+                return Ok(tool_error_envelope(
+                    McpErrorCode::StorageError,
+                    &format!("build promoted guidance sheet: {err}"),
+                    false,
+                ));
+            }
+        };
+
+        let db_path = self.project_root.join(".clarion").join("clarion.db");
+        let project_root = self.project_root.clone();
+        let sheet_id = promoted.id.clone();
+        let write_result =
+            tokio::task::spawn_blocking(move || -> std::result::Result {
+                let conn =
+                    open_guidance_write_connection(&db_path).map_err(|err| err.to_string())?;
+                upsert_guidance_sheet(
+                    &conn,
+                    &GuidanceSheetInput {
+                        id: &promoted.id,
+                        name: &promoted.name,
+                        short_name: &promoted.short_name,
+                        properties: &promoted.properties,
+                    },
+                )
+                .map_err(|err| err.to_string())?;
+                let Some(sheet) = clarion_storage::get_guidance_sheet(&conn, &promoted.id)
+                    .map_err(|err| err.to_string())?
+                else {
+                    return Ok(0);
+                };
+                invalidate_summaries_for_sheet(&conn, &sheet, &project_root)
+                    .map_err(|err| err.to_string())
+            })
+            .await;
+        let invalidated = match write_result {
+            Ok(Ok(invalidated)) => invalidated,
+            Ok(Err(err)) => {
+                return Ok(tool_error_envelope(
+                    McpErrorCode::StorageError,
+                    &format!("write promoted guidance sheet {sheet_id}: {err}"),
+                    true,
+                ));
+            }
+            Err(err) => {
+                return Ok(tool_error_envelope(
+                    McpErrorCode::Internal,
+                    &format!("write promoted guidance sheet task failed: {err}"),
+                    true,
+                ));
+            }
+        };
+
+        let dismiss_id = observation_id.clone();
+        let dismissed = tokio::task::spawn_blocking(move || {
+            client.dismiss_observation(&dismiss_id, "promoted to Clarion guidance sheet")
+        })
+        .await
+        .is_ok_and(|result| result.is_ok());
+
+        Ok(success_envelope(json!({
+            "observation_id": observation_id,
+            "sheet_id": sheet_id,
+            "invalidated_summaries": invalidated,
+            "observation_dismissed": dismissed,
+        })))
+    }
 }
 
 async fn invoke_llm_provider(
     provider: Arc,
     request: LlmRequest,
 ) -> Result {
-    tokio::task::spawn_blocking(move || provider.invoke(request))
-        .await
-        .map_err(|err| LlmProviderError::InvalidResponse {
-            message: format!("LLM provider task failed: {err}"),
-            retryable: true,
-        })?
+    provider.invoke(request).await
+}
+
+fn open_guidance_write_connection(path: &std::path::Path) -> rusqlite::Result {
+    let conn = Connection::open_with_flags(
+        path,
+        OpenFlags::SQLITE_OPEN_READ_WRITE | OpenFlags::SQLITE_OPEN_URI,
+    )?;
+    conn.busy_timeout(std::time::Duration::from_secs(5))?;
+    conn.pragma_update(None, "foreign_keys", "ON")?;
+    Ok(conn)
 }
 
 struct SummaryLlmState {
@@ -1058,6 +1411,7 @@ struct SummaryReady {
     entity_json: Value,
     key: SummaryCacheKey,
     cached: Option,
+    guidance_text: String,
     caller_count: i64,
     fan_out: i64,
 }
@@ -1957,7 +2311,7 @@ fn plugin_entity_counts(conn: &rusqlite::Connection) -> Value {
 /// `completed` one). Fail-soft: no rows or a query error → `null`.
 fn latest_run_row(conn: &rusqlite::Connection) -> Value {
     match conn.query_row(
-        "SELECT id, status, started_at, completed_at FROM runs \
+        "SELECT id, status, started_at, completed_at, owner_pid, heartbeat_at FROM runs \
          ORDER BY started_at DESC LIMIT 1",
         [],
         |row| {
@@ -1966,6 +2320,8 @@ fn latest_run_row(conn: &rusqlite::Connection) -> Value {
                 "status": row.get::<_, String>(1)?,
                 "started_at": row.get::<_, String>(2)?,
                 "completed_at": row.get::<_, Option>(3)?,
+                "owner_pid": row.get::<_, Option>(4)?,
+                "heartbeat_at": row.get::<_, Option>(5)?,
             }))
         },
     ) {
@@ -2699,12 +3055,12 @@ fn orientation_suggested_reads(
     };
     let mut reads = vec![
         json!({
-            "tool": "source_for_entity",
+            "tool": "entity_source_get",
             "args": {"id": primary_id},
             "why": "read the entity's source with line numbers",
         }),
         json!({
-            "tool": "summary_preview_cost",
+            "tool": "entity_summary_preview_cost_get",
             "args": {"id": primary_id},
             "why": "estimate the cost of an LLM briefing before spending",
         }),
@@ -2713,13 +3069,13 @@ fn orientation_suggested_reads(
     // the owning subsystem.
     if primary_kind == Some("subsystem") {
         reads.push(json!({
-            "tool": "subsystem_members",
+            "tool": "subsystem_member_list",
             "args": {"id": primary_id},
             "why": "list the entities clustered into this subsystem",
         }));
     } else {
         reads.push(json!({
-            "tool": "subsystem_of",
+            "tool": "entity_subsystem_get",
             "args": {"id": primary_id},
             "why": "see which subsystem this entity belongs to",
         }));
@@ -2735,7 +3091,7 @@ fn orientation_suggested_reads(
         .and_then(Value::as_str)
     {
         reads.push(json!({
-            "tool": "orientation_pack",
+            "tool": "entity_orientation_pack_get",
             "args": {"entity": callee_id},
             "why": "orient on the primary callee",
         }));
@@ -2776,7 +3132,7 @@ fn read_progress_snapshot(path: &std::path::Path) -> Option {
 /// Parse a timestamp to Unix seconds, accepting both the MCP clock's
 /// `unix:` form and the RFC3339 form analyze writes into the progress
 /// file's `heartbeat_at`. `None` if neither parses.
-fn parse_to_unix_seconds(value: &str) -> Option {
+pub(crate) fn parse_to_unix_seconds(value: &str) -> Option {
     use time::OffsetDateTime;
     use time::format_description::well_known::Rfc3339;
     if let Some(rest) = value.strip_prefix("unix:") {
@@ -3524,6 +3880,26 @@ fn default_now_string() -> String {
     format!("unix:{seconds}")
 }
 
+fn guidance_authored_at_from_clock(raw: &str) -> String {
+    const ISO_MILLIS_UTC: &[time::format_description::FormatItem<'_>] =
+        format_description!("[year]-[month]-[day]T[hour]:[minute]:[second].[subsecond digits:3]Z");
+
+    let parsed = if let Some(seconds) = raw.strip_prefix("unix:") {
+        seconds
+            .trim()
+            .parse::()
+            .ok()
+            .and_then(|seconds| OffsetDateTime::from_unix_timestamp(seconds).ok())
+    } else {
+        parse_to_unix_seconds(raw)
+            .and_then(|seconds| OffsetDateTime::from_unix_timestamp(seconds).ok())
+    };
+
+    parsed
+        .and_then(|timestamp| timestamp.format(&ISO_MILLIS_UTC).ok())
+        .unwrap_or_else(|| raw.to_owned())
+}
+
 fn caller_json(
     conn: &rusqlite::Connection,
     edge: &CallEdgeMatch,
@@ -3766,7 +4142,7 @@ fn error_response(id: &Value, code: i64, message: &str) -> Value {
 
 #[cfg(test)]
 mod tests {
-    use std::sync::{Arc, Mutex};
+    use std::sync::Arc;
     use std::time::Duration;
 
     use clarion_core::{CachingModel, LlmProvider, LlmProviderError, LlmRequest, LlmResponse};
@@ -3782,73 +4158,73 @@ mod tests {
     fn tools_list_exposes_exact_docstrings() {
         let tools = list_tools();
 
-        assert_eq!(tools.len(), 37);
+        assert_eq!(tools.len(), 39);
         assert_eq!(tools[0].name, "entity_at");
         assert_eq!(
             tools[0].description,
             "Return the innermost Clarion entity whose source range contains a file and line, plus an `entity_context` evidence block: match_reason (decorator_range / declaration / body_range / containing_range / no_match) explaining why the line matched, the module→entity containing stack, the matched entity's decl/body/decorator sub-ranges, any same-granularity ambiguity alternatives, and index freshness. Paths are normalized relative to the project root. A blank or comment line that only a module spans reports containing_range — never a fabricated exact match."
         );
-        assert_eq!(tools[1].name, "find_entity");
+        assert_eq!(tools[1].name, "entity_find");
         assert_eq!(
             tools[1].description,
             "Search Clarion entities by id, name, short name, and summary text stored on entity rows. Results are paginated and ranked by FTS match where possible. This does not traverse the graph and does not search on-demand summary_cache entries. Pass an optional `kind` (e.g. \"subsystem\", \"function\", \"class\", \"module\") to return only entities of that kind — the way to locate a subsystem without visually filtering results."
         );
-        assert_eq!(tools[2].name, "callers_of");
+        assert_eq!(tools[2].name, "entity_callers_list");
         assert_eq!(
             tools[2].description,
             "Return entities that call the given entity. Default confidence is resolved, so ambiguous static candidates and LLM-inferred edges are excluded unless explicitly requested. Ambiguous edges expand all candidates; inferred edges may trigger bounded LLM dispatch. The result carries scope_excludes naming static blind spots not searched (e.g. attribute-receiver-calls) so an empty callers list is never read as a guaranteed true negative."
         );
-        assert_eq!(tools[3].name, "execution_paths_from");
+        assert_eq!(tools[3].name, "entity_execution_path_list");
         assert_eq!(
             tools[3].description,
             "Return bounded calls-only execution paths starting at an entity. Default confidence is resolved. max_depth defaults to 3. Results are compact: a deduplicated nodes table plus paths as arrays of node ids (under a root), ranked longest-first. Traversal stops at the server edge cap and the response is capped at a maximum number of ranked paths; truncated/truncation_reason report edge-cap or path-cap when either trims. The result carries scope_excludes naming static blind spots not searched (e.g. attribute-receiver-calls)."
         );
-        assert_eq!(tools[4].name, "summary");
+        assert_eq!(tools[4].name, "entity_summary_get");
         assert_eq!(
             tools[4].description,
             "Return an on-demand cached summary for one entity. In v0.1 this is leaf scope only: module summaries describe the module docstring and top-level members, not an aggregation of contained function/class summaries. If the LLM returns non-JSON the response degrades to a deterministic structural summary (kind: structural-fallback) built from the entity source, and that fallback is cached so a retry is a free cache hit rather than a re-billed failure."
         );
-        assert_eq!(tools[5].name, "issues_for");
+        assert_eq!(tools[5].name, "entity_issue_list");
         assert_eq!(
             tools[5].description,
             "Return Filigree issues attached to this Clarion entity, optionally including issues attached to contained entities. Filigree is an enrichment source; if unavailable, the tool returns an unavailable envelope instead of failing Clarion. The result carries a result_kind (matched | no_matches | unavailable) so a reachable-but-empty Filigree is distinct from an unreachable one, and a filigree_endpoint block (configured vs resolved URL + resolution_source) so you can see which endpoint — e.g. a live ethereal port — the answer came from. Each matched/drifted entry carries an `issue` object with the issue's title, status, and priority (fetched once per distinct issue, no N+1); `issue` is null when the issue-detail route is unavailable, so the match still resolves without a second hop into Filigree. Includes a `wardline_findings` section (enrich-only) reconciling Wardline findings to the entity by qualname; `result_kind` is matched|no_matches|unavailable."
         );
-        assert_eq!(tools[6].name, "neighborhood");
+        assert_eq!(tools[6].name, "entity_neighborhood_get");
         assert_eq!(
             tools[6].description,
             "Return the one-hop Clarion neighborhood around an entity: callers, callees, container, contained entities, references, and imports (imports_in = who imports this module, imports_out = what it imports; module-to-module). Default confidence is resolved; ambiguous and inferred calls are opt-in. References and imports are not execution flow. When the entity is a module, references_in/references_out are rolled up over the symbols it contains (references_rolled_up=true) — each neighbor carries a `via` naming the contained symbol the edge touches, so \"who imports this module/contract\" is answered at module altitude rather than reading empty. On references_in each rolled-up neighbor also carries `importer_module` — the importing symbol's containing module — so reverse-import names importing modules, not just symbols. The result carries scope_excludes naming blind spots not searched (e.g. attribute-receiver-calls) so empty sections are never read as guaranteed true negatives."
         );
-        assert_eq!(tools[7].name, "subsystem_members");
+        assert_eq!(tools[7].name, "subsystem_member_list");
         assert_eq!(
             tools[7].description,
             "List module entities assigned to a subsystem entity."
         );
-        assert_eq!(tools[8].name, "subsystem_of");
+        assert_eq!(tools[8].name, "entity_subsystem_get");
         assert_eq!(
             tools[8].description,
             "Return the subsystem an entity belongs to — the reverse of subsystem_members. Accepts any entity id: a module resolves directly, while a function/class resolves through its nearest containing module. Returns the subsystem id/name and the module the membership was resolved through, or a no-subsystem result when the entity has no subsystem-assigned module ancestor."
         );
-        assert_eq!(tools[9].name, "project_status");
+        assert_eq!(tools[9].name, "project_status_get");
         assert_eq!(
             tools[9].description,
             "Return deterministic Clarion diagnostics: repo root, db path, latest run (id/status/started/completed), entity/subsystem/edge/finding/briefing-blocked counts, index staleness, per-plugin entity counts from the current index, LLM policy (provider/live/cache), and the resolved Filigree endpoint (configured vs resolved URL + resolution source). Answers \"is the graph fresh, plugin-less, LLM-live, Filigree-reachable?\" without shelling out. No LLM call."
         );
-        assert_eq!(tools[10].name, "summary_preview_cost");
+        assert_eq!(tools[10].name, "entity_summary_preview_cost_get");
         assert_eq!(
             tools[10].description,
             "Preview what calling summary(id) would cost BEFORE spending. Reports cache_status (hit | expired | miss), the cached row's real tokens/cost/age on a hit, an input-token estimate on a miss, the configured model, the LLM policy (provider/live/allow_live_provider/cache horizon), and live_spend_would_occur — true only when no fresh cache row exists AND a live provider is wired. A disabled/unconfigured LLM is reported distinctly from a cache miss. Never invokes the LLM provider."
         );
-        assert_eq!(tools[11].name, "source_for_entity");
+        assert_eq!(tools[11].name, "entity_source_get");
         assert_eq!(
             tools[11].description,
             "Return the exact indexed source span for one entity (its source_line_start..source_line_end, which includes any decorators/signature/docstring the plugin captured) plus a bounded window of surrounding context, as line-numbered lines each flagged in_entity true/false. No LLM call. Lets an agent read and trust the entity without shelling out. source_status reports `ok`, or — instead of a misleading stale snippet — `missing` (file gone), `no_range`/`no_source_path` (entity has no anchor), `binary` (non-UTF-8), or `drifted` (the file no longer matches the indexed content_hash; rerun `clarion analyze`). context_lines defaults to 10."
         );
-        assert_eq!(tools[12].name, "call_sites");
+        assert_eq!(tools[12].name, "entity_call_site_list");
         assert_eq!(
             tools[12].description,
             "Show the actual source sites behind calls/references edges, so an agent can see WHY Clarion believes an edge exists rather than trusting it blind. role=caller (default) returns this entity's outgoing sites (what it calls/references); role=callee returns incoming sites (who calls/references it). Each site carries the file path, 1-based line, byte column, the source line text, edge kind, confidence, and a resolution of resolved | ambiguous (with candidate ids) | unresolved (a static call Clarion could not bind, kept separate so it is never mixed with resolved evidence). Filter by edge kind (`calls`/`references`) and by a best-effort production/test path heuristic (`all`/`production`/`test`; path partitioning is not indexed — the heuristic matches conventional test paths). Output is bounded; truncated flags when the site cap trims. No LLM call."
         );
-        assert_eq!(tools[13].name, "orientation_pack");
+        assert_eq!(tools[13].name, "entity_orientation_pack_get");
         assert_eq!(
             tools[13].description,
             "Assemble one deterministic orientation packet for a code location — the replacement for hand-composing find_entity + entity_at + source reads + neighborhood + issues_for + freshness on every question. Resolve EITHER by `entity` id OR by `file`+`line` (exactly one form). The packet bundles: the primary entity, the entity_context evidence (match_reason / containing stack / decl-body-decorator ranges — so a decorator-line query is explained, not guessed), a compact source-span summary, one-hop neighbors (callers, callees, container, contained, references, imports — for a module, references_in/out are rolled up over contained symbols with references_rolled_up=true), compact resolved execution paths, related Filigree issues, index/Filigree/LLM health, warnings, and suggested next reads. No LLM summary is invoked. Every list is bounded; an `omitted` block reports per-section truncation counts and `degraded` sections name surfaces that were unavailable (e.g. Filigree down) so an empty section is never read as a guaranteed negative. Includes a `wardline_findings` section (enrich-only) reconciling Wardline findings to the entity by qualname; `result_kind` is matched|no_matches|unavailable."
@@ -3858,7 +4234,7 @@ mod tests {
             tools[14].description,
             "Start a `clarion analyze` run over this project in the background and return its run handle immediately — do not block on the (possibly many-minute) run. Re-indexes the source tree and refreshes entities/edges/subsystems. Returns run_id, status (`started`), and the progress-file path. Only one analyze may run per project at a time (a cross-process lock enforces it); a second start while one is active is rejected. Poll analyze_status for progress; analyze_cancel to stop. No arguments."
         );
-        assert_eq!(tools[15].name, "analyze_status");
+        assert_eq!(tools[15].name, "analyze_status_get");
         assert_eq!(
             tools[15].description,
             "Report the live status of an analyze run started via analyze_start. status is one of queued (spawned, not yet recording) | running | completed | failed | cancelled | skipped_no_plugins. While running it exposes phase (discovering / analyzing / clustering), current_plugin, processed_files / total_files, current_file, the latest heartbeat_at, elapsed_seconds, and progress_observed (false when the heartbeat has gone stale — the run may be wedged). On a terminal status it carries the recorded run stats. Reads structured progress, never logs."
@@ -3868,26 +4244,28 @@ mod tests {
             tools[16].description,
             "Cancel a running analyze. SIGKILLs the run's whole process group — terminating the language plugin and its pyright-langserver child — then marks the run terminal (status `cancelled`) so it is never left dangling as `running`. Idempotent: cancelling an already-terminal run reports its current state. Partial work already written is kept (cancel discards in-flight work, not the index)."
         );
-        assert_eq!(tools[17].name, "index_diff");
-        assert_eq!(tools[18].name, "guidance_for");
-        assert_eq!(tools[19].name, "findings_for");
-        assert_eq!(tools[20].name, "wardline_for");
-        assert_eq!(tools[21].name, "find_by_tag");
-        assert_eq!(tools[22].name, "find_by_kind");
-        assert_eq!(tools[23].name, "find_by_wardline");
-        assert_eq!(tools[24].name, "find_circular_imports");
-        assert_eq!(tools[25].name, "find_coupling_hotspots");
-        assert_eq!(tools[26].name, "find_entry_points");
-        assert_eq!(tools[27].name, "find_http_routes");
-        assert_eq!(tools[28].name, "find_data_models");
-        assert_eq!(tools[29].name, "find_tests");
-        assert_eq!(tools[30].name, "find_deprecations");
-        assert_eq!(tools[31].name, "find_todos");
-        assert_eq!(tools[32].name, "what_tests_this");
-        assert_eq!(tools[33].name, "high_churn");
-        assert_eq!(tools[34].name, "recently_changed");
-        assert_eq!(tools[35].name, "find_dead_code");
-        assert_eq!(tools[36].name, "search_semantic");
+        assert_eq!(tools[17].name, "index_diff_get");
+        assert_eq!(tools[18].name, "entity_guidance_list");
+        assert_eq!(tools[19].name, "propose_guidance");
+        assert_eq!(tools[20].name, "promote_guidance");
+        assert_eq!(tools[21].name, "entity_finding_list");
+        assert_eq!(tools[22].name, "entity_wardline_get");
+        assert_eq!(tools[23].name, "entity_tag_list");
+        assert_eq!(tools[24].name, "entity_kind_list");
+        assert_eq!(tools[25].name, "entity_wardline_list");
+        assert_eq!(tools[26].name, "module_circular_import_list");
+        assert_eq!(tools[27].name, "entity_coupling_hotspot_list");
+        assert_eq!(tools[28].name, "entity_entry_point_list");
+        assert_eq!(tools[29].name, "entity_http_route_list");
+        assert_eq!(tools[30].name, "entity_data_model_list");
+        assert_eq!(tools[31].name, "entity_test_list");
+        assert_eq!(tools[32].name, "entity_deprecation_list");
+        assert_eq!(tools[33].name, "entity_todo_list");
+        assert_eq!(tools[34].name, "entity_test_caller_list");
+        assert_eq!(tools[35].name, "entity_high_churn_list");
+        assert_eq!(tools[36].name, "entity_recent_change_list");
+        assert_eq!(tools[37].name, "entity_dead_list");
+        assert_eq!(tools[38].name, "entity_semantic_search_list");
     }
 
     #[test]
@@ -4190,9 +4568,19 @@ mod tests {
 
         assert_eq!(response["jsonrpc"], "2.0");
         assert_eq!(response["id"], "tools-1");
-        assert_eq!(response["result"]["tools"].as_array().unwrap().len(), 37);
+        let tools = response["result"]["tools"].as_array().unwrap();
+        assert_eq!(tools.len(), 39);
+        let tool_names: Vec<&str> = tools
+            .iter()
+            .filter_map(|tool| tool.get("name").and_then(serde_json::Value::as_str))
+            .collect();
+        assert!(tool_names.contains(&"propose_guidance"));
+        assert!(tool_names.contains(&"promote_guidance"));
         assert_eq!(response["result"]["tools"][0]["name"], "entity_at");
-        assert_eq!(response["result"]["tools"][7]["name"], "subsystem_members");
+        assert_eq!(
+            response["result"]["tools"][7]["name"],
+            "subsystem_member_list"
+        );
     }
 
     #[test]
@@ -4291,7 +4679,14 @@ mod tests {
 
         assert_eq!(decoded["jsonrpc"], "2.0");
         assert_eq!(decoded["id"], 10);
-        assert_eq!(decoded["result"]["tools"].as_array().unwrap().len(), 37);
+        let tools = decoded["result"]["tools"].as_array().unwrap();
+        assert_eq!(tools.len(), 39);
+        let tool_names: Vec<&str> = tools
+            .iter()
+            .filter_map(|tool| tool.get("name").and_then(serde_json::Value::as_str))
+            .collect();
+        assert!(tool_names.contains(&"propose_guidance"));
+        assert!(tool_names.contains(&"promote_guidance"));
     }
 
     #[test]
@@ -4366,7 +4761,14 @@ mod tests {
         assert_eq!(first_json["id"], 11);
         assert_eq!(first_json["result"]["serverInfo"]["name"], "clarion");
         assert_eq!(second_json["id"], 12);
-        assert_eq!(second_json["result"]["tools"].as_array().unwrap().len(), 37);
+        let tools = second_json["result"]["tools"].as_array().unwrap();
+        assert_eq!(tools.len(), 39);
+        let tool_names: Vec<&str> = tools
+            .iter()
+            .filter_map(|tool| tool.get("name").and_then(serde_json::Value::as_str))
+            .collect();
+        assert!(tool_names.contains(&"propose_guidance"));
+        assert!(tool_names.contains(&"promote_guidance"));
     }
 
     #[test]
@@ -4560,12 +4962,12 @@ mod tests {
         let key = inferred_test_key();
         let read = inferred_test_read(key.clone());
         let (writer, _rx) = mpsc::channel(1);
-        let (release_tx, release_rx) = std::sync::mpsc::channel();
+        let (release_tx, release_rx) = tokio::sync::mpsc::channel(1);
         let llm = InferenceLlmState {
             writer,
             config: LlmConfig::default(),
             provider: Arc::new(BlockingProvider {
-                release: Mutex::new(release_rx),
+                release: tokio::sync::Mutex::new(release_rx),
             }),
         };
 
@@ -4581,7 +4983,7 @@ mod tests {
         handle.abort();
         let _ = handle.await;
         let removed = wait_until_inferred_inflight_removed(&state, &key).await;
-        let _ = release_tx.send(());
+        let _ = release_tx.send(()).await;
 
         assert!(
             removed,
@@ -4707,20 +5109,18 @@ mod tests {
     }
 
     struct BlockingProvider {
-        release: Mutex>,
+        release: tokio::sync::Mutex>,
     }
 
+    #[async_trait::async_trait]
     impl LlmProvider for BlockingProvider {
         fn name(&self) -> &'static str {
             "blocking"
         }
 
-        fn invoke(&self, _request: LlmRequest) -> Result {
-            let _ = self
-                .release
-                .lock()
-                .unwrap_or_else(std::sync::PoisonError::into_inner)
-                .recv();
+        async fn invoke(&self, _request: LlmRequest) -> Result {
+            let mut rx = self.release.lock().await;
+            let _ = rx.recv().await;
             Ok(LlmResponse {
                 model_id: "test-model".to_owned(),
                 output_json: r#"{"edges":[]}"#.to_owned(),
diff --git a/crates/clarion-mcp/src/scan_results.rs b/crates/clarion-mcp/src/scan_results.rs
index 99361dd3..d77288b0 100644
--- a/crates/clarion-mcp/src/scan_results.rs
+++ b/crates/clarion-mcp/src/scan_results.rs
@@ -1,607 +1 @@
-//! Filigree-native scan-results emission (WP9-B, REQ-FINDING-03).
-//!
-//! Maps Clarion's persisted findings onto Filigree's `POST /api/v1/scan-results`
-//! intake schema (ADR-004 + detailed-design §7) and models the response. This
-//! module is pure — request building and response parsing only; the HTTP POST
-//! lives on [`crate::filigree::FiligreeHttpClient::post_scan_results`].
-//!
-//! Emission is enrich-only: a one-way Clarion→Filigree push that adds no -//! Filigree-side routes and never gates Clarion's own semantics. Clarion's -//! richer fields nest under `metadata.clarion.*` so Filigree's silent -//! top-level-key drop (verified against the live intake) cannot lose them. - -use serde::{Deserialize, Serialize}; -use serde_json::{Map, Value, json}; - -use clarion_storage::FindingForEmitRow; - -/// The `scan_source` Clarion stamps on every emitted finding. Filigree's dedup -/// key includes `scan_source`, so this is stable across runs. -pub const CLARION_SCAN_SOURCE: &str = "clarion"; - -/// Map Clarion's internal severity vocabulary (`INFO` | `WARN` | `ERROR` | -/// `CRITICAL` | `NONE`) to Filigree's wire vocabulary (detailed-design §7 -/// table). Anything unrecognised — including `NONE` (facts) and `INFO` — maps -/// to `info`, mirroring the coercion Filigree applies server-side, except done -/// here so the original survives in `metadata.clarion.internal_severity`. -/// -/// This mapping is load-bearing: a live probe confirmed Filigree coerces an -/// unmapped uppercase `WARN` to `info` (with a response warning), so emitting -/// the internal vocabulary verbatim would silently flatten every defect to -/// `info`. -#[must_use] -pub fn severity_to_wire(internal: &str) -> &'static str { - match internal { - "CRITICAL" => "critical", - "ERROR" => "high", - "WARN" => "medium", - _ => "info", - } -} - -/// Knobs the emitter sets per `clarion analyze` invocation. `create_observations` -/// is always `false` (Clarion emits findings, not observations). -#[derive(Debug, Clone)] -pub struct EmitOptions { - /// Filigree's `scan_run_id`; Clarion passes its `run_id` here. An unknown - /// id is tolerated by Filigree (it warns and proceeds), so this carries the - /// REQ-FINDING-05 wire shape without a pre-create handshake. 
- pub scan_run_id: Option, - /// `mark_unseen`: `true` for a normal full run so old-position findings for - /// the same rule/file transition to `unseen_in_latest` (REQ-FINDING-06). - pub mark_unseen: bool, - /// `complete_scan_run`: `true` on the final (here: only) batch. - pub complete_scan_run: bool, - /// Fallback `path` for findings whose anchor entity has no `source_file_path` - /// (synthetic, non-file entities — subsystems, project, guidance). Filigree - /// rejects path-less findings, so when this is set such a finding emits - /// against this stand-in path (the project root, mirroring the - /// `core:project:*` finding anchor) and carries - /// `metadata.clarion.synthetic_anchor=true` so a consumer knows the path is a - /// placeholder for a non-file entity, not the finding's real location. When - /// `None`, path-less findings are skipped (`skipped_no_path`) as before. - pub default_path: Option, -} - -/// The Filigree-native scan-results request body. Serializes to the exact wire -/// shape Filigree's intake accepts; any field outside its enumerated set is -/// silently dropped server-side, so the struct carries only known keys. -#[derive(Debug, Clone, PartialEq, Serialize)] -pub struct ScanResultsRequest { - pub scan_source: String, - #[serde(skip_serializing_if = "Option::is_none")] - pub scan_run_id: Option, - pub mark_unseen: bool, - pub create_observations: bool, - pub complete_scan_run: bool, - pub findings: Vec, -} - -/// A prepared batch plus the counts the emitter records in `stats.json`. -#[derive(Debug, Clone)] -pub struct PreparedBatch { - pub request: ScanResultsRequest, - /// Findings rendered into the request body. - pub emitted: usize, - /// Findings dropped because their anchor entity has no `source_file_path` - /// (Filigree requires `path`; emitting a synthetic one would pollute its - /// file registry). Surfaced so the skip is never silent. - pub skipped_no_path: usize, -} - -/// Build a scan-results batch from persisted findings. 
Findings whose anchor -/// entity has no source path are skipped and counted, not emitted. -#[must_use] -pub fn prepare_batch(rows: &[FindingForEmitRow], opts: &EmitOptions) -> PreparedBatch { - let mut findings = Vec::with_capacity(rows.len()); - let mut skipped_no_path = 0; - for row in rows { - match wire_finding(row, opts.default_path.as_deref()) { - Some(finding) => findings.push(finding), - None => skipped_no_path += 1, - } - } - let emitted = findings.len(); - PreparedBatch { - request: ScanResultsRequest { - scan_source: CLARION_SCAN_SOURCE.to_owned(), - scan_run_id: opts.scan_run_id.clone(), - mark_unseen: opts.mark_unseen, - create_observations: false, - complete_scan_run: opts.complete_scan_run, - findings, - }, - emitted, - skipped_no_path, - } -} - -/// Render one persisted finding as a Filigree-native wire finding, or `None` -/// when it has no usable `path` (Filigree rejects path-less findings with a -/// `400 VALIDATION`). -/// -/// `default_path` is the [`EmitOptions::default_path`] fallback: when the anchor -/// entity has no `source_file_path` (a synthetic, non-file entity) but a fallback -/// is supplied, the finding emits against it and is flagged -/// `metadata.clarion.synthetic_anchor=true`. A synthetic anchor never carries -/// line numbers (the placeholder path has no meaningful position). 
-fn wire_finding(row: &FindingForEmitRow, default_path: Option<&str>) -> Option { - let row_path = row - .source_file_path - .as_deref() - .map(str::trim) - .filter(|path| !path.is_empty()); - let (path, synthetic_anchor) = match row_path { - Some(path) => (path, false), - None => ( - default_path - .map(str::trim) - .filter(|path| !path.is_empty())?, - true, - ), - }; - let mut finding = Map::new(); - finding.insert("path".to_owned(), json!(path)); - finding.insert("rule_id".to_owned(), json!(row.rule_id)); - finding.insert("message".to_owned(), json!(row.message)); - finding.insert( - "severity".to_owned(), - json!(severity_to_wire(&row.severity)), - ); - // A synthetic-anchor finding (subsystem/project/guidance) has no real - // file position, so the placeholder path carries no line numbers. - if !synthetic_anchor { - if let Some(line_start) = row.source_line_start { - finding.insert("line_start".to_owned(), json!(line_start)); - } - if let Some(line_end) = row.source_line_end { - finding.insert("line_end".to_owned(), json!(line_end)); - } - } - finding.insert("metadata".to_owned(), wire_metadata(row, synthetic_anchor)); - Some(Value::Object(finding)) -} - -/// Nest Clarion's richer fields under `metadata` (top level) and -/// `metadata.clarion` (Clarion-owned slot), per ADR-004 + detailed-design §7. 
-fn wire_metadata(row: &FindingForEmitRow, synthetic_anchor: bool) -> Value { - let mut meta = Map::new(); - meta.insert("kind".to_owned(), json!(row.kind)); - if let Some(confidence) = row.confidence { - meta.insert("confidence".to_owned(), json!(confidence)); - } - if let Some(basis) = &row.confidence_basis { - meta.insert("confidence_basis".to_owned(), json!(basis)); - } - - let mut clarion = Map::new(); - clarion.insert("entity_id".to_owned(), json!(row.entity_id)); - clarion.insert( - "related_entities".to_owned(), - json_array_or_empty(&row.related_entities_json), - ); - clarion.insert( - "supports".to_owned(), - json_array_or_empty(&row.supports_json), - ); - clarion.insert( - "supported_by".to_owned(), - json_array_or_empty(&row.supported_by_json), - ); - // Lossless round-trip: the wire `severity` is the mapped value, so the - // internal vocabulary is preserved here for read-back. - clarion.insert("internal_severity".to_owned(), json!(row.severity)); - clarion.insert("internal_status".to_owned(), json!("open")); - // Flag the placeholder anchor so a consumer never mistakes the stand-in - // `path` (the project root) for the finding's real file location. - if synthetic_anchor { - clarion.insert("synthetic_anchor".to_owned(), json!(true)); - } - meta.insert("clarion".to_owned(), Value::Object(clarion)); - Value::Object(meta) -} - -/// Parse a stored JSON-array column; fall back to an empty array if the text is -/// malformed or not an array, so one bad row never derails a batch. -fn json_array_or_empty(raw: &str) -> Value { - match serde_json::from_str::(raw) { - Ok(value @ Value::Array(_)) => value, - _ => Value::Array(Vec::new()), - } -} - -/// Filigree's scan-results response. `#[serde(default)]` keeps the read -/// forward-compatible: Filigree may add fields without breaking Clarion. 
-#[derive(Debug, Clone, Default, PartialEq, Eq, Deserialize)] -#[serde(default)] -pub struct ScanResultsResponse { - pub files_created: u64, - pub files_updated: u64, - pub findings_created: u64, - pub findings_updated: u64, - pub observations_created: u64, - pub observations_failed: u64, - pub new_finding_ids: Vec, - /// Per-finding intake warnings (e.g. coerced severity, unknown - /// `scan_run_id`). REQ-FINDING-03 requires the emitter to parse these, not - /// just count them. - pub warnings: Vec, -} - -/// Parse a scan-results response body. -/// -/// # Errors -/// -/// Returns the underlying [`serde_json::Error`] if the body is not the expected -/// JSON object shape. -pub fn parse_scan_results_response(body: &str) -> Result { - serde_json::from_str(body) -} - -/// The scan-results intake URL for a Filigree base URL. -#[must_use] -pub fn scan_results_url(base_url: &str) -> String { - format!("{}/api/v1/scan-results", base_url.trim_end_matches('/')) -} - -/// The retention-sweep URL for a Filigree base URL (REQ-FINDING-06, -/// `--prune-unseen`). This is a **loom-generation** route (`/api/loom/…`), -/// unlike the classic `/api/v1/scan-results` emission intake — do not derive it -/// from [`scan_results_url`]. Verified against Filigree's own route handler and -/// API tests. -#[must_use] -pub fn clean_stale_url(base_url: &str) -> String { - format!( - "{}/api/loom/findings/clean-stale", - base_url.trim_end_matches('/') - ) -} - -/// The `POST /api/loom/findings/clean-stale` request body (REQ-FINDING-06). -/// Filigree **soft-archives** `unseen_in_latest` findings older than -/// `older_than_days`, scoped to `scan_source`, moving them to `fixed` status -/// (they auto-reopen if a later scan re-detects them — see Filigree ADR-015). -/// `scan_source` is required server-side as an accident-guard so a caller -/// cannot sweep every tool's findings; Clarion always sends `"clarion"`. 
-#[derive(Debug, Clone, PartialEq, Serialize)] -pub struct CleanStaleRequest { - pub scan_source: String, - pub older_than_days: u32, - pub actor: String, -} - -/// Filigree's clean-stale response. `#[serde(default)]` keeps the read tolerant -/// of added fields / missing keys so Filigree can grow the route. -#[derive(Debug, Clone, Default, PartialEq, Eq, Deserialize)] -#[serde(default)] -pub struct CleanStaleResponse { - pub findings_fixed: u64, - pub scan_source: String, - pub older_than_days: u64, -} - -/// Parse Filigree's clean-stale response body. -/// -/// # Errors -/// -/// Returns the underlying [`serde_json::Error`] if the body is not the expected -/// shape. -pub fn parse_clean_stale_response(body: &str) -> Result { - serde_json::from_str(body) -} - -#[cfg(test)] -mod tests { - use super::*; - - fn defect_row() -> FindingForEmitRow { - FindingForEmitRow { - id: "core:finding:run-1:circular".to_owned(), - rule_id: "CLA-PY-STRUCTURE-001".to_owned(), - kind: "defect".to_owned(), - severity: "WARN".to_owned(), - confidence: Some(0.95), - confidence_basis: Some("ast_match".to_owned()), - message: "Circular import detected".to_owned(), - entity_id: "python:class:auth.tokens::TokenManager".to_owned(), - related_entities_json: r#"["python:class:auth.sessions::SessionStore"]"#.to_owned(), - supports_json: "[]".to_owned(), - supported_by_json: "[]".to_owned(), - source_file_path: Some("src/auth/tokens.py".to_owned()), - source_line_start: Some(12), - source_line_end: Some(12), - } - } - - #[test] - fn severity_table_matches_detailed_design() { - assert_eq!(severity_to_wire("CRITICAL"), "critical"); - assert_eq!(severity_to_wire("ERROR"), "high"); - assert_eq!(severity_to_wire("WARN"), "medium"); - assert_eq!(severity_to_wire("INFO"), "info"); - assert_eq!(severity_to_wire("NONE"), "info"); - // Unknown values coerce to info, the same as Filigree's server-side rule. 
- assert_eq!(severity_to_wire("bogus"), "info"); - } - - #[test] - fn wire_finding_carries_mapped_severity_and_nested_clarion_metadata() { - let finding = wire_finding(&defect_row(), None).expect("path present"); - - assert_eq!(finding["path"], json!("src/auth/tokens.py")); - assert_eq!(finding["rule_id"], json!("CLA-PY-STRUCTURE-001")); - assert_eq!(finding["message"], json!("Circular import detected")); - // Internal WARN maps to wire medium... - assert_eq!(finding["severity"], json!("medium")); - assert_eq!(finding["line_start"], json!(12)); - assert_eq!(finding["line_end"], json!(12)); - - let meta = &finding["metadata"]; - assert_eq!(meta["kind"], json!("defect")); - assert_eq!(meta["confidence"], json!(0.95)); - assert_eq!(meta["confidence_basis"], json!("ast_match")); - - let clarion = &meta["clarion"]; - assert_eq!( - clarion["entity_id"], - json!("python:class:auth.tokens::TokenManager") - ); - assert_eq!( - clarion["related_entities"], - json!(["python:class:auth.sessions::SessionStore"]) - ); - assert_eq!(clarion["supports"], json!([])); - assert_eq!(clarion["supported_by"], json!([])); - // ...while the internal value round-trips under clarion.*. 
- assert_eq!(clarion["internal_severity"], json!("WARN")); - assert_eq!(clarion["internal_status"], json!("open")); - } - - #[test] - fn fact_finding_omits_confidence_basis_when_absent() { - let mut row = defect_row(); - row.kind = "fact".to_owned(); - row.severity = "NONE".to_owned(); - row.confidence = None; - row.confidence_basis = None; - - let finding = wire_finding(&row, None).expect("path present"); - assert_eq!(finding["severity"], json!("info")); - let meta = &finding["metadata"]; - assert_eq!(meta["kind"], json!("fact")); - assert!( - meta.get("confidence").is_none(), - "confidence omitted: {meta}" - ); - assert!( - meta.get("confidence_basis").is_none(), - "confidence_basis omitted: {meta}" - ); - assert_eq!(meta["clarion"]["internal_severity"], json!("NONE")); - } - - #[test] - fn path_less_finding_is_skipped_not_emitted() { - let mut row = defect_row(); - row.source_file_path = None; - assert!(wire_finding(&row, None).is_none()); - - let mut blank = defect_row(); - blank.source_file_path = Some(" ".to_owned()); - assert!( - wire_finding(&blank, None).is_none(), - "blank path is skipped too" - ); - } - - #[test] - fn path_less_finding_uses_default_path_and_flags_synthetic_anchor() { - // A subsystem-anchored finding (no source_file_path) emits against the - // supplied fallback path and is flagged as a synthetic anchor, with no - // line numbers (the placeholder path has no real position). - let mut row = defect_row(); - row.entity_id = "core:subsystem:abcd".to_owned(); - row.source_file_path = None; - let finding = wire_finding(&row, Some("/repo/root")).expect("emits via default path"); - assert_eq!(finding["path"], json!("/repo/root")); - assert_eq!( - finding["metadata"]["clarion"]["synthetic_anchor"], - json!(true) - ); - assert!( - finding.get("line_start").is_none() && finding.get("line_end").is_none(), - "synthetic anchor carries no line position: {finding}" - ); - - // A path-bearing finding ignores the fallback and is not flagged. 
- let finding = wire_finding(&defect_row(), Some("/repo/root")).expect("path present"); - assert_eq!(finding["path"], json!("src/auth/tokens.py")); - assert!( - finding["metadata"]["clarion"] - .get("synthetic_anchor") - .is_none(), - "real-path finding is not a synthetic anchor: {finding}" - ); - - // A blank fallback is no better than none: still skipped. - let mut row = defect_row(); - row.source_file_path = None; - assert!(wire_finding(&row, Some(" ")).is_none()); - } - - #[test] - fn malformed_related_entities_falls_back_to_empty_array() { - let mut row = defect_row(); - row.related_entities_json = "not json".to_owned(); - let finding = wire_finding(&row, None).expect("path present"); - assert_eq!( - finding["metadata"]["clarion"]["related_entities"], - json!([]) - ); - } - - #[test] - fn prepare_batch_counts_emitted_and_skipped() { - let emitted = defect_row(); - let mut skipped = defect_row(); - skipped.id = "core:finding:run-1:weak-modularity".to_owned(); - skipped.entity_id = "core:subsystem:abcd".to_owned(); - skipped.source_file_path = None; - - let batch = prepare_batch( - &[emitted, skipped], - &EmitOptions { - scan_run_id: Some("run-1".to_owned()), - mark_unseen: true, - complete_scan_run: true, - default_path: None, - }, - ); - - assert_eq!(batch.emitted, 1); - assert_eq!(batch.skipped_no_path, 1); - assert_eq!(batch.request.findings.len(), 1); - assert_eq!(batch.request.scan_source, "clarion"); - assert_eq!(batch.request.scan_run_id.as_deref(), Some("run-1")); - assert!(batch.request.mark_unseen); - assert!(batch.request.complete_scan_run); - assert!(!batch.request.create_observations); - } - - #[test] - fn request_serializes_to_filigree_wire_shape() { - let batch = prepare_batch( - &[defect_row()], - &EmitOptions { - scan_run_id: Some("run-1".to_owned()), - mark_unseen: true, - complete_scan_run: true, - default_path: None, - }, - ); - let value = serde_json::to_value(&batch.request).expect("serialize request"); - - assert_eq!(value["scan_source"], 
json!("clarion")); - assert_eq!(value["scan_run_id"], json!("run-1")); - assert_eq!(value["mark_unseen"], json!(true)); - assert_eq!(value["create_observations"], json!(false)); - assert_eq!(value["complete_scan_run"], json!(true)); - assert_eq!( - value["findings"].as_array().expect("findings array").len(), - 1 - ); - } - - #[test] - fn omitted_scan_run_id_is_absent_from_wire() { - let batch = prepare_batch( - &[defect_row()], - &EmitOptions { - scan_run_id: None, - mark_unseen: true, - complete_scan_run: true, - default_path: None, - }, - ); - let value = serde_json::to_value(&batch.request).expect("serialize request"); - assert!( - value.get("scan_run_id").is_none(), - "scan_run_id omitted when None: {value}" - ); - } - - #[test] - fn parses_live_response_shape() { - // Pinned to the real Filigree response captured from a live probe POST. - let response = parse_scan_results_response( - r#"{ - "files_created": 1, - "files_updated": 0, - "findings_created": 1, - "findings_updated": 0, - "new_finding_ids": ["clarion-sf-2f4cf9ca1b"], - "observations_created": 0, - "observations_failed": 0, - "warnings": ["Unknown severity 'WARN' for finding at probe/sev.py, mapped to 'info'"] - }"#, - ) - .expect("parse live response shape"); - - assert_eq!(response.findings_created, 1); - assert_eq!(response.files_created, 1); - assert_eq!(response.new_finding_ids, vec!["clarion-sf-2f4cf9ca1b"]); - assert_eq!(response.warnings.len(), 1); - assert!(response.warnings[0].contains("Unknown severity")); - } - - #[test] - fn response_parse_tolerates_missing_and_extra_fields() { - // Forward-compat: unknown fields ignored, missing fields default. 
- let response = parse_scan_results_response( - r#"{"findings_created": 2, "warnings": [], "some_future_field": 99}"#, - ) - .expect("parse forward-compatible response"); - assert_eq!(response.findings_created, 2); - assert!(response.warnings.is_empty()); - assert!(response.new_finding_ids.is_empty()); - } - - #[test] - fn builds_scan_results_url() { - assert_eq!( - scan_results_url("http://127.0.0.1:8542/"), - "http://127.0.0.1:8542/api/v1/scan-results" - ); - assert_eq!( - scan_results_url("http://127.0.0.1:8542"), - "http://127.0.0.1:8542/api/v1/scan-results" - ); - } - - #[test] - fn clean_stale_url_targets_the_loom_route() { - // Prune is a loom-generation route, distinct from the classic - // /api/v1 emission intake. - assert_eq!( - clean_stale_url("http://127.0.0.1:8542/"), - "http://127.0.0.1:8542/api/loom/findings/clean-stale" - ); - assert_eq!( - clean_stale_url("http://127.0.0.1:8542"), - "http://127.0.0.1:8542/api/loom/findings/clean-stale" - ); - } - - #[test] - fn clean_stale_request_serializes_to_filigree_wire_shape() { - let request = CleanStaleRequest { - scan_source: CLARION_SCAN_SOURCE.to_owned(), - older_than_days: 30, - actor: "clarion-mcp".to_owned(), - }; - let value = serde_json::to_value(&request).expect("serialize clean-stale request"); - assert_eq!(value["scan_source"], json!("clarion")); - assert_eq!(value["older_than_days"], json!(30)); - assert_eq!(value["actor"], json!("clarion-mcp")); - } - - #[test] - fn parses_clean_stale_response_shape() { - // Pinned to Filigree's clean-stale handler response. 
- let response = parse_clean_stale_response( - r#"{"findings_fixed": 4, "scan_source": "clarion", "older_than_days": 30}"#, - ) - .expect("parse clean-stale response"); - assert_eq!(response.findings_fixed, 4); - assert_eq!(response.scan_source, "clarion"); - assert_eq!(response.older_than_days, 30); - } - - #[test] - fn clean_stale_response_tolerates_missing_and_extra_fields() { - let response = parse_clean_stale_response(r#"{"findings_fixed": 1, "future_field": true}"#) - .expect("parse forward-compatible clean-stale response"); - assert_eq!(response.findings_fixed, 1); - assert_eq!(response.older_than_days, 0); - } -} +pub use clarion_federation::scan_results::*; diff --git a/crates/clarion-mcp/src/tools/analyze.rs b/crates/clarion-mcp/src/tools/analyze.rs index 31c8794a..6f25f513 100644 --- a/crates/clarion-mcp/src/tools/analyze.rs +++ b/crates/clarion-mcp/src/tools/analyze.rs @@ -258,6 +258,7 @@ impl ServerState { let run_id = run_id.to_owned(); self.readers .with_reader(move |conn| { + clarion_storage::mark_stale_running_runs_failed(conn)?; match conn.query_row( "SELECT status, stats FROM runs WHERE id = ?1", rusqlite::params![run_id], diff --git a/crates/clarion-mcp/src/tools/status.rs b/crates/clarion-mcp/src/tools/status.rs index deaf2d19..6b279801 100644 --- a/crates/clarion-mcp/src/tools/status.rs +++ b/crates/clarion-mcp/src/tools/status.rs @@ -63,7 +63,7 @@ impl ServerState { let entity_id = required_str(arguments, "id")?.to_owned(); let now = (self.clock)(); let read = match self - .read_summary_inputs(entity_id, self.summary_model_id()) + .read_summary_inputs(entity_id, self.summary_model_id(), now.clone()) .await { Ok(read) => read, @@ -138,6 +138,7 @@ impl ServerState { entity_id: ready.entity.id.clone(), kind: ready.entity.kind.clone(), name: ready.entity.name.clone(), + guidance: ready.guidance_text.clone(), source_excerpt, }); estimate_tokens_from_chars(&prompt.body) @@ -180,6 +181,21 @@ impl ServerState { let storage = self .readers 
.with_reader(move |conn| { + match clarion_storage::mark_stale_running_runs_failed(conn) { + Ok(repaired) if repaired > 0 => { + tracing::warn!( + repaired, + "project_status marked stale running analyze runs failed" + ); + } + Ok(_) => {} + Err(err) => { + tracing::warn!( + error = %err, + "project_status stale-run reconciliation failed; continuing" + ); + } + } let snapshot = crate::snapshot::project_snapshot(conn, &project_root); let edge_count = scalar_count_fail_soft(conn, "SELECT COUNT(*) FROM edges"); // Entities withheld from briefings/federation exposure (secret diff --git a/crates/clarion-mcp/src/tools/summary.rs b/crates/clarion-mcp/src/tools/summary.rs index 67d64e2b..15a92b06 100644 --- a/crates/clarion-mcp/src/tools/summary.rs +++ b/crates/clarion-mcp/src/tools/summary.rs @@ -5,6 +5,7 @@ //! shared free-function helpers, the tool catalogue, and the JSON-RPC dispatch. use std::collections::HashSet; +use std::path::Path; use std::sync::Arc; use clarion_core::{ @@ -16,10 +17,11 @@ use serde_json::{Value, json}; use tokio::sync::{broadcast, mpsc, oneshot}; use clarion_storage::{ - InferredCallEdgeRecord, InferredEdgeCacheEntry, InferredEdgeCacheKey, StorageError, + EntityRow, InferredCallEdgeRecord, InferredEdgeCacheEntry, InferredEdgeCacheKey, StorageError, SummaryCacheEntry, SummaryCacheKey, WriterCmd, call_edges_from, call_edges_targeting, candidate_entities_for_unresolved_sites, entity_by_id, existing_entity_ids, - inferred_edge_cache_lookup, summary_cache_lookup, unresolved_call_sites_for_caller, + guidance_sheet_is_expired, guidance_sheet_matches_entity, inferred_edge_cache_lookup, + list_guidance_sheets, summary_cache_lookup, unresolved_call_sites_for_caller, unresolved_callers_for_target, }; @@ -34,6 +36,55 @@ use crate::{ verified_source_excerpt, }; +fn composed_summary_guidance( + conn: &rusqlite::Connection, + entity: &EntityRow, + project_root: &Path, + now: &str, +) -> Result { + let explicit_sheet_ids: HashSet = { + let mut stmt = + 
conn.prepare("SELECT from_id FROM edges WHERE kind = 'guides' AND to_id = ?1")?; + let rows = stmt.query_map(rusqlite::params![entity.id], |row| row.get::<_, String>(0))?; + rows.collect::>()? + }; + + let canonical_root = project_root + .canonicalize() + .unwrap_or_else(|_| project_root.to_path_buf()); + let mut blocks = Vec::new(); + for sheet in list_guidance_sheets(conn)? { + if guidance_sheet_is_expired(&sheet, now) { + continue; + } + let matched = explicit_sheet_ids.contains(&sheet.id) + || guidance_sheet_matches_entity(conn, &sheet, &entity.id, &canonical_root)?; + if !matched { + continue; + } + let Some(content) = sheet.properties.get("content").and_then(Value::as_str) else { + continue; + }; + let content = content.trim(); + if content.is_empty() { + continue; + } + blocks.push(format!("Guidance sheet {}:\n{}", sheet.id, content)); + } + Ok(blocks.join("\n\n")) +} + +fn guidance_fingerprint(guidance_text: &str) -> String { + if guidance_text.trim().is_empty() { + EMPTY_GUIDANCE_FINGERPRINT.to_owned() + } else { + format!( + "guidance:{}", + blake3::hash(guidance_text.as_bytes()).to_hex() + ) + } +} + impl ServerState { pub(crate) async fn tool_summary( &self, @@ -42,7 +93,7 @@ impl ServerState { let entity_id = required_str(arguments, "id")?.to_owned(); let now = (self.clock)(); let read = match self - .read_summary_inputs(entity_id, self.summary_model_id()) + .read_summary_inputs(entity_id, self.summary_model_id(), now.clone()) .await { Ok(read) => read, @@ -426,7 +477,9 @@ impl ServerState { &self, entity_id: String, summary_model_id: String, + now: String, ) -> Result { + let project_root = self.project_root.clone(); self.readers .with_reader(move |conn| { let Some(entity) = entity_by_id(conn, &entity_id)? 
else { @@ -444,12 +497,14 @@ impl ServerState { let Some(content_hash) = entity.content_hash.clone() else { return Ok(SummaryRead::MissingContentHash(entity.id)); }; + let guidance_text = composed_summary_guidance(conn, &entity, &project_root, &now)?; + let guidance_fingerprint = guidance_fingerprint(&guidance_text); let key = SummaryCacheKey { entity_id: entity.id.clone(), content_hash, prompt_template_id: LEAF_SUMMARY_PROMPT_TEMPLATE_ID.to_owned(), model_tier: summary_model_id, - guidance_fingerprint: EMPTY_GUIDANCE_FINGERPRINT.to_owned(), + guidance_fingerprint, }; let cached = summary_cache_lookup(conn, &key)?; let caller_count = i64::try_from( @@ -466,6 +521,7 @@ impl ServerState { entity_json: entity_payload, key, cached, + guidance_text, caller_count, fan_out, }))) @@ -522,6 +578,7 @@ impl ServerState { entity_id: ready.entity.id.clone(), kind: ready.entity.kind.clone(), name: ready.entity.name.clone(), + guidance: ready.guidance_text.clone(), source_excerpt: source_excerpt.clone(), }); let request = LlmRequest { diff --git a/crates/clarion-mcp/tests/analyze_lifecycle.rs b/crates/clarion-mcp/tests/analyze_lifecycle.rs index c740d7b6..bc5331e7 100644 --- a/crates/clarion-mcp/tests/analyze_lifecycle.rs +++ b/crates/clarion-mcp/tests/analyze_lifecycle.rs @@ -217,6 +217,20 @@ fn seed_run(db_path: &Path, id: &str, run_status: &str, stats_json: &str) { .expect("insert runs row"); } +fn seed_stale_running_run(db_path: &Path, id: &str) { + let conn = Connection::open(db_path).expect("open db"); + conn.execute( + "INSERT INTO runs ( \ + id, started_at, completed_at, config, stats, status, owner_pid, heartbeat_at \ + ) VALUES ( \ + ?1, '2026-01-01T00:00:00.000Z', NULL, '{}', '{}', \ + 'running', 999999, '2000-01-01T00:00:00.000Z' \ + )", + rusqlite::params![id], + ) + .expect("insert stale running run"); +} + #[tokio::test] async fn analyze_status_maps_terminal_run_states_from_the_runs_table() { let (project, db_path) = open_project(); @@ -247,6 +261,35 @@ async fn 
analyze_status_maps_terminal_run_states_from_the_runs_table() { } } +#[tokio::test] +async fn analyze_status_marks_stale_running_run_failed_in_db() { + let (project, db_path) = open_project(); + let stub = write_stub(project.path()); + let state = state_for(project.path(), &db_path, &stub); + + seed_stale_running_run(&db_path, "r-stale"); + + let resp = call_tool(&state, "analyze_status", json!({"run_id": "r-stale"})).await; + assert_eq!(resp["ok"], true, "{resp:?}"); + assert_eq!(resp["result"]["status"], "failed"); + + let conn = Connection::open(&db_path).expect("open db"); + let (run_status, run_owner_pid, stats_json): (String, Option, String) = conn + .query_row( + "SELECT status, owner_pid, stats FROM runs WHERE id = 'r-stale'", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)), + ) + .expect("read repaired run"); + assert_eq!(run_status, "failed"); + assert_eq!(run_owner_pid, None); + let repair_stats: Value = serde_json::from_str(&stats_json).expect("stats json"); + assert_eq!( + repair_stats["failure_reason"], + "analyze run abandoned: stale heartbeat" + ); +} + #[tokio::test] async fn analyze_start_reaps_finished_runs_and_their_progress_files() { let (project, db_path) = open_project(); diff --git a/crates/clarion-mcp/tests/catalogue_tools.rs b/crates/clarion-mcp/tests/catalogue_tools.rs index c3fc247c..f65ece77 100644 --- a/crates/clarion-mcp/tests/catalogue_tools.rs +++ b/crates/clarion-mcp/tests/catalogue_tools.rs @@ -132,7 +132,11 @@ async fn call_tool(state: &ServerState, name: &str, arguments: Value) -> Value { #[test] fn tools_list_includes_ws5_inspection_tools() { let names: Vec<&str> = list_tools().iter().map(|t| t.name).collect(); - for expected in ["guidance_for", "findings_for", "wardline_for"] { + for expected in [ + "entity_guidance_list", + "entity_finding_list", + "entity_wardline_get", + ] { assert!(names.contains(&expected), "missing tool {expected}"); } } @@ -307,6 +311,50 @@ async fn 
findings_for_paginates_with_total_and_truncated() { assert_eq!(env["result"]["findings"].as_array().unwrap().len(), 2); } +#[tokio::test] +async fn findings_for_applies_filter_before_large_result_cap() { + let (project, db, conn) = open_project(); + insert_entity( + &conn, + "python:function:m.f", + "function", + "m.py", + Some((1, 2)), + ); + for i in 0..5000 { + insert_finding( + &conn, + &format!("f-{i:04}"), + "python:function:m.f", + "defect", + "WARN", + "open", + ); + } + insert_finding( + &conn, + "z-critical", + "python:function:m.f", + "defect", + "CRITICAL", + "open", + ); + drop(conn); + let state = state_for(project.path(), &db); + + let env = call_tool( + &state, + "findings_for", + json!({"id": "python:function:m.f", "filter": {"severity": "CRITICAL"}}), + ) + .await; + + assert_eq!(env["ok"], true, "{env}"); + assert_eq!(env["result"]["page"]["total"], 1, "{env}"); + assert_eq!(env["result"]["findings"][0]["id"], "z-critical"); + assert_eq!(env["result"]["scan_truncated"], false, "{env}"); +} + #[tokio::test] async fn findings_for_empty_entity_is_not_an_error() { let (project, db, conn) = open_project(); @@ -403,6 +451,55 @@ async fn guidance_for_excludes_expired_sheets() { ); } +#[tokio::test] +async fn guidance_for_honors_unix_clock_for_expiry() { + // Regression for clarion-3153e74f0b: production `serve` uses the default + // `unix:` clock (never `.with_clock(...)`). A raw lexical compare + // against an ISO `expires` (which starts with '2' < 'u') wrongly classified + // EVERY sheet with any `expires` as expired. This exercises the production + // clock path: a far-future sheet must survive; a far-past one must be dropped. 
+ let (project, db, conn) = open_project(); + insert_entity( + &conn, + "python:function:m.f", + "function", + "m.py", + Some((1, 2)), + ); + insert_guidance( + &conn, + "core:guidance:future", + r#"{"scope_level":"project","scope_rank":1,"content":"F","authored_at":"2026-01-01", + "expires":"2999-12-31T00:00:00.000Z","match_rules":[{"type":"kind","value":"function"}]}"#, + ); + insert_guidance( + &conn, + "core:guidance:past", + r#"{"scope_level":"project","scope_rank":1,"content":"P","authored_at":"2026-01-01", + "expires":"2000-01-01T00:00:00.000Z","match_rules":[{"type":"kind","value":"function"}]}"#, + ); + drop(conn); + // Production-style clock: `unix:` (here a fixed mid-2025 instant), + // matching `default_now_string`'s form — between the past (2000) and + // future (2999) expiries. + let pool = ReaderPool::open(&db, 2).expect("reader pool"); + let state = ServerState::new(project.path().to_path_buf(), pool) + .with_clock(|| "unix:1748822400".to_owned()); + + let env = call_tool(&state, "guidance_for", json!({"id": "python:function:m.f"})).await; + assert_eq!(env["ok"], true, "{env}"); + let sheets = env["result"]["guidance"].as_array().unwrap(); + let ids: Vec<&str> = sheets.iter().map(|s| s["id"].as_str().unwrap()).collect(); + assert!( + ids.contains(&"core:guidance:future"), + "far-future sheet must survive under the unix: clock, got {ids:?} in {env}" + ); + assert!( + !ids.contains(&"core:guidance:past"), + "far-past sheet must be excluded, got {ids:?} in {env}" + ); +} + #[tokio::test] async fn guidance_for_honest_empty_when_no_sheet_matches() { let (project, db, conn) = open_project(); @@ -645,6 +742,22 @@ fn insert_edge(conn: &Connection, kind: &str, from: &str, to: &str, confidence: .expect("insert edge"); } +fn insert_edge_with_properties( + conn: &Connection, + kind: &str, + from: &str, + to: &str, + confidence: &str, + properties: &Value, +) { + conn.execute( + "INSERT INTO edges (kind, from_id, to_id, confidence, properties) \ + VALUES (?1, 
?2, ?3, ?4, ?5)", + params![kind, from, to, confidence, properties.to_string()], + ) + .expect("insert edge with properties"); +} + #[tokio::test] async fn find_circular_imports_detects_a_cycle() { let (project, db, conn) = open_project(); @@ -743,6 +856,51 @@ async fn find_circular_imports_default_confidence_excludes_inferred() { assert_eq!(env["result"]["confidence"], "inferred"); } +#[tokio::test] +async fn find_circular_imports_ignores_type_only_and_function_local_imports() { + let (project, db, conn) = open_project(); + for id in ["python:module:a", "python:module:b", "python:module:c"] { + insert_entity(&conn, id, "module", "a.py", Some((1, 5))); + } + insert_edge( + &conn, + "imports", + "python:module:a", + "python:module:b", + "resolved", + ); + insert_edge_with_properties( + &conn, + "imports", + "python:module:b", + "python:module:a", + "resolved", + &json!({"type_only": true}), + ); + insert_edge( + &conn, + "imports", + "python:module:b", + "python:module:c", + "resolved", + ); + insert_edge_with_properties( + &conn, + "imports", + "python:module:c", + "python:module:b", + "resolved", + &json!({"scope": "function"}), + ); + drop(conn); + let state = state_for(project.path(), &db); + + let env = call_tool(&state, "find_circular_imports", json!({})).await; + + assert_eq!(env["ok"], true, "{env}"); + assert_eq!(env["result"]["page"]["total"], 0, "{env}"); +} + #[tokio::test] async fn find_coupling_hotspots_ranks_by_fan_in_plus_out() { let (project, db, conn) = open_project(); @@ -1044,10 +1202,23 @@ fn insert_calls_edge(conn: &Connection, from: &str, to: &str, confidence: &str) .expect("insert calls edge"); } +fn insert_ambiguous_calls_edge(conn: &Connection, from: &str, to: &str, candidates: &[&str]) { + let properties = json!({ "candidates": candidates }).to_string(); + conn.execute( + "INSERT INTO edges (kind, from_id, to_id, confidence, properties) \ + VALUES ('calls', ?1, ?2, 'ambiguous', ?3)", + params![from, to, properties], + ) + .expect("insert 
ambiguous calls edge"); +} + #[test] fn tools_list_includes_find_dead_code() { let names: Vec<&str> = list_tools().iter().map(|t| t.name).collect(); - assert!(names.contains(&"find_dead_code"), "missing find_dead_code"); + assert!( + names.contains(&"entity_dead_list"), + "missing entity_dead_list" + ); } // Safety case (and the catastrophe guard): with no reachability roots emitted, @@ -1113,11 +1284,18 @@ async fn find_dead_code_flags_unreachable_and_spares_live() { "app.py", Some((11, 15)), ); - insert_calls_edge( + insert_entity( + &conn, + "python:function:maybe_other", + "function", + "app.py", + Some((16, 17)), + ); + insert_ambiguous_calls_edge( &conn, "python:function:helper", "python:function:maybe", - "ambiguous", + &["python:function:maybe", "python:function:maybe_other"], ); // Reflectively reached: no static edge, but barrier-tagged → live. insert_entity( @@ -1125,7 +1303,7 @@ async fn find_dead_code_flags_unreachable_and_spares_live() { "python:function:reflected", "function", "app.py", - Some((16, 20)), + Some((18, 20)), ); insert_tag(&conn, "python:function:reflected", "dynamic-dispatch"); // Genuinely dead leaf. 
@@ -1203,8 +1381,8 @@ async fn find_dead_code_excludes_framework_magic() { fn tools_list_includes_search_semantic() { let names: Vec<&str> = list_tools().iter().map(|t| t.name).collect(); assert!( - names.contains(&"search_semantic"), - "missing search_semantic" + names.contains(&"entity_semantic_search_list"), + "missing entity_semantic_search_list" ); } diff --git a/crates/clarion-mcp/tests/storage_tools.rs b/crates/clarion-mcp/tests/storage_tools.rs index d2c2c630..d0eed717 100644 --- a/crates/clarion-mcp/tests/storage_tools.rs +++ b/crates/clarion-mcp/tests/storage_tools.rs @@ -16,13 +16,15 @@ use clarion_mcp::{ config::{FiligreeConfig, LlmConfig, LlmProviderKind}, filigree::{ EntityAssociation, EntityAssociationsResponse, FiligreeClientError, FiligreeLookup, - IssueDetail, WardlineFinding, + IssueDetail, ObservationCreateRequest, ObservationCreateResponse, ObservationRecord, + WardlineFinding, }, filigree_url::{SOURCE_CONFIG, SOURCE_EPHEMERAL_PORT, resolve_filigree_url}, list_tools, }; use clarion_storage::{ - ReaderPool, SummaryCacheEntry, SummaryCacheKey, Writer, pragma, schema, upsert_summary_cache, + GuidanceSheetInput, ReaderPool, SummaryCacheEntry, SummaryCacheKey, Writer, pragma, schema, + upsert_guidance_sheet, upsert_summary_cache, }; use rusqlite::{Connection, params}; use serde_json::{Value, json}; @@ -56,6 +58,7 @@ fn seed_graph(conn: &Connection, project_root: &std::path::Path) { ) .expect("write demo source"); + insert_file_entity(conn, "core:file:demo.py", &source_path); insert_entity( conn, "python:module:demo", @@ -162,6 +165,29 @@ fn seed_graph(conn: &Connection, project_root: &std::path::Path) { ); } +fn insert_file_entity(conn: &Connection, id: &str, source_path: &std::path::Path) { + let content_hash = blake3::hash(&std::fs::read(source_path).expect("read file source")) + .to_hex() + .to_string(); + conn.execute( + "INSERT INTO entities ( + id, plugin_id, kind, name, short_name, source_file_path, properties, content_hash, + created_at, 
updated_at + ) VALUES ( + ?1, 'core', 'file', ?2, ?2, ?3, '{}', ?4, + strftime('%Y-%m-%dT%H:%M:%fZ', 'now'), + strftime('%Y-%m-%dT%H:%M:%fZ', 'now') + )", + params![ + id, + source_path.file_name().unwrap().to_string_lossy().as_ref(), + source_path.display().to_string(), + content_hash, + ], + ) + .expect("insert file entity"); +} + fn insert_entity( conn: &Connection, id: &str, @@ -288,7 +314,7 @@ fn insert_unresolved_call_site(conn: &Connection, caller_id: &str, site_key: &st "INSERT INTO entity_unresolved_call_sites ( caller_entity_id, caller_content_hash, site_key, site_ordinal, source_file_id, source_byte_start, source_byte_end, callee_expr, created_at - ) VALUES (?1, ?2, ?3, 0, 'python:module:demo', 30, 37, ?4, '2026-05-17T00:00:00.000Z')", + ) VALUES (?1, ?2, ?3, 0, 'core:file:demo.py', 30, 37, ?4, '2026-05-17T00:00:00.000Z')", params![caller_id, caller_content_hash, site_key, expr], ) .expect("insert unresolved call site"); @@ -381,6 +407,7 @@ fn expected_summary_request(project_root: &std::path::Path, entity_id: &str) -> entity_id: entity_id.to_owned(), kind: "function".to_owned(), name: entity_id.to_owned(), + guidance: String::new(), source_excerpt, }); LlmRequest { @@ -422,7 +449,7 @@ fn expected_inferred_request( "caller_content_hash": caller_content_hash, "site_key": site_key, "site_ordinal": 0, - "source_file_id": "python:module:demo", + "source_file_id": "core:file:demo.py", "source_byte_start": 30, "source_byte_end": 37, "callee_expr": callee_expr @@ -596,12 +623,13 @@ impl AnySummaryProvider { } } +#[async_trait::async_trait] impl LlmProvider for AnySummaryProvider { fn name(&self) -> &'static str { "recording" } - fn invoke(&self, request: LlmRequest) -> Result { + async fn invoke(&self, request: LlmRequest) -> Result { self.invocations .lock() .unwrap_or_else(std::sync::PoisonError::into_inner) @@ -633,12 +661,13 @@ impl LlmProvider for AnySummaryProvider { } } +#[async_trait::async_trait] impl LlmProvider for AnyInferredProvider { fn 
name(&self) -> &'static str { "recording" } - fn invoke(&self, request: LlmRequest) -> Result { + async fn invoke(&self, request: LlmRequest) -> Result { self.invocations .lock() .unwrap_or_else(std::sync::PoisonError::into_inner) @@ -682,6 +711,9 @@ struct FakeFiligreeClient { wardline_findings: Mutex>, /// When true, `wardline_findings_for_path` returns an `HttpStatus` 503 error. wardline_error: Mutex, + created_observations: Mutex>, + observations: Mutex>, + dismissed_observations: Mutex>, } impl FakeFiligreeClient { @@ -728,6 +760,20 @@ impl FakeFiligreeClient { .unwrap_or_else(std::sync::PoisonError::into_inner) .clone() } + + fn created_observations(&self) -> Vec { + self.created_observations + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner) + .clone() + } + + fn dismissed_observations(&self) -> Vec { + self.dismissed_observations + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner) + .clone() + } } impl FiligreeLookup for FakeFiligreeClient { @@ -783,6 +829,58 @@ impl FiligreeLookup for FakeFiligreeClient { .unwrap_or_else(std::sync::PoisonError::into_inner) .clone()) } + + fn create_observation( + &self, + request: ObservationCreateRequest, + ) -> Result { + let mut created = self + .created_observations + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner); + created.push(request.clone()); + let observation_id = format!("clarion-obs-{}", created.len()); + self.observations + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner) + .insert( + observation_id.clone(), + ObservationRecord { + observation_id: observation_id.clone(), + summary: request.summary.clone(), + detail: request.detail.clone(), + file_path: request.file_path.clone().unwrap_or_default(), + line: request.line, + priority: request.priority, + actor: request.actor.clone(), + }, + ); + Ok(ObservationCreateResponse { observation_id }) + } + + fn observation_by_id( + &self, + observation_id: &str, + ) -> Result, FiligreeClientError> { + Ok(self + 
.observations + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner) + .get(observation_id) + .cloned()) + } + + fn dismiss_observation( + &self, + observation_id: &str, + _reason: &str, + ) -> Result<(), FiligreeClientError> { + self.dismissed_observations + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner) + .push(observation_id.to_owned()); + Ok(()) + } } fn association(issue_id: &str, entity_id: &str, content_hash: &str) -> EntityAssociation { @@ -818,8 +916,8 @@ fn tools_list_includes_subsystem_members() { let tools = list_tools(); let tool = tools .iter() - .find(|tool| tool.name == "subsystem_members") - .expect("subsystem_members tool definition"); + .find(|tool| tool.name == "subsystem_member_list") + .expect("subsystem_member_list tool definition"); assert_eq!( tool.description, @@ -848,7 +946,7 @@ async fn subsystem_members_returns_member_modules() { let envelope = call_tool(&state, "subsystem_members", json!({"id": subsystem_id})).await; - assert_eq!(envelope["ok"], true); + assert_eq!(envelope["ok"], true, "{envelope}"); assert_eq!( envelope["result"]["subsystem"]["id"], "core:subsystem:abc123def456" @@ -1389,6 +1487,357 @@ async fn summary_cold_miss_records_provider_response_then_hits_cache() { handle.await.unwrap().unwrap(); } +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn summary_cache_key_and_prompt_include_matching_guidance() { + let (project, db_path) = open_project(); + let conn = Connection::open(&db_path).unwrap(); + upsert_summary_cache( + &conn, + &SummaryCacheEntry { + key: SummaryCacheKey { + entity_id: "python:function:demo.entry".to_owned(), + content_hash: expected_content_hash(project.path(), "python:function:demo.entry"), + prompt_template_id: LEAF_SUMMARY_PROMPT_TEMPLATE_ID.to_owned(), + model_tier: "anthropic/claude-sonnet-4.6".to_owned(), + guidance_fingerprint: "guidance-empty".to_owned(), + }, + summary_json: r#"{"purpose":"unguided"}"#.to_owned(), + cost_usd: 0.001, + tokens_input: 
100, + tokens_output: 20, + caller_count: 0, + fan_out: 2, + stale_semantic: false, + created_at: "2026-05-17T00:00:00.000Z".to_owned(), + last_accessed_at: "2026-05-17T00:00:00.000Z".to_owned(), + }, + ) + .unwrap(); + let guidance_properties = json!({ + "content": "Prefer operational risk notes when summarising functions.", + "scope_level": "function", + "match_rules": [{"type": "entity", "id": "python:function:demo.entry"}], + "provenance": {"author": "test"}, + "authored_at": "2026-05-17T00:00:00.000Z" + }); + upsert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: "core:guidance:test-summary", + name: "test-summary", + short_name: "test-summary", + properties: &guidance_properties, + }, + ) + .unwrap(); + drop(conn); + + let (writer, handle) = Writer::spawn(db_path.clone(), 50, 256).unwrap(); + let provider = Arc::new(AnySummaryProvider::new_output( + r#"{"purpose":"guided"}"#, + 120, + 0.0, + )); + let state = state_for_summary( + project.path(), + &db_path, + &writer, + provider.clone(), + llm_config(), + ); + + let cold = call_tool( + &state, + "summary", + json!({"id": "python:function:demo.entry"}), + ) + .await; + + assert_eq!(cold["ok"], true, "{cold}"); + assert_eq!(cold["result"]["cache"]["hit"], false); + assert_eq!(cold["result"]["summary"]["purpose"], "guided"); + let invocation = provider + .invocations() + .into_iter() + .next() + .expect("summary provider invocation"); + assert!( + invocation + .prompt + .contains("Prefer operational risk notes when summarising functions."), + "summary prompt should include matching guidance: {}", + invocation.prompt + ); + + let conn = Connection::open(&db_path).unwrap(); + let fingerprints: Vec = { + let mut stmt = conn + .prepare( + "SELECT guidance_fingerprint FROM summary_cache \ + WHERE entity_id = 'python:function:demo.entry' \ + ORDER BY guidance_fingerprint", + ) + .unwrap(); + stmt.query_map([], |row| row.get::<_, String>(0)) + .unwrap() + .collect::>() + .unwrap() + }; + 
assert!(fingerprints.iter().any(|fp| fp == "guidance-empty")); + assert!( + fingerprints + .iter() + .any(|fp| fp.starts_with("guidance:") && fp != "guidance-empty"), + "guided summary should use a non-empty guidance fingerprint: {fingerprints:?}" + ); + drop(conn); + + let warm = call_tool( + &state, + "summary", + json!({"id": "python:function:demo.entry"}), + ) + .await; + assert_eq!(warm["ok"], true, "{warm}"); + assert_eq!(warm["result"]["cache"]["hit"], true); + assert_eq!(warm["result"]["summary"]["purpose"], "guided"); + assert_eq!(provider.invocations().len(), 1); + + drop(state); + drop(writer); + handle.await.unwrap().unwrap(); +} + +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn summary_keeps_future_guidance_under_unix_clock() { + let (project, db_path) = open_project(); + let conn = Connection::open(&db_path).unwrap(); + let guidance_properties = json!({ + "content": "Future-dated guidance must still reach summary prompts.", + "scope_level": "function", + "expires": "2999-12-31T00:00:00.000Z", + "match_rules": [{"type": "entity", "id": "python:function:demo.entry"}], + "provenance": {"author": "test"}, + "authored_at": "2026-05-17T00:00:00.000Z" + }); + upsert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: "core:guidance:test-summary-future", + name: "test-summary-future", + short_name: "test-summary-future", + properties: &guidance_properties, + }, + ) + .unwrap(); + drop(conn); + + let (writer, handle) = Writer::spawn(db_path.clone(), 50, 256).unwrap(); + let provider = Arc::new(AnySummaryProvider::new_output( + r#"{"purpose":"guided"}"#, + 120, + 0.0, + )); + let state = state_for_summary( + project.path(), + &db_path, + &writer, + provider.clone(), + llm_config(), + ) + .with_clock(|| "unix:1748822400".to_owned()); + + let cold = call_tool( + &state, + "summary", + json!({"id": "python:function:demo.entry"}), + ) + .await; + + assert_eq!(cold["ok"], true, "{cold}"); + let invocation = provider + .invocations() + 
.into_iter() + .next() + .expect("summary provider invocation"); + assert!( + invocation + .prompt + .contains("Future-dated guidance must still reach summary prompts."), + "summary prompt should include future guidance under unix clock: {}", + invocation.prompt + ); + + drop(state); + drop(writer); + handle.await.unwrap().unwrap(); +} + +#[tokio::test] +async fn summary_preview_cost_counts_future_guidance_under_unix_clock() { + let (project, db_path) = open_project(); + let conn = Connection::open(&db_path).unwrap(); + let guidance_properties = json!({ + "content": "Future-dated guidance must still reach preview estimates.", + "scope_level": "function", + "expires": "2999-12-31T00:00:00.000Z", + "match_rules": [{"type": "entity", "id": "python:function:demo.entry"}], + "provenance": {"author": "test"}, + "authored_at": "2026-05-17T00:00:00.000Z" + }); + upsert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: "core:guidance:test-summary-preview-future", + name: "test-summary-preview-future", + short_name: "test-summary-preview-future", + properties: &guidance_properties, + }, + ) + .unwrap(); + drop(conn); + + let (writer, handle) = Writer::spawn(db_path.clone(), 50, 256).unwrap(); + let provider = Arc::new(AnySummaryProvider::new_output( + r#"{"purpose":"unused"}"#, + 120, + 0.0, + )); + let state = state_for_summary( + project.path(), + &db_path, + &writer, + provider.clone(), + llm_config(), + ) + .with_clock(|| "unix:1748822400".to_owned()); + + let envelope = call_tool( + &state, + "summary_preview_cost", + json!({"id": "python:function:demo.entry"}), + ) + .await; + + let prompt = build_leaf_summary_prompt(&LeafSummaryPromptInput { + entity_id: "python:function:demo.entry".to_owned(), + kind: "function".to_owned(), + name: "python:function:demo.entry".to_owned(), + guidance: "Guidance sheet core:guidance:test-summary-preview-future:\n\ + Future-dated guidance must still reach preview estimates." 
+ .to_owned(), + source_excerpt: expected_source_excerpt(project.path(), "python:function:demo.entry"), + }); + let expected_tokens = i64::try_from(prompt.body.chars().count()) + .unwrap_or(i64::MAX) + .saturating_add(3) + / 4; + + assert_eq!(envelope["ok"], true, "{envelope}"); + assert_eq!(envelope["result"]["cache_status"], "miss"); + assert_eq!( + envelope["result"]["estimated_input_tokens"], expected_tokens, + "preview estimate should include future guidance under unix clock: {envelope}" + ); + assert!( + provider.invocations().is_empty(), + "preview must not call the LLM provider" + ); + + drop(state); + drop(writer); + handle.await.unwrap().unwrap(); +} + +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn propose_guidance_creates_observation_and_promote_makes_sheet_visible() { + let (project, db_path) = open_project(); + let client = Arc::new(FakeFiligreeClient::default()); + let state = state_for_filigree(project.path(), &db_path, client.clone()) + .with_clock(|| "unix:1748822400".to_owned()); + + let proposed = call_tool( + &state, + "propose_guidance", + json!({ + "entity_id": "python:function:demo.entry", + "content": "Prefer operational risk notes when summarising entrypoints.", + "scope_level": "function", + "name": "demo-entry-risk", + "pinned": true + }), + ) + .await; + + assert_eq!(proposed["ok"], true); + assert_eq!(proposed["result"]["observation_id"], "clarion-obs-1"); + let created = client.created_observations(); + assert_eq!(created.len(), 1); + assert!(created[0].summary.contains("python:function:demo.entry")); + + let inert = call_tool( + &state, + "guidance_for", + json!({"id": "python:function:demo.entry"}), + ) + .await; + assert_eq!(inert["ok"], true); + assert_eq!( + inert["result"]["guidance"] + .as_array() + .expect("guidance array") + .len(), + 0, + "a proposal must not be composed before promotion" + ); + + let promoted = call_tool( + &state, + "promote_guidance", + json!({"observation_id": "clarion-obs-1"}), + 
) + .await; + assert_eq!(promoted["ok"], true); + assert_eq!( + promoted["result"]["sheet_id"], + "core:guidance:demo-entry-risk" + ); + assert_eq!( + client.dismissed_observations(), + vec!["clarion-obs-1".to_owned()] + ); + + let visible = call_tool( + &state, + "guidance_for", + json!({"id": "python:function:demo.entry"}), + ) + .await; + assert_eq!(visible["ok"], true); + let sheets = visible["result"]["guidance"] + .as_array() + .expect("guidance array"); + assert_eq!(sheets.len(), 1); + assert_eq!(sheets[0]["id"], "core:guidance:demo-entry-risk"); + assert_eq!( + sheets[0]["content"], + "Prefer operational risk notes when summarising entrypoints." + ); + assert_eq!(sheets[0]["provenance"], "filigree_promotion"); + assert_eq!(sheets[0]["matched_by"], json!(["entity"])); + + let conn = Connection::open(&db_path).unwrap(); + let authored_at: String = conn + .query_row( + "SELECT json_extract(properties, '$.authored_at') \ + FROM entities WHERE id = 'core:guidance:demo-entry-risk'", + [], + |row| row.get(0), + ) + .unwrap(); + assert_eq!(authored_at, "2025-06-02T00:00:00.000Z"); +} + #[tokio::test(flavor = "multi_thread", worker_threads = 2)] async fn summary_invalid_json_falls_back_to_structural_summary() { let (project, db_path) = open_project(); @@ -2253,7 +2702,7 @@ async fn orientation_pack_for_entity_bundles_all_sections_deterministically() { assert!(result["health"]["index"].is_object()); assert!(result["omitted"].is_object()); let suggested = result["suggested_next_reads"].as_array().unwrap(); - assert_eq!(suggested[0]["tool"], "source_for_entity"); + assert_eq!(suggested[0]["tool"], "entity_source_get"); // Filigree is disabled in this fixture → a clear degradation warning, not a // silent empty section. 
@@ -2945,7 +3394,7 @@ async fn callers_of_inferred_dispatches_and_materializes_recording_result() { ) .await; - assert_eq!(envelope["ok"], true); + assert_eq!(envelope["ok"], true, "{envelope}"); assert_eq!( envelope["result"]["callers"][0]["entity"]["id"], "python:function:demo.entry" @@ -2979,6 +3428,60 @@ async fn callers_of_inferred_dispatches_and_materializes_recording_result() { handle.await.unwrap().unwrap(); } +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn callers_of_inferred_ignores_stale_unresolved_call_sites() { + let (project, db_path) = open_project(); + let conn = Connection::open(&db_path).unwrap(); + add_dynamic_source(project.path()); + let source_path = project.path().join("demo.py"); + insert_entity( + &conn, + "python:function:demo.dynamic", + "function", + &source_path, + Some((9, 10)), + Some("python:module:demo"), + ); + insert_unresolved_call_site(&conn, "python:function:demo.entry", "site-stale", "dynamic"); + conn.execute( + "UPDATE entities SET content_hash = 'hash-after-body-change' \ + WHERE id = 'python:function:demo.entry'", + [], + ) + .expect("simulate a changed caller body without authoritative unresolved rows"); + drop(conn); + + let (writer, handle) = Writer::spawn(db_path.clone(), 50, 256).unwrap(); + let provider = Arc::new(AnyInferredProvider::new( + r#"{"edges":[{"site_key":"site-stale","target_id":"python:function:demo.dynamic","confidence":0.91,"rationale":"stale"}]}"#, + )); + let state = state_for_summary( + project.path(), + &db_path, + &writer, + provider.clone(), + llm_config(), + ); + + let envelope = call_tool( + &state, + "callers_of", + json!({"id": "python:function:demo.dynamic", "confidence": "inferred"}), + ) + .await; + + assert_eq!(envelope["ok"], true, "{envelope}"); + assert_eq!(envelope["result"]["callers"].as_array().unwrap().len(), 0); + assert!( + provider.invocations().is_empty(), + "stale unresolved rows must not trigger inferred dispatch" + ); + + drop(state); + drop(writer); + 
handle.await.unwrap().unwrap(); +} + #[tokio::test(flavor = "multi_thread", worker_threads = 2)] async fn attribute_receiver_call_is_excluded_at_resolved_but_attempted_at_inferred() { // Attribute-receiver call `ctx.dynamic()` (callee_expr `ctx.dynamic`): the @@ -3964,8 +4467,8 @@ fn tools_list_includes_project_status() { let tools = list_tools(); let tool = tools .iter() - .find(|tool| tool.name == "project_status") - .expect("project_status tool definition"); + .find(|tool| tool.name == "project_status_get") + .expect("project_status_get tool definition"); assert_eq!( tool.input_schema, json!({"type": "object", "properties": {}, "additionalProperties": false}) @@ -4085,6 +4588,60 @@ async fn project_status_marks_skipped_no_plugins_run() { ); } +#[tokio::test] +async fn project_status_marks_stale_running_run_failed() { + let (project, db_path) = open_project(); + let conn = Connection::open(&db_path).expect("open sqlite"); + conn.execute( + "INSERT INTO runs ( \ + id, started_at, completed_at, config, stats, status, owner_pid, heartbeat_at \ + ) VALUES ( \ + 'run-stale', '2026-02-04T00:00:00.000Z', NULL, '{}', '{}', \ + 'running', 999999, '2000-01-01T00:00:00.000Z' \ + )", + [], + ) + .expect("insert stale running run"); + drop(conn); + + let state = state_for(project.path(), &db_path); + let envelope = call_tool(&state, "project_status", json!({})).await; + assert_eq!(envelope["ok"], true, "{envelope}"); + let latest = &envelope["result"]["latest_run"]; + assert_eq!(latest["id"], "run-stale"); + assert_eq!(latest["status"], "failed"); + assert_eq!(latest["owner_pid"], Value::Null); + assert_eq!(latest["heartbeat_at"], "2000-01-01T00:00:00.000Z"); + assert!( + latest["completed_at"] + .as_str() + .is_some_and(|value| value.ends_with('Z')), + "completed_at should be recorded on stale-run repair: {latest}" + ); + + let conn = Connection::open(&db_path).expect("reopen sqlite"); + let (run_status, completed_at, run_owner_pid, stats_json): ( + String, + Option, + 
Option, + String, + ) = conn + .query_row( + "SELECT status, completed_at, owner_pid, stats FROM runs WHERE id = 'run-stale'", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)), + ) + .expect("read repaired run"); + assert_eq!(run_status, "failed"); + assert!(completed_at.is_some()); + assert_eq!(run_owner_pid, None); + let repair_stats: Value = serde_json::from_str(&stats_json).expect("stats json"); + assert_eq!( + repair_stats["failure_reason"], + "analyze run abandoned: stale heartbeat" + ); +} + #[tokio::test] async fn project_status_skipped_run_keeps_prior_completed_index_visible() { // The real dogfood shape: a skipped_no_plugins run AFTER a completed one. diff --git a/crates/clarion-plugin-fixture/Cargo.toml b/crates/clarion-plugin-fixture/Cargo.toml index f3ade5b1..08745947 100644 --- a/crates/clarion-plugin-fixture/Cargo.toml +++ b/crates/clarion-plugin-fixture/Cargo.toml @@ -14,7 +14,7 @@ name = "clarion-plugin-fixture" path = "src/main.rs" [dependencies] -clarion-core = { path = "../clarion-core", version = "1.1.0" } +clarion-core = { path = "../clarion-core", version = "1.2.0" } serde_json.workspace = true [target.'cfg(unix)'.dependencies] diff --git a/crates/clarion-storage/Cargo.toml b/crates/clarion-storage/Cargo.toml index 2a01651c..e08a6360 100644 --- a/crates/clarion-storage/Cargo.toml +++ b/crates/clarion-storage/Cargo.toml @@ -11,12 +11,13 @@ workspace = true [dependencies] blake3.workspace = true -clarion-core = { path = "../clarion-core", version = "1.1.0" } +clarion-core = { path = "../clarion-core", version = "1.2.0" } deadpool-sqlite.workspace = true rusqlite.workspace = true serde.workspace = true serde_json.workspace = true thiserror.workspace = true +time.workspace = true tokio.workspace = true tracing.workspace = true diff --git a/crates/clarion-storage/migrations/0008_run_owner_heartbeat.sql b/crates/clarion-storage/migrations/0008_run_owner_heartbeat.sql new file mode 100644 index 00000000..a762f08d --- /dev/null 
+++ b/crates/clarion-storage/migrations/0008_run_owner_heartbeat.sql @@ -0,0 +1,20 @@ +-- Migration 0008: runs.owner_pid + runs.heartbeat_at (H7 stale-running +-- reconciliation). +-- +-- `owner_pid` is diagnostic ownership for the process that opened/resumed the +-- run. `heartbeat_at` is the cross-platform freshness signal readers can use +-- to identify abandoned `running` rows after an unclean process death. + +BEGIN; + +ALTER TABLE runs ADD COLUMN owner_pid INTEGER; +ALTER TABLE runs ADD COLUMN heartbeat_at TEXT; + +CREATE INDEX ix_runs_running_heartbeat + ON runs(status, heartbeat_at) + WHERE status = 'running'; + +INSERT INTO schema_migrations (version, name, applied_at) +VALUES (8, '0008_run_owner_heartbeat', strftime('%Y-%m-%dT%H:%M:%fZ', 'now')); + +COMMIT; diff --git a/crates/clarion-storage/src/commands.rs b/crates/clarion-storage/src/commands.rs index 5a63f3cd..b2467b15 100644 --- a/crates/clarion-storage/src/commands.rs +++ b/crates/clarion-storage/src/commands.rs @@ -63,6 +63,8 @@ pub struct EntityRecord { pub source_line_end: Option, /// JSON string; writer inserts verbatim. pub properties_json: String, + /// Plugin-emitted categorisation tags to denormalise into `entity_tags`. + pub tags: Vec, pub content_hash: Option, pub summary_json: Option, pub wardline_json: Option, @@ -86,8 +88,8 @@ pub struct EdgeRecord { pub confidence: EdgeConfidence, /// JSON string; writer inserts verbatim. None ⇒ NULL. pub properties_json: Option, - /// Module entity id for the file the edge was emitted from. Derived by - /// the host, not the plugin (ADR-022 boundary). + /// Core file entity id for the file the edge was emitted from. Derived by + /// the host/CLI, not the plugin (ADR-022 boundary). 
pub source_file_id: Option, pub source_byte_start: Option, pub source_byte_end: Option, diff --git a/crates/clarion-storage/src/glob.rs b/crates/clarion-storage/src/glob.rs new file mode 100644 index 00000000..e5927762 --- /dev/null +++ b/crates/clarion-storage/src/glob.rs @@ -0,0 +1,85 @@ +//! Path glob matching shared across the read (MCP `scope` / guidance +//! `match_rules`) and write (CLI `guidance --for-entity`) surfaces. +//! +//! Lifted into the storage crate so a single implementation backs both the MCP +//! catalogue (which historically owned it as `catalogue::glob_match`) and the +//! CLI guidance authoring path — one matcher, no drift. `clarion-mcp` +//! re-exports this function so its semantics are unchanged. + +/// Glob-match `path` against a `**`/`*`/`?` `pattern`, treating `/` as the +/// path separator. `**` matches zero or more whole segments; `*` matches any +/// run of non-`/` characters within a single segment; `?` matches one such +/// character. Used by `scope` path-globs and by guidance `path` match-rules. +#[must_use] +pub fn glob_match(pattern: &str, path: &str) -> bool { + let pat: Vec<&str> = pattern.split('/').collect(); + let seg: Vec<&str> = path.split('/').collect(); + glob_segments(&pat, &seg) +} + +fn glob_segments(pat: &[&str], seg: &[&str]) -> bool { + match pat.first() { + None => seg.is_empty(), + Some(&"**") => { + // `**` consumes zero or more whole segments; try each split point. + (0..=seg.len()).any(|i| glob_segments(&pat[1..], &seg[i..])) + } + Some(head) => match seg.first() { + Some(name) if segment_match(head.as_bytes(), name.as_bytes()) => { + glob_segments(&pat[1..], &seg[1..]) + } + _ => false, + }, + } +} + +/// Within-segment wildcard match: `*` matches any run, `?` matches one char. +fn segment_match(pat: &[u8], name: &[u8]) -> bool { + match pat.first() { + None => name.is_empty(), + Some(b'*') => { + // `*` matches zero or more chars within the segment. 
+ (0..=name.len()).any(|i| segment_match(&pat[1..], &name[i..])) + } + Some(b'?') => match name.first() { + Some(_) => segment_match(&pat[1..], &name[1..]), + None => false, + }, + Some(&head) => match name.first() { + Some(&c) if c == head => segment_match(&pat[1..], &name[1..]), + _ => false, + }, + } +} + +#[cfg(test)] +mod tests { + use super::glob_match; + + #[test] + fn double_star_across_segments() { + assert!(glob_match("src/auth/**", "src/auth/tokens/refresh.py")); + assert!(glob_match("src/auth/**", "src/auth/mod.py")); + assert!(glob_match("src/**", "src/auth/tokens/refresh.py")); + assert!(glob_match("**/refresh.py", "src/auth/refresh.py")); + } + + #[test] + fn single_star_stays_within_segment() { + assert!(glob_match("src/*.py", "src/main.py")); + assert!(!glob_match("src/*.py", "src/auth/main.py")); + assert!(glob_match("src/auth/*.py", "src/auth/tokens.py")); + } + + #[test] + fn rejects_non_matches() { + assert!(!glob_match("src/auth/**", "src/billing/tokens.py")); + assert!(!glob_match("src/auth/tokens.py", "src/auth/sessions.py")); + } + + #[test] + fn question_matches_single_char() { + assert!(glob_match("src/v?.py", "src/v1.py")); + assert!(!glob_match("src/v?.py", "src/v10.py")); + } +} diff --git a/crates/clarion-storage/src/guidance.rs b/crates/clarion-storage/src/guidance.rs new file mode 100644 index 00000000..d7115be2 --- /dev/null +++ b/crates/clarion-storage/src/guidance.rs @@ -0,0 +1,1185 @@ +//! Guidance-sheet write API (WS6 / REQ-GUIDANCE-01, REQ-GUIDANCE-03). +//! +//! Guidance sheets are entities with `kind = 'guidance'` and id +//! `core:guidance:`. They are operator-authored, have **no source file**, +//! and exist **outside any `clarion analyze` run** — so they must NOT go through +//! the run-scoped writer actor (`WriterCmd::InsertEntity`), which hard-requires +//! a `BeginRun` and a source-file anchor. Instead they insert via a plain, +//! non-run-scoped `INSERT INTO entities`, exactly the shape proven by the +//! 
storage schema test `entity_generated_columns_extract_from_properties_json`. +//! +//! The `properties` JSON this module writes is the contract the read path +//! (`clarion-mcp` `catalogue::inspection::tool_guidance_for` / `rule_match`) +//! consumes. In particular `match_rules` entries are `{"type": …, …}` objects: +//! - `{"type":"path","pattern":""}` +//! - `{"type":"tag","value":""}` +//! - `{"type":"kind","value":""}` +//! - `{"type":"subsystem","id":""}` +//! - `{"type":"entity","id":""}` +//! +//! Never set the generated columns (`scope_level`, `scope_rank`, +//! `git_churn_count`) directly — they extract from `properties` JSON via the +//! migration's `GENERATED ALWAYS AS` definitions; `scope_rank` is a CASE-mapped +//! VIRTUAL column (project→1 … function→6). + +use std::collections::HashSet; +use std::path::Path; + +use rusqlite::{Connection, OptionalExtension, params}; +use serde::{Deserialize, Serialize}; +use serde_json::{Value, json}; + +use crate::glob::glob_match; +use crate::query::{EntityRow, entity_by_id, subsystem_of_entity}; +use crate::{Result, StorageError}; + +/// The fully-resolved write payload for one guidance sheet. The caller (the CLI) +/// builds this from `--match` / `--scope-level` / `--content` / … and hands it +/// to [`upsert_guidance_sheet`]. `properties_json` is the verbatim object stored +/// in `entities.properties`; this module is the single place that knows the +/// column layout, but the caller owns the JSON shape so it can round-trip an +/// edited sheet without losing fields it does not understand. +pub struct GuidanceSheetInput<'a> { + /// Full entity id: `core:guidance:`. + pub id: &'a str, + /// `entities.name` — the canonical qualified name (segment 3 of the id). + pub name: &'a str, + /// `entities.short_name` — display tail of the name. + pub short_name: &'a str, + /// The complete `properties` JSON object (must include at least `content`, + /// `scope_level`, `provenance`, `authored_at`). 
+ pub properties: &'a Value, +} + +/// A guidance sheet read back from storage. `properties` is the parsed +/// `entities.properties` object; `scope_rank` is the generated column so callers +/// can order without re-deriving the CASE map. +#[derive(Debug, Clone)] +pub struct GuidanceSheet { + pub id: String, + pub name: String, + pub short_name: String, + pub scope_level: Option<String>, + pub scope_rank: Option<i64>, + pub properties: Value, + pub created_at: String, + pub updated_at: String, +} + +impl GuidanceSheet { + fn from_row(row: &rusqlite::Row) -> rusqlite::Result<Self> { + let properties_raw: String = row.get(4)?; + let properties = serde_json::from_str::<Value>(&properties_raw) + .unwrap_or_else(|_| json!({ "_raw": properties_raw })); + Ok(Self { + id: row.get(0)?, + name: row.get(1)?, + short_name: row.get(2)?, + scope_level: row.get::<_, Option<String>>(3)?, + scope_rank: row.get::<_, Option<i64>>(5)?, + properties, + created_at: row.get(6)?, + updated_at: row.get(7)?, + }) + } + + /// `$.authored_at` from properties, used for tie-break ordering to mirror + /// the read path's `scope_rank ASC, authored_at ASC, id ASC`. + fn authored_at(&self) -> Option<&str> { + self.properties.get("authored_at").and_then(Value::as_str) + } + + /// `$.reviewed_at` from properties. Optional and not currently populated by + /// any write path, but honoured if present (an operator or a future + /// "mark reviewed" verb may set it). + fn reviewed_at(&self) -> Option<&str> { + self.properties.get("reviewed_at").and_then(Value::as_str) + } + + /// The instant this sheet was last "touched" for review-cadence purposes: + /// the later of `reviewed_at` and `authored_at` (lexical max — both are the + /// same fixed-width `YYYY-MM-DDTHH:MM:SS.mmmZ` shape, so byte order is + /// instant order). Returns `None` when neither is present. 
+ fn touched_at(&self) -> Option<&str> { + match (self.reviewed_at(), self.authored_at()) { + (Some(r), Some(a)) => Some(r.max(a)), + (Some(r), None) => Some(r), + (None, Some(a)) => Some(a), + (None, None) => None, + } + } +} + +/// True if `sheet`'s `expires` instant is in the past relative to `now`. +/// +/// This is the **review-cadence/expiry** predicate that mirrors the MCP +/// `guidance_for` read path's expiry exclusion and the +/// `CLA-FACT-GUIDANCE-EXPIRED` finding: parse both values to Unix seconds, +/// accepting either `unix:` or RFC3339 timestamps, and compare +/// numerically. Fail open: a sheet with no `expires`, an unparseable `expires`, +/// or an unparseable clock is never hidden as expired. +#[must_use] +pub fn guidance_sheet_is_expired(sheet: &GuidanceSheet, now: &str) -> bool { + sheet + .properties + .get("expires") + .and_then(Value::as_str) + .and_then(parse_guidance_timestamp_to_unix_seconds) + .zip(parse_guidance_timestamp_to_unix_seconds(now)) + .is_some_and(|(expires, now)| expires < now) +} + +fn parse_guidance_timestamp_to_unix_seconds(value: &str) -> Option<i64> { + use time::OffsetDateTime; + use time::format_description::well_known::Rfc3339; + + if let Some(rest) = value.strip_prefix("unix:") { + return rest.trim().parse().ok(); + } + OffsetDateTime::parse(value, &Rfc3339) + .ok() + .map(OffsetDateTime::unix_timestamp) +} + +/// True if `sheet` has not been "touched" since `stale_before` — the +/// **age/review-cadence** staleness of system-design.md §7 line 741 +/// ("sheets not touched in N days"). "Touched" is the later of `reviewed_at` +/// and `authored_at`; the sheet is stale when that instant is strictly older +/// than `stale_before` (the caller's `now − N days` cutoff, in the same +/// fixed-width ISO-8601 shape so the compare is lexical). A sheet with neither +/// timestamp has no measurable age and is treated as **not stale**. 
+/// +/// NOTE: this is age-based staleness, distinct from the churn-based signal the +/// `CLA-FACT-GUIDANCE-CHURN-STALE` finding surfaces (which aggregates git churn +/// over matched entities). Do not conflate the two. +#[must_use] +pub fn guidance_sheet_is_stale(sheet: &GuidanceSheet, stale_before: &str) -> bool { + sheet + .touched_at() + .is_some_and(|touched| touched < stale_before) +} + +const SELECT_COLUMNS: &str = "id, name, short_name, scope_level, properties, \ + scope_rank, created_at, updated_at"; + +/// The reserved id prefix every guidance sheet's id must carry: `plugin_id` +/// `core`, reserved kind `guidance` (ADR-003 + ADR-022). The third segment (the +/// canonical name) follows. +const GUIDANCE_ID_PREFIX: &str = "core:guidance:"; + +/// Marker wrapping a Clarion guidance proposal inside a Filigree observation's +/// free-form detail field. The marker lets promotion parse only observations +/// that deliberately carry the guidance payload, rather than treating arbitrary +/// scratchpad prose as trusted sheet data. +pub const GUIDANCE_PROPOSAL_MARKER: &str = "BEGIN_CLARION_GUIDANCE_PROPOSAL_V1"; +const GUIDANCE_PROPOSAL_END_MARKER: &str = "END_CLARION_GUIDANCE_PROPOSAL_V1"; + +const PROVENANCE_FILIGREE_PROMOTION: &str = "filigree_promotion"; +const GUIDANCE_SCOPE_LEVELS: &[&str] = &[ + "project", + "subsystem", + "package", + "module", + "class", + "function", +]; + +/// The reviewed payload an MCP `propose_guidance` call stores in a Filigree +/// observation. A proposal is inert: until [`GuidanceProposal::to_promoted_sheet`] +/// is called by an operator-controlled promotion path and the resulting sheet is +/// written, it is not a `kind='guidance'` entity and cannot enter prompts. 
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct GuidanceProposal { + pub entity_id: String, + pub content: String, + pub scope_level: String, + pub match_rules: Vec<Value>, + pub name: Option<String>, + pub pinned: bool, + pub expires: Option<String>, +} + +/// Fully-owned guidance sheet data produced from a promoted observation. +#[derive(Debug, Clone, PartialEq)] +pub struct PromotedGuidanceSheet { + pub id: String, + pub name: String, + pub short_name: String, + pub properties: Value, +} + +impl GuidanceProposal { + /// Build the default proposal shape for an entity-targeted suggestion. + #[must_use] + pub fn for_entity(entity_id: &str, content: &str) -> Self { + Self { + entity_id: entity_id.to_owned(), + content: content.to_owned(), + scope_level: "function".to_owned(), + match_rules: vec![json!({ "type": "entity", "id": entity_id })], + name: None, + pinned: false, + expires: None, + } + } + + /// Serialize the proposal into the observation detail envelope. + /// + /// # Errors + /// + /// Returns [`StorageError::InvalidQuery`] when the proposal fails validation + /// or JSON serialization fails. + pub fn to_observation_detail(&self) -> Result<String> { + self.validate()?; + let json = serde_json::to_string_pretty(self) + .map_err(|e| StorageError::InvalidQuery(format!("serialize guidance proposal: {e}")))?; + Ok(format!( + "Clarion guidance proposal. Promote with `clarion guidance promote` after review.\n\n\ + {GUIDANCE_PROPOSAL_MARKER}\n{json}\n{GUIDANCE_PROPOSAL_END_MARKER}\n" + )) + } + + /// Parse a guidance proposal from a Filigree observation detail string. + /// + /// # Errors + /// + /// Returns [`StorageError::InvalidQuery`] when the marker is missing, JSON is + /// malformed, or the decoded proposal violates sheet invariants. + pub fn from_observation_detail(detail: &str) -> Result<Self> { + let start = detail.find(GUIDANCE_PROPOSAL_MARKER).ok_or_else(|| { + StorageError::InvalidQuery( + "observation does not contain a Clarion guidance proposal".to_owned(), + ) + })? 
+ GUIDANCE_PROPOSAL_MARKER.len(); + let rest = &detail[start..]; + let end = rest.find(GUIDANCE_PROPOSAL_END_MARKER).ok_or_else(|| { + StorageError::InvalidQuery( + "Clarion guidance proposal is missing its end marker".to_owned(), + ) + })?; + let raw_json = rest[..end].trim(); + let proposal: Self = serde_json::from_str(raw_json).map_err(|e| { + StorageError::InvalidQuery(format!("parse Clarion guidance proposal: {e}")) + })?; + proposal.validate()?; + Ok(proposal) + } + + /// Convert this reviewed proposal into a guidance sheet payload. + /// + /// # Errors + /// + /// Returns [`StorageError::InvalidQuery`] if the proposal is malformed. + pub fn to_promoted_sheet(&self, authored_at: &str) -> Result<PromotedGuidanceSheet> { + self.validate()?; + let slug_source = self.name.as_deref().unwrap_or(&self.entity_id); + let name = slugify_guidance_name(slug_source); + let short_name = name.rsplit('.').next().unwrap_or(&name).to_owned(); + let id = format!("{GUIDANCE_ID_PREFIX}{name}"); + let mut properties = json!({ + "content": self.content, + "scope_level": self.scope_level, + "match_rules": self.match_rules, + "pinned": self.pinned, + "provenance": PROVENANCE_FILIGREE_PROMOTION, + "authored_at": authored_at, + "proposed_for_entity": self.entity_id, + }); + if let Some(expires) = &self.expires + && let Some(obj) = properties.as_object_mut() + { + obj.insert("expires".to_owned(), json!(expires)); + } + Ok(PromotedGuidanceSheet { + id, + name, + short_name, + properties, + }) + } + + fn validate(&self) -> Result<()> { + if self.entity_id.trim().is_empty() { + return Err(StorageError::InvalidQuery( + "guidance proposal missing entity_id".to_owned(), + )); + } + if self.content.trim().is_empty() { + return Err(StorageError::InvalidQuery( + "guidance proposal content is empty".to_owned(), + )); + } + if !GUIDANCE_SCOPE_LEVELS.contains(&self.scope_level.as_str()) { + return Err(StorageError::InvalidQuery(format!( + "guidance proposal scope_level '{}' is invalid", + self.scope_level + ))); + } + if 
self.match_rules.is_empty() { + return Err(StorageError::InvalidQuery( + "guidance proposal needs at least one match rule".to_owned(), + )); + } + Ok(()) + } +} + +/// Derive a canonical guidance-name slug. Kept here so CLI, MCP, and Wardline- +/// derived generation all mint ids with the same grammar. +#[must_use] +pub fn slugify_guidance_name(input: &str) -> String { + let mut out = String::with_capacity(input.len()); + let mut last_dash = false; + for ch in input.chars() { + if ch.is_ascii_alphanumeric() || matches!(ch, '.' | '-' | '_') { + out.push(ch); + last_dash = false; + } else if !last_dash { + out.push('-'); + last_dash = true; + } + } + let trimmed = out.trim_matches('-').to_owned(); + if trimmed.is_empty() { + "guidance".to_owned() + } else { + trimmed + } +} + +fn validate_guidance_id(id: &str) -> Result<()> { + if !id.starts_with(GUIDANCE_ID_PREFIX) { + return Err(StorageError::InvalidQuery(format!( + "guidance sheet id '{id}' is not a guidance id (must start with `{GUIDANCE_ID_PREFIX}`); \ + refusing to write — this would corrupt the entity it names" + ))); + } + Ok(()) +} + +/// Insert a new guidance sheet. Unlike [`upsert_guidance_sheet`], this is +/// create-only: an existing id is reported as an error and the stored row is +/// left unchanged. +/// +/// # Errors +/// +/// Returns [`StorageError::InvalidQuery`] if `sheet.id` does not start with +/// `core:guidance:` or if a row with the same id already exists. Returns +/// [`StorageError::Sqlite`] on any other `SQLite` failure. 
+pub fn insert_guidance_sheet(conn: &Connection, sheet: &GuidanceSheetInput<'_>) -> Result<()> { + validate_guidance_id(sheet.id)?; + let properties = serde_json::to_string(sheet.properties) + .map_err(|e| StorageError::InvalidQuery(format!("serialize guidance properties: {e}")))?; + let rows = conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES \ + (?1, 'core', 'guidance', ?2, ?3, ?4, \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now')) \ + ON CONFLICT(id) DO NOTHING", + params![sheet.id, sheet.name, sheet.short_name, properties], + )?; + if rows == 0 { + return Err(StorageError::InvalidQuery(format!( + "guidance sheet '{}' already exists; use edit to modify it", + sheet.id + ))); + } + Ok(()) +} + +/// Insert or replace a guidance sheet. On a fresh id this inserts; on an +/// existing id it updates `name`, `short_name`, `properties`, and bumps +/// `updated_at` (preserving `created_at`). The generated columns recompute from +/// the new `properties` automatically. +/// +/// This is the low-level overwrite primitive. The CLI's `create` guards against +/// clobbering an existing id (that is `edit`'s job); `edit` does a +/// read-modify-write that preserves `authored_at` / `provenance` / `pinned`. +/// +/// **Id guard (graph-integrity invariant):** the id MUST carry the +/// `core:guidance:` prefix. This protects ALL write paths +/// (create / edit / import) from a hand-edited or malicious payload whose id +/// names a *code* entity (e.g. `python:function:foo`): without the guard the +/// `ON CONFLICT(id) DO UPDATE` would overwrite that code entity's +/// `name`/`properties` (leaving its `kind`/`plugin_id`), silently corrupting the +/// entity graph. The `ON CONFLICT` clause is additionally scoped +/// `WHERE kind = 'guidance'` as defense-in-depth, but the prefix check is the +/// primary gate. 
+/// +/// # Errors +/// +/// Returns [`StorageError::InvalidQuery`] if `sheet.id` does not start with +/// `core:guidance:` (nothing is written). Returns [`StorageError::Sqlite`] on +/// any `SQLite` failure (lock, constraint). +pub fn upsert_guidance_sheet(conn: &Connection, sheet: &GuidanceSheetInput<'_>) -> Result<()> { + validate_guidance_id(sheet.id)?; + let properties = serde_json::to_string(sheet.properties) + .map_err(|e| StorageError::InvalidQuery(format!("serialize guidance properties: {e}")))?; + conn.execute( + "INSERT INTO entities \ + (id, plugin_id, kind, name, short_name, properties, created_at, updated_at) \ + VALUES \ + (?1, 'core', 'guidance', ?2, ?3, ?4, \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now')) \ + ON CONFLICT(id) DO UPDATE SET \ + name = excluded.name, \ + short_name = excluded.short_name, \ + properties = excluded.properties, \ + updated_at = strftime('%Y-%m-%dT%H:%M:%fZ','now') \ + WHERE kind = 'guidance'", + params![sheet.id, sheet.name, sheet.short_name, properties], + )?; + Ok(()) +} + +/// Upsert a [`PortableSheet`] (the import primitive, WS6 / T5). +/// +/// Additive by design: it `upsert`s the one sheet, re-deriving `short_name` from +/// `name`, and leaves every other sheet in the DB untouched. Import is therefore +/// a **merge**, never a mirror — it never deletes a local sheet absent from the +/// imported set (a mirror would be silent destruction of local knowledge). Re- +/// importing identical bytes is a no-op on content (only `updated_at` moves). +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] on any `SQLite` failure. +pub fn import_portable_sheet(conn: &Connection, sheet: &PortableSheet) -> Result<()> { + upsert_guidance_sheet( + conn, + &GuidanceSheetInput { + id: &sheet.id, + name: &sheet.name, + short_name: sheet.short_name(), + properties: &sheet.properties, + }, + ) +} + +/// Fetch one guidance sheet by id. 
Returns `None` if the id is absent or the +/// row exists but is not `kind = 'guidance'`. +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] on any `SQLite` failure. +pub fn get_guidance_sheet(conn: &Connection, id: &str) -> Result<Option<GuidanceSheet>> { + let sql = format!("SELECT {SELECT_COLUMNS} FROM entities WHERE id = ?1 AND kind = 'guidance'"); + let sheet = conn + .query_row(&sql, params![id], GuidanceSheet::from_row) + .optional()?; + Ok(sheet) +} + +/// List guidance sheets, ordered to mirror the read path's composition sort: +/// `scope_rank ASC` (NULLs last), then `authored_at ASC`, then `id ASC`. So +/// CLI `list` output and `guidance_for` composition agree on ordering. +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] on any `SQLite` failure. +pub fn list_guidance_sheets(conn: &Connection) -> Result<Vec<GuidanceSheet>> { + let sql = format!("SELECT {SELECT_COLUMNS} FROM entities WHERE kind = 'guidance'"); + let mut stmt = conn.prepare(&sql)?; + let rows = stmt.query_map([], GuidanceSheet::from_row)?; + let mut sheets: Vec<GuidanceSheet> = rows.collect::<rusqlite::Result<_>>()?; + sheets.sort_by(|a, b| { + a.scope_rank + .unwrap_or(i64::MAX) + .cmp(&b.scope_rank.unwrap_or(i64::MAX)) + .then_with(|| a.authored_at().cmp(&b.authored_at())) + .then_with(|| a.id.cmp(&b.id)) + }); + Ok(sheets) +} + +/// Delete one guidance sheet by id. Returns `true` if a `kind = 'guidance'` row +/// was removed, `false` if no such sheet existed. +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] on any `SQLite` failure. +pub fn delete_guidance_sheet(conn: &Connection, id: &str) -> Result<bool> { + let affected = conn.execute( + "DELETE FROM entities WHERE id = ?1 AND kind = 'guidance'", + params![id], + )?; + Ok(affected > 0) +} + +// ── Portable (export/import) form (WS6 / T5, REQ-GUIDANCE-06) ────────────────── + +/// The git-shareable, diff-friendly form of one guidance sheet. 
+/// +/// A team commits these files to a repo to share institutional knowledge, so the +/// serialization is engineered for **determinism** (identical DB state → byte- +/// identical bytes) and **diff-friendliness** (a one-field change is a one-line +/// diff). It carries only the sheet's **portable** content: +/// - `id` — the full entity id (`core:guidance:`); preserved exactly. +/// - `name` — `entities.name` (segment 3 of the id). +/// - `properties` — the verbatim `entities.properties` object (`content`, +/// `scope_level`, `match_rules`, `pinned`, `provenance`, `authored_at`, …). +/// +/// Deliberately **omitted**: `created_at` / `updated_at`. Those are per-DB write +/// bookkeeping — they differ across machines and re-import, so exporting them +/// would inject spurious, non-deterministic diffs. `short_name` is also omitted: +/// it is re-derived on import from `name` exactly as the authoring path does, so +/// it can never drift from `create`'s convention. +/// +/// Determinism rests on `serde_json::Map` being a `BTreeMap` in this build (no +/// `preserve_order` feature), so [`Self::to_canonical_json`] emits map keys in +/// sorted order recursively; arrays (e.g. `match_rules`) keep author order, which +/// is the intended semantic. See [`Self::to_canonical_json`] for the byte contract. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct PortableSheet { + /// Full entity id (`core:guidance:`). + pub id: String, + /// `entities.name` — the canonical qualified name. + pub name: String, + /// The verbatim `entities.properties` object. + pub properties: Value, +} + +impl PortableSheet { + /// Project a stored [`GuidanceSheet`] down to its portable form, dropping the + /// per-DB `created_at` / `updated_at` bookkeeping and the re-derivable + /// `short_name`. 
+ #[must_use] + pub fn from_sheet(sheet: &GuidanceSheet) -> Self { + Self { + id: sheet.id.clone(), + name: sheet.name.clone(), + properties: sheet.properties.clone(), + } + } + + /// The `short_name` to store on import: the display tail of `name`, derived + /// exactly as the CLI `create` path does (`name.rsplit('.').next()`), so an + /// imported sheet is byte-indistinguishable from a locally-authored one. + #[must_use] + pub fn short_name(&self) -> &str { + self.name.rsplit('.').next().unwrap_or(&self.name) + } + + /// Serialize to canonical, diff-friendly JSON **with a trailing newline**. + /// + /// "Canonical" = pretty-printed (one field per line, so a single changed + /// field is a single changed line) with **sorted** object keys at every + /// depth. Key order is sorted because `serde_json::Map` is a `BTreeMap` in + /// this build; `to_string_pretty` walks it in `BTreeMap` (sorted) order. The + /// struct's own three keys (`id`, `name`, `properties`) are emitted in + /// declaration order, which is also sorted (`id` < `name` < `properties`), a + /// stable order. The trailing `\n` is POSIX-text hygiene and keeps git from + /// flagging a "no newline at end of file". + /// + /// This is the **only** place output bytes are formed; nothing on the export + /// path uses `HashMap` iteration order or embeds an export timestamp / path. + /// + /// # Errors + /// + /// Returns [`StorageError::InvalidQuery`] if serialization fails (it cannot, + /// for a `Value`-backed struct, but the fallible signature avoids a panic). + pub fn to_canonical_json(&self) -> Result<String> { + let mut out = serde_json::to_string_pretty(self) + .map_err(|e| StorageError::InvalidQuery(format!("serialize guidance sheet: {e}")))?; + out.push('\n'); + Ok(out) + } + + /// Parse a [`PortableSheet`] from the canonical JSON bytes (`source` names the + /// file, for a loud error). Rejects an empty `id` or `name` — a sheet without + /// either cannot be upserted and signals a corrupt/hand-mangled file. 
+ /// + /// # Errors + /// + /// Returns [`StorageError::InvalidQuery`] naming `source` on malformed JSON or + /// a missing/empty `id` / `name`. Import callers surface this as a hard + /// failure (a dropped sheet is silent data loss). + pub fn from_canonical_json(source: &str, bytes: &str) -> Result<Self> { + let sheet: Self = serde_json::from_str(bytes).map_err(|e| { + StorageError::InvalidQuery(format!("parse guidance sheet {source}: {e}")) + })?; + if sheet.id.trim().is_empty() { + return Err(StorageError::InvalidQuery(format!( + "guidance sheet {source}: missing or empty `id`" + ))); + } + if sheet.name.trim().is_empty() { + return Err(StorageError::InvalidQuery(format!( + "guidance sheet {source}: missing or empty `name`" + ))); + } + Ok(sheet) + } + + /// The deterministic, filesystem-safe filename for this sheet. + /// + /// The entity id contains colons (`core:guidance:foo.bar`), which are not + /// portable across filesystems (illegal on Windows/NTFS, awkward in shells). + /// We map each `:` to `__` (double underscore) and append `.json`, giving + /// e.g. `core__guidance__foo.bar.json`. The mapping is **deterministic** and + /// **collision-free**: `:` is a reserved id separator (ADR-003 entity ids are + /// exactly three colon-joined segments and the segments never contain a bare + /// colon by construction), so distinct ids never collide after substitution. + /// We do not need to reverse the filename — the authoritative id lives inside + /// the file — so the encoding only has to be injective, not invertible. + #[must_use] + pub fn file_name(&self) -> String { + format!("{}.json", self.id.replace(':', "__")) + } +} + +/// True if `sheet` applies to the entity `entity_id`, evaluating its +/// `match_rules` (path / tag / kind / subsystem / entity) against the entity's +/// facts. Uses the shared [`rule_match`] dispatch, so CLI `list --for-entity` +/// stays consistent with the MCP `guidance_for` read path on rule semantics. 
+/// +/// This considers **`match_rules` only** — it deliberately ignores explicit +/// `guides`-edge composition (which `guidance_for` *also* honours), so a `true` +/// here means "a match rule fired", not "`guidance_for` would compose this sheet". +/// `wardline_group` rules are not evaluable here and never match (the Wardline +/// blob is opaque to Clarion). +/// +/// `project_root` is needed to compute the entity's project-relative path for +/// `path` rules (the stored `source_file_path` is absolute). +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] on any `SQLite` failure resolving entity facts. +pub fn guidance_sheet_matches_entity( + conn: &Connection, + sheet: &GuidanceSheet, + entity_id: &str, + project_root: &Path, +) -> Result<bool> { + let Some(rules) = sheet + .properties + .get("match_rules") + .and_then(Value::as_array) + else { + return Ok(false); + }; + if rules.is_empty() { + return Ok(false); + } + let Some(facts) = MatchFacts::from_entity_id(conn, entity_id, project_root)? else { + return Ok(false); + }; + Ok(rules + .iter() + .any(|rule| matches!(rule_match(rule, &facts), RuleVerdict::Matched(_)))) +} + +/// Invalidate (delete) cached summaries for every entity `sheet` matches, +/// returning the number of `summary_cache` rows removed (WS6 / T-cache, +/// ADR-007). +/// +/// This is the eager-invalidation half of ADR-007's guidance contract: a +/// guidance-sheet edit changes the composed guidance, so the cached summaries of +/// every affected entity must be dropped or the new guidance never reaches a +/// future prompt (it would otherwise stay inert until the entity's *code* +/// changed and its `content_hash` cache key rotated). The CLI authoring path +/// (`clarion guidance create|edit|delete`) calls this on every mutation. 
+/// +/// Scan strategy: drive off `SELECT DISTINCT entity_id FROM summary_cache` (the +/// only entities that *can* be invalidated), not the whole entity table — this +/// keeps the work O(cached-entities) ≤ O(N-entities) and, by reusing +/// [`crate::cache::delete_summary_cache_for_entity`]'s single-entity `DELETE`, dodges the +/// `SQLite` 999-bound-parameter ceiling a broad `IN (…)` over a wide `path:` +/// match would otherwise hit on a large corpus. Guidance sheets never carry +/// cache rows, so the `kind = 'guidance'` exclusion is automatic. +/// +/// A sheet applies to an entity if EITHER a `match_rules` rule fires OR the sheet +/// has an explicit `guides` edge to it — the same OR composition the MCP +/// `guidance_for` read path uses. This function honours both: it collects the +/// sheet's `guides`-edge targets (`SELECT to_id FROM edges WHERE kind = 'guides' +/// AND from_id = ?sheet_id`) and invalidates them alongside the rule matches. +/// An entity reached by both a rule and a guides edge is invalidated exactly +/// once (the `cached_ids`-driven loop de-dups automatically). A sheet with no +/// `match_rules` and no `guides` edges matches nothing and this is a clean 0-row +/// no-op. +/// +/// `project_root` is required to evaluate `path:` rules (the stored +/// `source_file_path` is absolute; the matcher strips this prefix to a +/// project-relative path). It is canonicalized to align with symlink-resolved +/// stored paths, mirroring the CLI `list` path. +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] on any `SQLite` failure enumerating cached +/// entities, resolving entity facts, or deleting rows. 
+pub fn invalidate_summaries_for_sheet( + conn: &Connection, + sheet: &GuidanceSheet, + project_root: &Path, +) -> Result<usize> { + let canonical_root = project_root + .canonicalize() + .unwrap_or_else(|_| project_root.to_path_buf()); + + let cached_ids: Vec<String> = { + let mut stmt = conn.prepare("SELECT DISTINCT entity_id FROM summary_cache")?; + let rows = stmt.query_map([], |row| row.get::<_, String>(0))?; + rows.collect::<rusqlite::Result<_>>()? + }; + + // The sheet's explicit `guides`-edge targets. `guidance_for` composes these + // OR-wise with `match_rules`, so invalidation must too. Driving the delete off + // `cached_ids` (below) with an OR'd predicate keeps the count exact and + // de-dups an entity reached by both a rule and a guides edge automatically. + let guides_targets: HashSet<String> = { + let mut stmt = + conn.prepare("SELECT to_id FROM edges WHERE kind = 'guides' AND from_id = ?1")?; + let rows = stmt.query_map(params![sheet.id], |row| row.get::<_, String>(0))?; + rows.collect::<rusqlite::Result<_>>()? + }; + + let mut removed = 0usize; + for entity_id in &cached_ids { + if guides_targets.contains(entity_id) + || guidance_sheet_matches_entity(conn, sheet, entity_id, &canonical_root)? + { + removed += crate::cache::delete_summary_cache_for_entity(conn, entity_id)?; + } + } + Ok(removed) +} + +/// The minimum entity facts a guidance `match_rules` evaluation needs. This is +/// the single source of truth shared by the CLI write path +/// ([`guidance_sheet_matches_entity`]) and the MCP `guidance_for` read path — +/// the read path builds one from an already-loaded `EntityRow` +/// ([`MatchFacts::from_entity_row`]) to avoid a second lookup; the CLI resolves +/// by id ([`MatchFacts::from_entity_id`]). +pub struct MatchFacts { + kind: String, + rel_path: Option<String>, + tags: HashSet<String>, + subsystem_id: Option<String>, + entity_id: String, +} + +impl MatchFacts { + /// Build facts from an already-loaded [`EntityRow`] (the read path has the + /// row in hand for the response, so it should not re-query). 
+ /// + /// # Errors + /// + /// Returns [`StorageError::Sqlite`] on any `SQLite` failure loading tags or + /// the entity's subsystem. + pub fn from_entity_row( + conn: &Connection, + entity: &EntityRow, + project_root: &Path, + ) -> Result<Self> { + let rel_path = entity.source_file_path.as_ref().map(|path| { + Path::new(path) + .strip_prefix(project_root) + .ok() + .and_then(|rel| rel.to_str()) + .unwrap_or(path) + .to_owned() + }); + + let mut tags = HashSet::new(); + let mut stmt = conn.prepare("SELECT tag FROM entity_tags WHERE entity_id = ?1")?; + let mut rows = stmt.query(params![entity.id])?; + while let Some(row) = rows.next()? { + tags.insert(row.get::<_, String>(0)?); + } + + let subsystem_id = subsystem_of_entity(conn, &entity.id)?.map(|found| found.subsystem_id); + + Ok(Self { + kind: entity.kind.clone(), + rel_path, + tags, + subsystem_id, + entity_id: entity.id.clone(), + }) + } + + /// Resolve an entity by id, then build its facts. Returns `None` when the id + /// is unknown. + /// + /// # Errors + /// + /// Returns [`StorageError::Sqlite`] on any `SQLite` failure. + pub fn from_entity_id( + conn: &Connection, + entity_id: &str, + project_root: &Path, + ) -> Result<Option<Self>> { + let Some(entity) = entity_by_id(conn, entity_id)? else { + return Ok(None); + }; + Ok(Some(Self::from_entity_row(conn, &entity, project_root)?)) + } +} + +/// The verdict of evaluating one guidance `match_rule` against an entity's +/// [`MatchFacts`]. The `Matched(&'static str)` label is load-bearing: the MCP +/// `guidance_for` read path surfaces it as the sheet's `matched_by` reason, and +/// `Unevaluable` drives its `wardline_group` skip signal. Do not rename the +/// labels. +pub enum RuleVerdict { + /// The rule matched; the static label is the rule-type name (`"path"`, + /// `"tag"`, `"kind"`, `"subsystem"`, `"entity"`). + Matched(&'static str), + /// The rule did not match (or was malformed). 
+ NoMatch, + /// The rule cannot be evaluated against static facts (`wardline_group`, + /// which would require parsing the opaque Wardline blob). + Unevaluable, +} + +/// Evaluate one guidance `match_rule` (a `{"type": …, …}` object) against an +/// entity's [`MatchFacts`]. The single shared dispatch behind both +/// `guidance_sheet_matches_entity` (CLI) and `tool_guidance_for` (MCP), so the +/// two surfaces cannot drift on rule semantics. +#[must_use] +pub fn rule_match(rule: &Value, facts: &MatchFacts) -> RuleVerdict { + let Some(rule_type) = rule.get("type").and_then(Value::as_str) else { + return RuleVerdict::NoMatch; + }; + match rule_type { + "path" => match ( + rule.get("pattern").and_then(Value::as_str), + facts.rel_path.as_deref(), + ) { + (Some(pattern), Some(path)) if glob_match(pattern, path) => { + RuleVerdict::Matched("path") + } + _ => RuleVerdict::NoMatch, + }, + "tag" => match rule.get("value").and_then(Value::as_str) { + Some(value) if facts.tags.contains(value) => RuleVerdict::Matched("tag"), + _ => RuleVerdict::NoMatch, + }, + "kind" => match rule.get("value").and_then(Value::as_str) { + Some(value) if value == facts.kind => RuleVerdict::Matched("kind"), + _ => RuleVerdict::NoMatch, + }, + "subsystem" => match ( + rule.get("id").and_then(Value::as_str), + facts.subsystem_id.as_deref(), + ) { + (Some(id), Some(sub)) if id == sub => RuleVerdict::Matched("subsystem"), + _ => RuleVerdict::NoMatch, + }, + "entity" => match rule.get("id").and_then(Value::as_str) { + Some(id) if id == facts.entity_id => RuleVerdict::Matched("entity"), + _ => RuleVerdict::NoMatch, + }, + "wardline_group" => RuleVerdict::Unevaluable, + _ => RuleVerdict::NoMatch, + } +} + +#[cfg(test)] +mod tests { + use super::*; + + /// Build a bare `GuidanceSheet` carrying only the given properties object, + /// for testing the pure date predicates (no DB needed). 
+ fn sheet_with(properties: Value) -> GuidanceSheet { + GuidanceSheet { + id: "core:guidance:test".to_owned(), + name: "test".to_owned(), + short_name: "test".to_owned(), + scope_level: Some("module".to_owned()), + scope_rank: Some(4), + properties, + created_at: "2026-01-01T00:00:00.000Z".to_owned(), + updated_at: "2026-01-01T00:00:00.000Z".to_owned(), + } + } + + #[test] + fn guidance_proposal_detail_round_trips_to_promoted_sheet() { + let proposal = GuidanceProposal { + entity_id: "python:function:demo.entry".to_owned(), + content: "Prefer operational risk notes.".to_owned(), + scope_level: "function".to_owned(), + match_rules: vec![json!({"type": "entity", "id": "python:function:demo.entry"})], + name: Some("demo-entry-risk".to_owned()), + pinned: true, + expires: Some("2026-12-31T00:00:00.000Z".to_owned()), + }; + + let detail = proposal + .to_observation_detail() + .expect("serialize proposal detail"); + assert!(detail.contains(GUIDANCE_PROPOSAL_MARKER)); + + let parsed = GuidanceProposal::from_observation_detail(&detail) + .expect("parse proposal from observation detail"); + assert_eq!(parsed, proposal); + + let sheet = parsed + .to_promoted_sheet("2026-06-04T00:00:00.000Z") + .expect("build promoted sheet"); + assert_eq!(sheet.id, "core:guidance:demo-entry-risk"); + assert_eq!(sheet.name, "demo-entry-risk"); + assert_eq!(sheet.short_name, "demo-entry-risk"); + assert_eq!( + sheet.properties.get("provenance").and_then(Value::as_str), + Some("filigree_promotion") + ); + assert_eq!( + sheet.properties.get("authored_at").and_then(Value::as_str), + Some("2026-06-04T00:00:00.000Z") + ); + assert_eq!( + sheet + .properties + .get("match_rules") + .and_then(Value::as_array) + .and_then(|rules| rules.first()) + .and_then(|rule| rule.get("id")) + .and_then(Value::as_str), + Some("python:function:demo.entry") + ); + } + + // ── guidance_sheet_is_expired ──────────────────────────────────────────── + + #[test] + fn expired_past_expires_is_expired() { + let sheet = 
sheet_with(json!({ "expires": "2026-01-01T00:00:00.000Z" })); + assert!(guidance_sheet_is_expired( + &sheet, + "2026-06-03T12:00:00.000Z" + )); + } + + #[test] + fn expired_future_expires_is_not_expired() { + let sheet = sheet_with(json!({ "expires": "2999-01-01T00:00:00.000Z" })); + assert!(!guidance_sheet_is_expired( + &sheet, + "2026-06-03T12:00:00.000Z" + )); + } + + #[test] + fn expired_absent_expires_is_not_expired() { + let sheet = sheet_with(json!({ "authored_at": "2026-01-01T00:00:00.000Z" })); + assert!(!guidance_sheet_is_expired( + &sheet, + "2026-06-03T12:00:00.000Z" + )); + } + + #[test] + fn expired_equal_expires_is_not_expired() { + // `expires < now` is strict: a sheet expiring exactly at `now` is not + // yet expired (mirrors the read path's `<` compare). + let sheet = sheet_with(json!({ "expires": "2026-06-03T12:00:00.000Z" })); + assert!(!guidance_sheet_is_expired( + &sheet, + "2026-06-03T12:00:00.000Z" + )); + } + + #[test] + fn expired_future_expires_is_not_expired_with_unix_clock() { + let sheet = sheet_with(json!({ "expires": "2999-01-01T00:00:00.000Z" })); + assert!(!guidance_sheet_is_expired(&sheet, "unix:1748822400")); + } + + #[test] + fn expired_past_expires_is_expired_with_unix_clock() { + let sheet = sheet_with(json!({ "expires": "2000-01-01T00:00:00.000Z" })); + assert!(guidance_sheet_is_expired(&sheet, "unix:1748822400")); + } + + #[test] + fn expired_unparseable_clock_fails_open() { + let sheet = sheet_with(json!({ "expires": "2000-01-01T00:00:00.000Z" })); + assert!(!guidance_sheet_is_expired(&sheet, "not-a-clock")); + } + + // ── guidance_sheet_is_stale ────────────────────────────────────────────── + + #[test] + fn stale_old_authored_is_stale() { + // authored long ago, no reviewed_at → touched = authored < cutoff. 
+ let sheet = sheet_with(json!({ "authored_at": "2026-01-01T00:00:00.000Z" })); + let cutoff = "2026-03-05T12:00:00.000Z"; // now − 90 days, roughly + assert!(guidance_sheet_is_stale(&sheet, cutoff)); + } + + #[test] + fn stale_fresh_authored_is_not_stale() { + let sheet = sheet_with(json!({ "authored_at": "2026-06-01T00:00:00.000Z" })); + let cutoff = "2026-03-05T12:00:00.000Z"; + assert!(!guidance_sheet_is_stale(&sheet, cutoff)); + } + + #[test] + fn stale_recent_reviewed_at_overrides_old_authored_at() { + // Old authored_at but a recent reviewed_at → touched = max = reviewed_at, + // which is after the cutoff, so the sheet is NOT stale. This is the named + // TDD target: reviewed_at (when later) is what counts. + let sheet = sheet_with(json!({ + "authored_at": "2025-01-01T00:00:00.000Z", + "reviewed_at": "2026-06-01T00:00:00.000Z", + })); + let cutoff = "2026-03-05T12:00:00.000Z"; + assert!(!guidance_sheet_is_stale(&sheet, cutoff)); + } + + #[test] + fn stale_old_reviewed_at_still_stale() { + // Both old → touched = max is still before the cutoff → stale. + let sheet = sheet_with(json!({ + "authored_at": "2025-01-01T00:00:00.000Z", + "reviewed_at": "2025-02-01T00:00:00.000Z", + })); + let cutoff = "2026-03-05T12:00:00.000Z"; + assert!(guidance_sheet_is_stale(&sheet, cutoff)); + } + + #[test] + fn stale_no_timestamps_is_not_stale() { + // Neither authored_at nor reviewed_at → unmeasurable age → not stale. + let sheet = sheet_with(json!({ "content": "x" })); + let cutoff = "2026-03-05T12:00:00.000Z"; + assert!(!guidance_sheet_is_stale(&sheet, cutoff)); + } + + #[test] + fn stale_equal_to_cutoff_is_not_stale() { + // `touched < stale_before` is strict. 
+ let sheet = sheet_with(json!({ "authored_at": "2026-03-05T12:00:00.000Z" })); + let cutoff = "2026-03-05T12:00:00.000Z"; + assert!(!guidance_sheet_is_stale(&sheet, cutoff)); + } + + // ── PortableSheet (export/import) ───────────────────────────────────────── + + fn portable_with(id: &str, name: &str, properties: Value) -> PortableSheet { + PortableSheet { + id: id.to_owned(), + name: name.to_owned(), + properties, + } + } + + #[test] + fn canonical_json_has_trailing_newline() { + let p = portable_with("core:guidance:x", "x", json!({ "content": "y" })); + let json = p.to_canonical_json().unwrap(); + assert!(json.ends_with('\n'), "must end with a newline: {json:?}"); + assert!(!json.ends_with("\n\n"), "exactly one newline: {json:?}"); + } + + #[test] + fn canonical_json_sorts_keys_for_diff_stability() { + // Author the properties with keys in NON-sorted order; the serialized + // bytes must come out sorted (so a re-serialize from any key order is + // byte-stable). `serde_json::Map` is a BTreeMap in this build, so this + // holds recursively. + let p = portable_with( + "core:guidance:s", + "s", + json!({ "zeta": 1, "alpha": 2, "nested": { "yray": 1, "beta": 2 } }), + ); + let json = p.to_canonical_json().unwrap(); + let alpha = json.find("alpha").unwrap(); + let zeta = json.find("zeta").unwrap(); + assert!(alpha < zeta, "top-level keys sorted: {json}"); + let beta = json.find("beta").unwrap(); + let yray = json.find("yray").unwrap(); + assert!(beta < yray, "nested keys sorted: {json}"); + } + + #[test] + fn canonical_json_is_deterministic_across_runs() { + // Two PortableSheets built from differently-ordered property maps but the + // same logical content must serialize byte-identically. 
+ let a = portable_with( + "core:guidance:d", + "d", + json!({ "b": 1, "a": 2, "c": [3, 2, 1] }), + ); + let b = portable_with( + "core:guidance:d", + "d", + json!({ "c": [3, 2, 1], "a": 2, "b": 1 }), + ); + assert_eq!( + a.to_canonical_json().unwrap(), + b.to_canonical_json().unwrap() + ); + } + + #[test] + fn canonical_json_preserves_array_order() { + // match_rules order is semantic (first-match precedence) — arrays must NOT + // be reordered, only object keys. + let p = portable_with( + "core:guidance:r", + "r", + json!({ "match_rules": [{ "type": "path" }, { "type": "kind" }] }), + ); + let json = p.to_canonical_json().unwrap(); + assert!( + json.find("path").unwrap() < json.find("kind").unwrap(), + "array element order preserved: {json}" + ); + } + + #[test] + fn portable_json_round_trips() { + let p = portable_with( + "core:guidance:rt", + "auth.tokens", + json!({ + "content": "guard the refresh path", + "scope_level": "module", + "match_rules": [{ "type": "path", "pattern": "src/auth/**" }], + "pinned": true, + "provenance": "manual", + "authored_at": "2026-01-01T00:00:00.000Z", + "expires": "2027-01-01T00:00:00.000Z", + }), + ); + let json = p.to_canonical_json().unwrap(); + let back = PortableSheet::from_canonical_json("rt.json", &json).unwrap(); + assert_eq!(back.id, p.id); + assert_eq!(back.name, p.name); + assert_eq!(back.properties, p.properties); + } + + #[test] + fn file_name_sanitizes_colons() { + let p = portable_with("core:guidance:foo.bar", "foo.bar", json!({})); + assert_eq!(p.file_name(), "core__guidance__foo.bar.json"); + } + + #[test] + fn short_name_is_display_tail() { + let p = portable_with("core:guidance:a.b.c", "a.b.c", json!({})); + assert_eq!(p.short_name(), "c"); + let flat = portable_with("core:guidance:flat", "flat", json!({})); + assert_eq!(flat.short_name(), "flat"); + } + + #[test] + fn from_canonical_json_rejects_malformed() { + assert!(PortableSheet::from_canonical_json("bad.json", "{ not json").is_err()); + // valid JSON but 
not a sheet (missing id/name) → error naming the file. + let err = PortableSheet::from_canonical_json("nmeta.json", "{\"properties\": {}}") + .unwrap_err() + .to_string(); + assert!(err.contains("nmeta.json"), "error names the file: {err}"); + } + + #[test] + fn from_canonical_json_rejects_empty_id() { + let err = PortableSheet::from_canonical_json("empty.json", "{\"id\":\"\",\"name\":\"n\"}") + .unwrap_err() + .to_string(); + assert!(err.contains("empty.json"), "{err}"); + } +} diff --git a/crates/clarion-storage/src/lib.rs b/crates/clarion-storage/src/lib.rs index 1ebc4d1e..ba13d495 100644 --- a/crates/clarion-storage/src/lib.rs +++ b/crates/clarion-storage/src/lib.rs @@ -8,11 +8,14 @@ pub mod cache; pub mod commands; pub mod embeddings; pub mod error; +pub mod glob; +pub mod guidance; pub mod pragma; pub mod prior_index; pub mod query; pub mod reader; pub mod retry; +pub mod runs; pub mod schema; pub mod sei; pub mod unresolved; @@ -31,6 +34,14 @@ pub use commands::{ }; pub use embeddings::{EmbeddingKey, EmbeddingStore, StoredEmbedding, embeddings_db_path}; pub use error::{Result, StorageError}; +pub use glob::glob_match; +pub use guidance::{ + GUIDANCE_PROPOSAL_MARKER, GuidanceProposal, GuidanceSheet, GuidanceSheetInput, MatchFacts, + PortableSheet, PromotedGuidanceSheet, RuleVerdict, delete_guidance_sheet, get_guidance_sheet, + guidance_sheet_is_expired, guidance_sheet_is_stale, guidance_sheet_matches_entity, + import_portable_sheet, insert_guidance_sheet, invalidate_summaries_for_sheet, + list_guidance_sheets, rule_match, slugify_guidance_name, upsert_guidance_sheet, +}; pub use prior_index::{ PriorIndexEntry, clear_prior_index, load_prior_index, previously_analyzed_files, prior_locators_by_file, replace_prior_index, upsert_prior_index_entry, @@ -51,6 +62,7 @@ pub use query::{ }; pub use reader::ReaderPool; pub use retry::{RetryPolicy, begin_immediate}; +pub use runs::mark_stale_running_runs_failed; pub use sei::{ BindingStatus, GitRename, GitRenameSource, 
LineageEvent, NewEntityDescriptor, SEI_PREFIX, SeiBinding, SeiBindingRecord, SeiDecision, SeiLineageEntry, SeiLineageRow, SeiLookupResult, diff --git a/crates/clarion-storage/src/query.rs b/crates/clarion-storage/src/query.rs index 9e372689..38791481 100644 --- a/crates/clarion-storage/src/query.rs +++ b/crates/clarion-storage/src/query.rs @@ -967,11 +967,13 @@ pub fn unresolved_call_sites_for_caller( StorageError::InvalidQuery("unresolved call-site limit is too large".to_owned()) })?; let mut stmt = conn.prepare( - "SELECT caller_entity_id, caller_content_hash, site_key, site_ordinal, \ - source_file_id, source_byte_start, source_byte_end, callee_expr \ - FROM entity_unresolved_call_sites \ - WHERE caller_entity_id = ?1 \ - ORDER BY site_ordinal, site_key \ + "SELECT u.caller_entity_id, u.caller_content_hash, u.site_key, u.site_ordinal, \ + u.source_file_id, u.source_byte_start, u.source_byte_end, u.callee_expr \ + FROM entity_unresolved_call_sites u \ + JOIN entities caller ON caller.id = u.caller_entity_id \ + WHERE u.caller_entity_id = ?1 \ + AND caller.content_hash = u.caller_content_hash \ + ORDER BY u.site_ordinal, u.site_key \ LIMIT ?2", )?; let rows = stmt.query_map(params![caller_id, limit_i64], map_unresolved_call_site_row)?; @@ -998,9 +1000,10 @@ pub fn unresolved_callers_for_target( u.source_file_id, u.source_byte_start, u.source_byte_end, u.callee_expr \ FROM entity_unresolved_call_sites u \ JOIN entities caller ON caller.id = u.caller_entity_id \ - WHERE u.callee_expr = ?1 \ - OR u.callee_expr = ?2 \ - OR u.callee_expr LIKE ?3 ESCAPE '\\' \ + WHERE caller.content_hash = u.caller_content_hash \ + AND (u.callee_expr = ?1 \ + OR u.callee_expr = ?2 \ + OR u.callee_expr LIKE ?3 ESCAPE '\\') \ ORDER BY CASE WHEN caller.source_file_id = ?4 THEN 0 ELSE 1 END, \ u.caller_entity_id, u.site_ordinal, u.site_key \ LIMIT ?5", diff --git a/crates/clarion-storage/src/runs.rs b/crates/clarion-storage/src/runs.rs new file mode 100644 index 00000000..d1b996e4 --- 
/dev/null +++ b/crates/clarion-storage/src/runs.rs @@ -0,0 +1,43 @@ +//! Run-lifecycle repair helpers. + +use rusqlite::{Connection, params}; + +use crate::Result; + +/// Running rows older than this heartbeat window are considered abandoned. +/// +/// The value is deliberately conservative: normal analyze runs should refresh +/// `heartbeat_at` at run open/resume and at writer batch boundaries. A 24-hour +/// gap is far beyond expected local analyze duration while still preventing +/// dead rows from poisoning status forever. +const STALE_RUNNING_HEARTBEAT_SQL: &str = "-24 hours"; + +/// Mark stale `running` rows as failed. +/// +/// This is idempotent and safe to call from analyze startup or diagnostic read +/// paths. It uses the heartbeat rather than probing `owner_pid` so behavior is +/// portable across Unix/macOS/Windows and testable without process tricks. +/// +/// # Errors +/// +/// Returns `SQLite` errors from the underlying `UPDATE`. +pub fn mark_stale_running_runs_failed(conn: &Connection) -> Result<usize> { + let failure_stats = serde_json::json!({ + "failure_reason": "analyze run abandoned: stale heartbeat", + }) + .to_string(); + let changed = conn.execute( + "UPDATE runs \ + SET status = 'failed', \ + completed_at = strftime('%Y-%m-%dT%H:%M:%fZ', 'now'), \ + stats = ?1, \ + owner_pid = NULL \ + WHERE status = 'running' \ + AND ( \ + heartbeat_at IS NULL \ + OR heartbeat_at < strftime('%Y-%m-%dT%H:%M:%fZ', 'now', ?2) \ + )", + params![failure_stats, STALE_RUNNING_HEARTBEAT_SQL], + )?; + Ok(changed) +} diff --git a/crates/clarion-storage/src/schema.rs b/crates/clarion-storage/src/schema.rs index bb3ccb29..1aaca227 100644 --- a/crates/clarion-storage/src/schema.rs +++ b/crates/clarion-storage/src/schema.rs @@ -50,12 +50,17 @@ const MIGRATIONS: &[Migration] = &[ name: "0007_run_analyzed_commit", sql: include_str!("../migrations/0007_run_analyzed_commit.sql"), }, + Migration { + version: 8, + name: "0008_run_owner_heartbeat", + sql: 
include_str!("../migrations/0008_run_owner_heartbeat.sql"), + }, ]; /// Highest migration version known to this build. Mirrored into the /// `SQLite` `user_version` header (STO-02) so a future-built database is /// refused at open instead of silently corrupting state. -pub const CURRENT_SCHEMA_VERSION: u32 = 7; +pub const CURRENT_SCHEMA_VERSION: u32 = 8; const _CURRENT_SCHEMA_VERSION_MATCHES_LAST_MIGRATION: () = { // Compile-time check: `CURRENT_SCHEMA_VERSION` must equal the highest diff --git a/crates/clarion-storage/src/wardline_taint.rs b/crates/clarion-storage/src/wardline_taint.rs index 80ff96f2..69501b9a 100644 --- a/crates/clarion-storage/src/wardline_taint.rs +++ b/crates/clarion-storage/src/wardline_taint.rs @@ -343,35 +343,51 @@ mod tests { } } + fn wardline_qualname_fixture() -> serde_json::Value { + serde_json::from_str(include_str!( + "../../../docs/federation/fixtures/wardline-qualname-normalization.json" + )) + .expect("parse wardline qualname fixture") + } + #[test] fn resolves_fixture_vectors_exact() { let conn = migrated_conn(); - // expected_entity_id values copied verbatim from - // fixtures/wardline-qualname-normalization.json qualified_name_vectors. 
- seed( - &conn, - &[ - "python:function:auth.tokens.TokenManager.verify", - "python:function:auth.tokens.refresh..helper", - "python:function:pkg.sub.mod.Outer.Inner.method", - "python:function:lib.foo.Service.handle", - "python:function:myns.pkg.mod.widget", - ], - ); - for qualname in [ - "auth.tokens.TokenManager.verify", - "auth.tokens.refresh..helper", - "pkg.sub.mod.Outer.Inner.method", - "lib.foo.Service.handle", - "myns.pkg.mod.widget", - ] { + let fixture = wardline_qualname_fixture(); + let vectors = fixture["qualified_name_vectors"] + .as_array() + .expect("qualified_name_vectors array"); + + for vector in vectors + .iter() + .filter(|vector| vector["kind"].as_str() == Some("function")) + { + insert_entity( + &conn, + vector["expected_entity_id"] + .as_str() + .expect("expected_entity_id string"), + None, + ); + } + for vector in vectors + .iter() + .filter(|vector| vector["kind"].as_str() == Some("function")) + { + let qualname = vector["expected_qualified_name"] + .as_str() + .expect("expected_qualified_name string"); + let expected_entity_id = vector["expected_entity_id"] + .as_str() + .expect("expected_entity_id string"); let r = resolve_wardline_qualname(&conn, qualname).unwrap(); assert_eq!( r, Resolution::Exact { - entity_id: format!("python:function:{qualname}"), + entity_id: expected_entity_id.to_owned(), }, - "{qualname}" + "{}", + vector["description"].as_str().unwrap_or(qualname) ); } } diff --git a/crates/clarion-storage/src/writer.rs b/crates/clarion-storage/src/writer.rs index c391b2ba..8a32bb72 100644 --- a/crates/clarion-storage/src/writer.rs +++ b/crates/clarion-storage/src/writer.rs @@ -355,7 +355,8 @@ fn cleanup_after_channel_close(conn: &mut Connection, state: &mut ActorState) { let _ = conn.execute( "UPDATE runs SET status = 'failed', \ completed_at = strftime('%Y-%m-%dT%H:%M:%fZ', 'now'), \ - stats = ?1 \ + stats = ?1, \ + owner_pid = NULL \ WHERE id = ?2", params![stats_json, run_id], ); @@ -411,6 +412,10 @@ fn 
begin_write_tx(conn: &Connection, state: &ActorState) -> Result<()> { crate::retry::begin_immediate(conn, &state.retry_policy) } +fn owner_pid() -> i64 { + i64::from(std::process::id()) +} + fn begin_run( conn: &mut Connection, state: &mut ActorState, @@ -425,9 +430,11 @@ fn begin_run( )); } conn.execute( - "INSERT INTO runs (id, started_at, completed_at, config, stats, status, analyzed_at_commit) \ - VALUES (?1, ?2, NULL, ?3, '{}', 'running', ?4)", - params![run_id, started_at, config_json, head_commit], + "INSERT INTO runs ( \ + id, started_at, completed_at, config, stats, status, analyzed_at_commit, \ + owner_pid, heartbeat_at \ + ) VALUES (?1, ?2, NULL, ?3, '{}', 'running', ?4, ?5, ?2)", + params![run_id, started_at, config_json, head_commit, owner_pid()], )?; begin_write_tx(conn, state)?; state.in_tx = true; @@ -452,8 +459,13 @@ fn resume_run(conn: &mut Connection, state: &mut ActorState, run_id: &str) -> Re )); } let reopened = conn.execute( - "UPDATE runs SET status = 'running', completed_at = NULL WHERE id = ?1", - params![run_id], + "UPDATE runs \ + SET status = 'running', \ + completed_at = NULL, \ + owner_pid = ?1, \ + heartbeat_at = strftime('%Y-%m-%dT%H:%M:%fZ', 'now') \ + WHERE id = ?2", + params![owner_pid(), run_id], )?; if reopened == 0 { return Err(StorageError::WriterProtocol(format!( @@ -549,6 +561,16 @@ fn insert_entity( entity.updated_at, ], )?; + conn.execute( + "DELETE FROM entity_tags WHERE entity_id = ?1 AND plugin_id = ?2", + params![entity.id, entity.plugin_id], + )?; + for tag in &entity.tags { + conn.execute( + "INSERT OR IGNORE INTO entity_tags (entity_id, plugin_id, tag) VALUES (?1, ?2, ?3)", + params![entity.id, entity.plugin_id, tag], + )?; + } bump_writes_and_maybe_commit(conn, state, commits_observed)?; Ok(()) } @@ -568,9 +590,10 @@ fn enforce_entity_kind_contract(entity: &EntityRecord) -> Result<()> { Ok(()) } -// B.6 stores module ids as source anchors until core-minted `file` entities -// land; keep both accepted so the 
storage contract survives that handoff. -const SOURCE_FILE_ANCHOR_KINDS: &[&str] = &["file", "module"]; +// Core-minted file entities are the single canonical source anchor. Module +// entities live below the file in the parent/contains chain, but may not stand +// in for the file-level identity. +const SOURCE_FILE_ANCHOR_KINDS: &[&str] = &["file"]; fn validate_source_file_anchor( conn: &Connection, @@ -980,6 +1003,7 @@ fn bump_writes_and_maybe_commit( if state.writes_in_batch >= state.batch_size { state.writes_in_batch = 0; state.in_tx = false; + refresh_current_run_heartbeat(conn, state)?; conn.execute_batch("COMMIT")?; commits_observed.fetch_add(1, Ordering::Relaxed); // Open the next batch eagerly so the next write doesn't pay @@ -990,6 +1014,19 @@ fn bump_writes_and_maybe_commit( Ok(()) } +fn refresh_current_run_heartbeat(conn: &Connection, state: &ActorState) -> Result<()> { + let Some(run_id) = state.current_run.as_deref() else { + return Ok(()); + }; + conn.execute( + "UPDATE runs \ + SET heartbeat_at = strftime('%Y-%m-%dT%H:%M:%fZ', 'now'), owner_pid = ?1 \ + WHERE id = ?2", + params![owner_pid(), run_id], + )?; + Ok(()) +} + fn flush_run_batch( conn: &mut Connection, state: &mut ActorState, @@ -1011,6 +1048,7 @@ fn flush_run_batch( if state.in_tx { state.in_tx = false; state.writes_in_batch = 0; + refresh_current_run_heartbeat(conn, state)?; conn.execute_batch("COMMIT")?; commits_observed.fetch_add(1, Ordering::Relaxed); } @@ -1029,6 +1067,7 @@ fn query_time_write( if state.in_tx { state.in_tx = false; state.writes_in_batch = 0; + refresh_current_run_heartbeat(conn, state)?; conn.execute_batch("COMMIT")?; commits_observed.fetch_add(1, Ordering::Relaxed); } @@ -1072,8 +1111,9 @@ fn commit_run( }) .to_string(); let changed = conn.execute( - "UPDATE runs SET status = 'failed', completed_at = ?1, stats = ?2 \ - WHERE id = ?3", + "UPDATE runs \ + SET status = 'failed', completed_at = ?1, stats = ?2, owner_pid = NULL \ + WHERE id = ?3", params![completed_at, 
failure_stats, run_id], )?; if let Err(err) = ensure_run_update_changed_one(changed, run_id) { @@ -1084,7 +1124,9 @@ fn commit_run( return Err(StorageError::WriterProtocol(mismatch)); } let changed = conn.execute( - "UPDATE runs SET status = ?1, completed_at = ?2, stats = ?3 WHERE id = ?4", + "UPDATE runs \ + SET status = ?1, completed_at = ?2, stats = ?3, owner_pid = NULL \ + WHERE id = ?4", params![status.as_str(), completed_at, stats_json, run_id], )?; if let Err(err) = ensure_run_update_changed_one(changed, run_id) { @@ -1104,7 +1146,9 @@ fn commit_run( // staged-and-not-committed, so the parent-id check has nothing to // catch that would change the durable state. let changed = conn.execute( - "UPDATE runs SET status = ?1, completed_at = ?2, stats = ?3 WHERE id = ?4", + "UPDATE runs \ + SET status = ?1, completed_at = ?2, stats = ?3, owner_pid = NULL \ + WHERE id = ?4", params![status.as_str(), completed_at, stats_json, run_id], )?; if let Err(err) = ensure_run_update_changed_one(changed, run_id) { @@ -1201,7 +1245,9 @@ fn fail_run( } let stats_json = serde_json::json!({ "failure_reason": reason }).to_string(); let changed = conn.execute( - "UPDATE runs SET status = 'failed', completed_at = ?1, stats = ?2 WHERE id = ?3", + "UPDATE runs \ + SET status = 'failed', completed_at = ?1, stats = ?2, owner_pid = NULL \ + WHERE id = ?3", params![completed_at, stats_json, run_id], )?; if let Err(err) = ensure_run_update_changed_one(changed, run_id) { diff --git a/crates/clarion-storage/tests/guidance_write.rs b/crates/clarion-storage/tests/guidance_write.rs new file mode 100644 index 00000000..377004cf --- /dev/null +++ b/crates/clarion-storage/tests/guidance_write.rs @@ -0,0 +1,619 @@ +//! Guidance-sheet write-API integration tests (WS6 / REQ-GUIDANCE-01, +//! REQ-GUIDANCE-03). +//! +//! Exercises `upsert_guidance_sheet` / `get_guidance_sheet` / +//! `list_guidance_sheets` / `delete_guidance_sheet` against a fresh schema, +//! plus the `--for-entity` matcher. 
The headline assertion is the explicit TDD +//! target: sheets written with various `scope_level`s come back ordered by the +//! generated `scope_rank` column (project < subsystem < … < function). + +use rusqlite::{Connection, params}; +use serde_json::{Value, json}; + +use clarion_storage::{ + GuidanceSheetInput, delete_guidance_sheet, get_guidance_sheet, guidance_sheet_matches_entity, + insert_guidance_sheet, list_guidance_sheets, pragma, schema, upsert_guidance_sheet, +}; + +fn open_fresh(tempdir: &tempfile::TempDir) -> Connection { + let path = tempdir.path().join("clarion.db"); + let mut conn = Connection::open(&path).expect("open"); + pragma::apply_write_pragmas(&conn).expect("pragmas"); + schema::apply_migrations(&mut conn).expect("apply migrations"); + conn +} + +fn write_sheet(conn: &Connection, slug: &str, props: &Value) { + let id = format!("core:guidance:{slug}"); + let short = slug.rsplit('.').next().unwrap_or(slug); + upsert_guidance_sheet( + conn, + &GuidanceSheetInput { + id: &id, + name: slug, + short_name: short, + properties: props, + }, + ) + .expect("upsert guidance sheet"); +} + +fn base_props(scope_level: &str, authored_at: &str) -> Value { + json!({ + "content": format!("guidance for {scope_level}"), + "scope_level": scope_level, + "match_rules": [], + "pinned": false, + "provenance": "manual", + "authored_at": authored_at, + }) +} + +#[test] +fn upsert_then_get_roundtrips_properties_and_kind() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + + write_sheet( + &conn, + "demo.module-sheet", + &base_props("module", "2026-06-01T00:00:00.000Z"), + ); + + let kind: String = conn + .query_row( + "SELECT kind FROM entities WHERE id = ?1", + params!["core:guidance:demo.module-sheet"], + |row| row.get(0), + ) + .unwrap(); + assert_eq!(kind, "guidance"); + + let sheet = get_guidance_sheet(&conn, "core:guidance:demo.module-sheet") + .unwrap() + .expect("sheet present"); + assert_eq!(sheet.scope_level.as_deref(), 
Some("module")); + assert_eq!(sheet.scope_rank, Some(4)); // module → 4 + assert_eq!( + sheet.properties.get("content").and_then(Value::as_str), + Some("guidance for module") + ); + assert_eq!( + sheet.properties.get("provenance").and_then(Value::as_str), + Some("manual") + ); +} + +#[test] +fn insert_guidance_sheet_rejects_existing_id_without_overwrite() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + let first_props = base_props("module", "2026-06-01T00:00:00.000Z"); + let second_props = json!({ + "content": "second writer must not win", + "scope_level": "function", + "match_rules": [], + "pinned": true, + "provenance": "manual", + "authored_at": "2026-06-02T00:00:00.000Z", + }); + + insert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: "core:guidance:race.sheet", + name: "race.sheet", + short_name: "sheet", + properties: &first_props, + }, + ) + .expect("first insert succeeds"); + let err = insert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: "core:guidance:race.sheet", + name: "race.sheet", + short_name: "sheet", + properties: &second_props, + }, + ) + .expect_err("second create must fail instead of overwriting"); + + assert!( + err.to_string().contains("already exists"), + "duplicate create error should name existing sheet; got {err}" + ); + let sheet = get_guidance_sheet(&conn, "core:guidance:race.sheet") + .unwrap() + .expect("sheet present"); + assert_eq!( + sheet.properties.get("content").and_then(Value::as_str), + Some("guidance for module") + ); + assert_eq!(sheet.scope_level.as_deref(), Some("module")); + assert_eq!( + sheet.properties.get("pinned").and_then(Value::as_bool), + Some(false) + ); +} + +#[test] +fn get_returns_none_for_absent_or_non_guidance() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + assert!( + get_guidance_sheet(&conn, "core:guidance:nope") + .unwrap() + .is_none() + ); + + // A non-guidance entity with the same id must not be returned as a 
sheet. + conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'python', 'function', 'x', 'x', '{}', \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + params!["python:function:x"], + ) + .unwrap(); + assert!( + get_guidance_sheet(&conn, "python:function:x") + .unwrap() + .is_none() + ); +} + +#[test] +fn list_orders_by_scope_rank_then_authored_at_then_id() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + + // Insert deliberately out of scope-rank order. + write_sheet( + &conn, + "s.function", + &base_props("function", "2026-01-01T00:00:00.000Z"), + ); + write_sheet( + &conn, + "s.project", + &base_props("project", "2026-01-01T00:00:00.000Z"), + ); + write_sheet( + &conn, + "s.class", + &base_props("class", "2026-01-01T00:00:00.000Z"), + ); + write_sheet( + &conn, + "s.subsystem", + &base_props("subsystem", "2026-01-01T00:00:00.000Z"), + ); + write_sheet( + &conn, + "s.package", + &base_props("package", "2026-01-01T00:00:00.000Z"), + ); + write_sheet( + &conn, + "s.module", + &base_props("module", "2026-01-01T00:00:00.000Z"), + ); + + let listed = list_guidance_sheets(&conn).unwrap(); + let ranks: Vec = listed.iter().map(|s| s.scope_rank.unwrap()).collect(); + assert_eq!(ranks, vec![1, 2, 3, 4, 5, 6], "ordered project→function"); + let levels: Vec<&str> = listed + .iter() + .map(|s| s.scope_level.as_deref().unwrap()) + .collect(); + assert_eq!( + levels, + vec![ + "project", + "subsystem", + "package", + "module", + "class", + "function" + ] + ); +} + +#[test] +fn list_ties_break_by_authored_at_then_id() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + + // Same scope_level (same rank): order by authored_at ASC, then id ASC. 
+ write_sheet( + &conn, + "a.later", + &base_props("module", "2026-06-02T00:00:00.000Z"), + ); + write_sheet( + &conn, + "a.earlier", + &base_props("module", "2026-06-01T00:00:00.000Z"), + ); + // Same authored_at as a.earlier — tie-broken by id (z.same > a.earlier). + write_sheet( + &conn, + "z.same", + &base_props("module", "2026-06-01T00:00:00.000Z"), + ); + + let ids: Vec<String> = list_guidance_sheets(&conn) + .unwrap() + .into_iter() + .map(|s| s.id) + .collect(); + assert_eq!( + ids, + vec![ + "core:guidance:a.earlier".to_owned(), + "core:guidance:z.same".to_owned(), + "core:guidance:a.later".to_owned(), + ] + ); +} + +#[test] +fn upsert_updates_in_place_preserving_created_at() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + write_sheet( + &conn, + "demo.sheet", + &base_props("module", "2026-06-01T00:00:00.000Z"), + ); + let before = get_guidance_sheet(&conn, "core:guidance:demo.sheet") + .unwrap() + .unwrap(); + + // Re-upsert with changed scope_level + content. + write_sheet( + &conn, + "demo.sheet", + &base_props("class", "2026-06-01T00:00:00.000Z"), + ); + let after = get_guidance_sheet(&conn, "core:guidance:demo.sheet") + .unwrap() + .unwrap(); + + assert_eq!(after.created_at, before.created_at, "created_at preserved"); + assert_eq!(after.scope_rank, Some(5), "class → 5"); + + // Exactly one row — upsert, not duplicate insert. 
+ let count: i64 = conn + .query_row( + "SELECT count(*) FROM entities WHERE kind = 'guidance'", + [], + |r| r.get(0), + ) + .unwrap(); + assert_eq!(count, 1); +} + +#[test] +fn delete_removes_only_guidance_sheet() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + write_sheet( + &conn, + "demo.sheet", + &base_props("module", "2026-06-01T00:00:00.000Z"), + ); + + assert!(delete_guidance_sheet(&conn, "core:guidance:demo.sheet").unwrap()); + assert!( + get_guidance_sheet(&conn, "core:guidance:demo.sheet") + .unwrap() + .is_none() + ); + // Second delete is a no-op (returns false). + assert!(!delete_guidance_sheet(&conn, "core:guidance:demo.sheet").unwrap()); +} + +#[test] +fn matcher_evaluates_kind_tag_and_entity_rules() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + let project_root = tempdir.path(); + + // A code entity with a tag. + conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + source_file_path, created_at, updated_at) VALUES \ + (?1, 'python', 'function', 'pkg.mod.f', 'f', '{}', ?2, \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + params![ + "python:function:pkg.mod.f", + project_root.join("src/pkg/mod.py").to_str().unwrap() + ], + ) + .unwrap(); + conn.execute( + "INSERT INTO entity_tags (entity_id, plugin_id, tag) VALUES (?1, 'python', 'auth')", + params!["python:function:pkg.mod.f"], + ) + .unwrap(); + + let kind_sheet = sheet_with_rules(&conn, "k", &json!([{"type":"kind","value":"function"}])); + let tag_sheet = sheet_with_rules(&conn, "t", &json!([{"type":"tag","value":"auth"}])); + let entity_sheet = sheet_with_rules( + &conn, + "e", + &json!([{"type":"entity","id":"python:function:pkg.mod.f"}]), + ); + let path_sheet = sheet_with_rules(&conn, "p", &json!([{"type":"path","pattern":"src/**"}])); + let nomatch = sheet_with_rules(&conn, "n", &json!([{"type":"kind","value":"class"}])); + let wardline = 
sheet_with_rules(&conn, "w", &json!([{"type":"wardline_group","group":"x"}])); + + let m = |s: &clarion_storage::GuidanceSheet| { + guidance_sheet_matches_entity(&conn, s, "python:function:pkg.mod.f", project_root).unwrap() + }; + assert!(m(&kind_sheet)); + assert!(m(&tag_sheet)); + assert!(m(&entity_sheet)); + assert!(m(&path_sheet)); + assert!(!m(&nomatch)); + assert!(!m(&wardline), "wardline_group not evaluable here"); +} + +fn sheet_with_rules( + conn: &Connection, + slug: &str, + rules: &Value, +) -> clarion_storage::GuidanceSheet { + let props = json!({ + "content": "x", + "scope_level": "module", + "match_rules": rules, + "provenance": "manual", + "authored_at": "2026-06-01T00:00:00.000Z", + }); + write_sheet(conn, slug, &props); + get_guidance_sheet(conn, &format!("core:guidance:{slug}")) + .unwrap() + .unwrap() +} + +fn seed_cache_row(conn: &Connection, entity_id: &str) { + conn.execute( + "INSERT INTO summary_cache \ + (entity_id, content_hash, prompt_template_id, model_tier, guidance_fingerprint, \ + summary_json, cost_usd, tokens_input, tokens_output, created_at, last_accessed_at, \ + caller_count, fan_out) \ + VALUES (?1, 'h', 'tmpl', 'tier', 'fp', '{}', 0.0, 0, 0, \ + '2026-01-01T00:00:00Z', '2026-01-01T00:00:00Z', 0, 0)", + params![entity_id], + ) + .unwrap(); +} + +fn cache_row_count(conn: &Connection, entity_id: &str) -> i64 { + conn.query_row( + "SELECT COUNT(*) FROM summary_cache WHERE entity_id = ?1", + params![entity_id], + |row| row.get(0), + ) + .unwrap() +} + +#[test] +fn invalidate_summaries_drops_matched_and_keeps_unmatched() { + use clarion_storage::invalidate_summaries_for_sheet; + + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + let project_root = tempdir.path(); + + // A `function` entity (the sheet's `kind:function` rule will match) and a + // `class` entity (it will not). Both have a cached summary. 
+ conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'python', 'function', 'pkg.mod.f', 'f', '{}', \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + params!["python:function:pkg.mod.f"], + ) + .unwrap(); + conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'python', 'class', 'pkg.mod.C', 'C', '{}', \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + params!["python:class:pkg.mod.C"], + ) + .unwrap(); + seed_cache_row(&conn, "python:function:pkg.mod.f"); + seed_cache_row(&conn, "python:class:pkg.mod.C"); + + let sheet = sheet_with_rules(&conn, "k", &json!([{"type":"kind","value":"function"}])); + let removed = invalidate_summaries_for_sheet(&conn, &sheet, project_root).unwrap(); + + assert_eq!(removed, 1, "exactly one matched entity's cache invalidated"); + assert_eq!( + cache_row_count(&conn, "python:function:pkg.mod.f"), + 0, + "matched entity's cache row gone" + ); + assert_eq!( + cache_row_count(&conn, "python:class:pkg.mod.C"), + 1, + "non-matching entity's cache row survives" + ); +} + +#[test] +fn upsert_rejects_non_guidance_id_and_leaves_code_entity_intact() { + // FINDING 1: a sheet id that is NOT `core:guidance:` (e.g. a hand-edited / + // malicious import naming a code entity) must be rejected by + // `upsert_guidance_sheet`, and must NOT overwrite the existing code entity. + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + + // A pre-existing code entity with distinctive name/properties. 
+ conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'python', 'function', 'pkg.mod.foo', 'foo', '{\"k\":\"v\"}', \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + params!["python:function:foo"], + ) + .unwrap(); + + let before: (String, String, String, String) = conn + .query_row( + "SELECT name, kind, plugin_id, properties FROM entities WHERE id = ?1", + params!["python:function:foo"], + |r| Ok((r.get(0)?, r.get(1)?, r.get(2)?, r.get(3)?)), + ) + .unwrap(); + + // Attempt to upsert a "guidance" sheet whose id collides with the code entity. + let props = base_props("module", "2026-06-01T00:00:00.000Z"); + let err = upsert_guidance_sheet( + &conn, + &GuidanceSheetInput { + id: "python:function:foo", + name: "evil", + short_name: "evil", + properties: &props, + }, + ); + assert!(err.is_err(), "non-guidance id must be rejected"); + + let after: (String, String, String, String) = conn + .query_row( + "SELECT name, kind, plugin_id, properties FROM entities WHERE id = ?1", + params!["python:function:foo"], + |r| Ok((r.get(0)?, r.get(1)?, r.get(2)?, r.get(3)?)), + ) + .unwrap(); + assert_eq!( + after, before, + "code entity must be byte-identical after a rejected upsert" + ); +} + +#[test] +fn upsert_accepts_valid_guidance_id() { + // FINDING 1: the canonical `core:guidance:` id still upserts fine. 
+ let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + write_sheet( + &conn, + "valid.sheet", + &base_props("module", "2026-06-01T00:00:00.000Z"), + ); + assert!( + get_guidance_sheet(&conn, "core:guidance:valid.sheet") + .unwrap() + .is_some() + ); +} + +#[test] +fn invalidate_summaries_for_no_rule_sheet_is_noop() { + use clarion_storage::invalidate_summaries_for_sheet; + + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + + conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'python', 'function', 'pkg.mod.f', 'f', '{}', \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + params!["python:function:pkg.mod.f"], + ) + .unwrap(); + seed_cache_row(&conn, "python:function:pkg.mod.f"); + + let sheet = sheet_with_rules(&conn, "empty", &json!([])); + let removed = invalidate_summaries_for_sheet(&conn, &sheet, tempdir.path()).unwrap(); + assert_eq!(removed, 0, "a no-rule sheet invalidates nothing"); + assert_eq!(cache_row_count(&conn, "python:function:pkg.mod.f"), 1); +} + +#[test] +fn invalidate_summaries_includes_guides_edge_targets() { + // FINDING 3: a sheet that applies SOLELY via a `guides` edge (NO match_rules) + // must still invalidate the guided entity's cached summary. The `guidance_for` + // read path composes match_rules OR guides edges, so invalidation must too. + use clarion_storage::invalidate_summaries_for_sheet; + + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + + // A code entity that will be the `guides`-edge target (it must exist first, + // for the edge's FK). 
+ conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'python', 'function', 'pkg.mod.g', 'g', '{}', \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + params!["python:function:pkg.mod.g"], + ) + .unwrap(); + seed_cache_row(&conn, "python:function:pkg.mod.g"); + + // A sheet with NO match_rules — so any invalidation can ONLY come from the + // guides edge, not a rule. + let sheet = sheet_with_rules(&conn, "guides-only", &json!([])); + conn.execute( + "INSERT INTO edges (kind, from_id, to_id, confidence) VALUES \ + ('guides', ?1, ?2, 'resolved')", + params!["core:guidance:guides-only", "python:function:pkg.mod.g"], + ) + .unwrap(); + + let removed = invalidate_summaries_for_sheet(&conn, &sheet, tempdir.path()).unwrap(); + assert_eq!( + removed, 1, + "the guides-edge target's cache row is invalidated" + ); + assert_eq!( + cache_row_count(&conn, "python:function:pkg.mod.g"), + 0, + "guided entity's summary row must be gone" + ); +} + +#[test] +fn invalidate_summaries_dedups_rule_and_guides_match() { + // FINDING 3: an entity matched by BOTH a match_rule AND a guides edge is + // invalidated exactly once (count is 1, not 2). + use clarion_storage::invalidate_summaries_for_sheet; + + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + + conn.execute( + "INSERT INTO entities (id, plugin_id, kind, name, short_name, properties, \ + created_at, updated_at) VALUES \ + (?1, 'python', 'function', 'pkg.mod.h', 'h', '{}', \ + strftime('%Y-%m-%dT%H:%M:%fZ','now'), strftime('%Y-%m-%dT%H:%M:%fZ','now'))", + params!["python:function:pkg.mod.h"], + ) + .unwrap(); + seed_cache_row(&conn, "python:function:pkg.mod.h"); + + // A sheet whose `kind:function` rule matches the entity AND a guides edge to + // the same entity → it must count once. 
+ let sheet = sheet_with_rules(&conn, "both", &json!([{"type":"kind","value":"function"}])); + conn.execute( + "INSERT INTO edges (kind, from_id, to_id, confidence) VALUES \ + ('guides', ?1, ?2, 'resolved')", + params!["core:guidance:both", "python:function:pkg.mod.h"], + ) + .unwrap(); + + let removed = invalidate_summaries_for_sheet(&conn, &sheet, tempdir.path()).unwrap(); + assert_eq!( + removed, 1, + "matched by both rule and guides edge → invalidated once" + ); +} diff --git a/crates/clarion-storage/tests/query_helpers.rs b/crates/clarion-storage/tests/query_helpers.rs index 791b8cab..a076cc38 100644 --- a/crates/clarion-storage/tests/query_helpers.rs +++ b/crates/clarion-storage/tests/query_helpers.rs @@ -9,7 +9,8 @@ use clarion_storage::{ entity_at_line, entity_briefing_block_reason, entity_by_id, find_entities, findings_for_emit, module_dependency_edges, module_reference_rollup, normalize_source_path, pragma, reference_edges_for_entity, resolve_file, resolve_file_catalog_entry, schema, - subsystem_for_member, subsystem_members, subsystem_of_entity, + subsystem_for_member, subsystem_members, subsystem_of_entity, unresolved_call_sites_for_caller, + unresolved_callers_for_target, }; use rusqlite::{Connection, params}; @@ -25,6 +26,21 @@ fn insert_entity(conn: &Connection, id: &str, kind: &str) { insert_named_entity(conn, id, kind, id, id, None); } +fn insert_entity_with_hash(conn: &Connection, id: &str, kind: &str, content_hash: &str) { + conn.execute( + "INSERT INTO entities ( + id, plugin_id, kind, name, short_name, properties, content_hash, created_at, + updated_at + ) VALUES ( + ?1, 'python', ?2, ?1, ?1, '{}', ?3, + strftime('%Y-%m-%dT%H:%M:%fZ', 'now'), + strftime('%Y-%m-%dT%H:%M:%fZ', 'now') + )", + params![id, kind, content_hash], + ) + .expect("insert entity with hash"); +} + fn insert_named_entity( conn: &Connection, id: &str, @@ -105,6 +121,23 @@ fn insert_contains_edge(conn: &Connection, from_id: &str, to_id: &str) { .expect("insert contains edge"); 
} +fn insert_unresolved_call_site( + conn: &Connection, + caller_id: &str, + caller_content_hash: &str, + site_key: &str, + callee_expr: &str, +) { + conn.execute( + "INSERT INTO entity_unresolved_call_sites ( + caller_entity_id, caller_content_hash, site_key, site_ordinal, + source_byte_start, source_byte_end, callee_expr, created_at + ) VALUES (?1, ?2, ?3, 0, 10, 20, ?4, strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))", + params![caller_id, caller_content_hash, site_key, callee_expr], + ) + .expect("insert unresolved call site"); +} + fn insert_references_edge( conn: &Connection, from_id: &str, @@ -401,6 +434,79 @@ fn module_dependency_edges_expands_ambiguous_call_candidates() { ); } +#[test] +fn unresolved_call_sites_for_caller_filters_stale_content_hash_rows() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + insert_entity_with_hash( + &conn, + "python:function:pkg.caller", + "function", + "hash-current", + ); + insert_unresolved_call_site( + &conn, + "python:function:pkg.caller", + "hash-old", + "stale-site", + "dynamic_old", + ); + insert_unresolved_call_site( + &conn, + "python:function:pkg.caller", + "hash-current", + "current-site", + "dynamic_current", + ); + + let sites = unresolved_call_sites_for_caller(&conn, "python:function:pkg.caller", 10).unwrap(); + + assert_eq!(sites.len(), 1); + assert_eq!(sites[0].site_key, "current-site"); + assert_eq!(sites[0].caller_content_hash, "hash-current"); +} + +#[test] +fn unresolved_callers_for_target_filters_stale_caller_content_hash_rows() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + insert_entity_with_hash( + &conn, + "python:function:pkg.caller", + "function", + "hash-current", + ); + insert_entity_with_hash( + &conn, + "python:function:pkg.target", + "function", + "hash-target", + ); + insert_unresolved_call_site( + &conn, + "python:function:pkg.caller", + "hash-old", + "stale-site", + "target", + ); + insert_unresolved_call_site( + &conn, + 
"python:function:pkg.caller", + "hash-current", + "current-site", + "target", + ); + let target = entity_by_id(&conn, "python:function:pkg.target") + .unwrap() + .expect("target"); + + let sites = unresolved_callers_for_target(&conn, &target, 10).unwrap(); + + assert_eq!(sites.len(), 1); + assert_eq!(sites[0].site_key, "current-site"); + assert_eq!(sites[0].caller_content_hash, "hash-current"); +} + #[test] fn subsystem_members_returns_modules_ordered_by_name() { let tempdir = tempfile::tempdir().unwrap(); diff --git a/crates/clarion-storage/tests/schema_apply.rs b/crates/clarion-storage/tests/schema_apply.rs index a1c86d7f..41be4320 100644 --- a/crates/clarion-storage/tests/schema_apply.rs +++ b/crates/clarion-storage/tests/schema_apply.rs @@ -123,6 +123,26 @@ fn entity_tags_primary_key_includes_tag_owner_plugin() { ); } +#[test] +fn runs_table_records_owner_pid_and_heartbeat() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + let columns = table_columns(&conn, "runs"); + + for expected in ["owner_pid", "heartbeat_at"] { + assert!( + columns.iter().any(|column| column == expected), + "missing runs.{expected} in {columns:?}" + ); + } + + let indexes = index_names(&conn); + assert!( + indexes.iter().any(|idx| idx == "ix_runs_running_heartbeat"), + "missing running-heartbeat index in {indexes:?}" + ); +} + #[test] fn migration_0001_creates_entity_fts_virtual_table() { let tempdir = tempfile::tempdir().unwrap(); @@ -775,7 +795,7 @@ fn migrations_are_idempotent() { let tempdir = tempfile::tempdir().unwrap(); let mut conn = open_fresh(&tempdir); schema::apply_migrations(&mut conn).expect("second apply should be a no-op"); - assert_eq!(schema::applied_count(&conn).unwrap(), 7); + assert_eq!(schema::applied_count(&conn).unwrap(), 8); let tables_after = table_names(&conn); assert!(tables_after.contains(&"entities".to_owned())); } @@ -789,7 +809,7 @@ fn schema_migrations_records_each_applied_migration() { row.get(0) }) .unwrap(); - 
assert_eq!(count, 7); + assert_eq!(count, 8); let names: Vec<String> = { let mut stmt = conn .prepare("SELECT name FROM schema_migrations ORDER BY version") @@ -807,6 +827,7 @@ fn schema_migrations_records_each_applied_migration() { "0005_sei", "0006_wardline_taint_sei", "0007_run_analyzed_commit", + "0008_run_owner_heartbeat", ] ); } diff --git a/crates/clarion-storage/tests/writer_actor.rs b/crates/clarion-storage/tests/writer_actor.rs index a4ebc910..9fc99287 100644 --- a/crates/clarion-storage/tests/writer_actor.rs +++ b/crates/clarion-storage/tests/writer_actor.rs @@ -11,7 +11,7 @@ use clarion_storage::{ InferredCallEdgeRecord, InferredEdgeCacheEntry, InferredEdgeCacheKey, ReaderPool, SummaryCacheEntry, SummaryCacheKey, UnresolvedCallSiteRecord, Writer, commands::{EdgeConfidence, EdgeRecord, EntityRecord, FindingRecord, RunStatus, WriterCmd}, - pragma, schema, + mark_stale_running_runs_failed, pragma, schema, }; fn prepared_db(dir: &tempfile::TempDir) -> std::path::PathBuf { @@ -66,6 +66,7 @@ fn make_entity(id: &str) -> EntityRecord { source_line_start: None, source_line_end: None, properties_json: "{}".to_owned(), + tags: Vec::new(), content_hash: None, summary_json: None, wardline_json: None, @@ -88,6 +89,16 @@ fn make_module_entity(id: &str) -> EntityRecord { e } +fn make_file_entity(id: &str) -> EntityRecord { + let mut e = make_entity(id); + "core".clone_into(&mut e.plugin_id); + "file".clone_into(&mut e.kind); + "demo.py".clone_into(&mut e.name); + "demo.py".clone_into(&mut e.short_name); + e.content_hash = Some("hash-core:file:demo.py".to_owned()); + e +} + fn make_contains_edge(from_id: &str, to_id: &str) -> EdgeRecord { EdgeRecord { kind: "contains".to_owned(), @@ -95,7 +106,7 @@ fn make_contains_edge(from_id: &str, to_id: &str) -> EdgeRecord { to_id: to_id.to_owned(), confidence: EdgeConfidence::Resolved, properties_json: None, - source_file_id: Some(from_id.to_owned()), + source_file_id: None, source_byte_start: None, source_byte_end: None, } @@ -113,7 
+124,7 @@ fn make_structural_edge( to_id: to_id.to_owned(), confidence, properties_json: None, - source_file_id: Some(from_id.to_owned()), + source_file_id: None, source_byte_start: None, source_byte_end: None, } @@ -126,7 +137,7 @@ fn make_calls_edge(from_id: &str, to_id: &str, confidence: EdgeConfidence) -> Ed to_id: to_id.to_owned(), confidence, properties_json: None, - source_file_id: Some("python:module:demo".to_owned()), + source_file_id: None, source_byte_start: Some(10), source_byte_end: Some(18), } @@ -139,7 +150,7 @@ fn make_references_edge(from_id: &str, to_id: &str, confidence: EdgeConfidence) to_id: to_id.to_owned(), confidence, properties_json: None, - source_file_id: Some("python:module:demo".to_owned()), + source_file_id: None, source_byte_start: Some(20), source_byte_end: Some(25), } @@ -212,7 +223,7 @@ fn unresolved_site(callee_expr: &str, ordinal: i64) -> UnresolvedCallSiteRecord caller_content_hash: "hash-python:function:demo.caller".to_owned(), site_key: format!("site-{ordinal}"), site_ordinal: ordinal, - source_file_id: Some("python:module:demo".to_owned()), + source_file_id: None, source_byte_start: ordinal * 10, source_byte_end: ordinal * 10 + 4, callee_expr: callee_expr.to_owned(), @@ -240,7 +251,7 @@ fn inferred_record(to_id: &str, start: i64) -> InferredCallEdgeRecord { InferredCallEdgeRecord { from_id: "python:function:demo.caller".to_owned(), to_id: to_id.to_owned(), - source_file_id: Some("python:module:demo".to_owned()), + source_file_id: None, source_byte_start: start, source_byte_end: start + 8, properties_json: r#"{"inference_cache_key":"cache-a"}"#.to_owned(), @@ -615,6 +626,66 @@ async fn round_trip_insert_persists_entity() { assert_eq!(kind, "function"); } +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn insert_entity_replaces_entity_tags_for_same_plugin_entity() { + let dir = tempfile::tempdir().unwrap(); + let path = prepared_db(&dir); + let (writer, handle) = Writer::spawn(path.clone(), 50, 
256).unwrap(); + let tx = writer.sender(); + + begin_demo_run(&tx, "run-tags").await; + let mut first = make_entity("python:function:demo.hello"); + first.tags = vec!["entry-point".to_owned(), "test".to_owned()]; + send::<()>(&tx, |ack| WriterCmd::InsertEntity { + entity: Box::new(first), + ack, + }) + .await + .unwrap(); + + let mut second = make_entity("python:function:demo.hello"); + second.tags = vec!["http-route".to_owned()]; + send::<()>(&tx, |ack| WriterCmd::InsertEntity { + entity: Box::new(second), + ack, + }) + .await + .unwrap(); + + send::<()>(&tx, |ack| WriterCmd::CommitRun { + run_id: "run-tags".into(), + status: RunStatus::Completed, + completed_at: now_iso(), + stats_json: "{}".into(), + ack, + }) + .await + .unwrap(); + + drop(tx); + drop(writer); + handle.await.unwrap().unwrap(); + + let conn = Connection::open(path).unwrap(); + let tags: Vec<String> = { + let mut stmt = conn + .prepare( + "SELECT tag FROM entity_tags \ + WHERE entity_id = ?1 AND plugin_id = ?2 \ + ORDER BY tag", + ) + .unwrap(); + stmt.query_map( + rusqlite::params!["python:function:demo.hello", "python"], + |row| row.get(0), + ) + .unwrap() + .map(Result::unwrap) + .collect() + }; + assert_eq!(tags, vec!["http-route".to_owned()]); +} + #[tokio::test(flavor = "multi_thread", worker_threads = 2)] async fn insert_entity_is_idempotent_across_runs() { // Regression: `clarion analyze` re-runs against an unchanged corpus @@ -870,6 +941,155 @@ async fn resume_run_errors_when_run_id_unknown() { handle.await.unwrap().unwrap(); } +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn run_lifecycle_records_owner_pid_and_heartbeat_until_terminal() { + let dir = tempfile::tempdir().unwrap(); + let path = prepared_db(&dir); + let (writer, handle) = Writer::spawn(path.clone(), 50, 256).unwrap(); + let tx = writer.sender(); + let expected_pid = i64::from(std::process::id()); + + send::<()>(&tx, |ack| WriterCmd::BeginRun { + run_id: "run-heartbeat".into(), + config_json: "{}".into(), + 
started_at: now_iso(), + head_commit: None, + ack, + }) + .await + .unwrap(); + + { + let conn = Connection::open(&path).unwrap(); + let (status, owner_pid, heartbeat_at): (String, Option<i64>, Option<String>) = conn + .query_row( + "SELECT status, owner_pid, heartbeat_at FROM runs WHERE id = 'run-heartbeat'", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)), + ) + .unwrap(); + assert_eq!(status, "running"); + assert_eq!(owner_pid, Some(expected_pid)); + assert_eq!(heartbeat_at.as_deref(), Some(now_iso().as_str())); + } + + send::<()>(&tx, |ack| WriterCmd::CommitRun { + run_id: "run-heartbeat".into(), + status: RunStatus::Completed, + completed_at: now_iso(), + stats_json: "{}".into(), + ack, + }) + .await + .unwrap(); + + { + let conn = Connection::open(&path).unwrap(); + let owner_pid: Option<i64> = conn + .query_row( + "SELECT owner_pid FROM runs WHERE id = 'run-heartbeat'", + [], + |row| row.get(0), + ) + .unwrap(); + assert_eq!(owner_pid, None, "terminal runs must release pid ownership"); + } + + send::<()>(&tx, |ack| WriterCmd::ResumeRun { + run_id: "run-heartbeat".into(), + ack, + }) + .await + .unwrap(); + + { + let conn = Connection::open(&path).unwrap(); + let (status, completed_at, owner_pid, heartbeat_at): ( + String, + Option<String>, + Option<i64>, + Option<String>, + ) = conn + .query_row( + "SELECT status, completed_at, owner_pid, heartbeat_at \ + FROM runs WHERE id = 'run-heartbeat'", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)), + ) + .unwrap(); + assert_eq!(status, "running"); + assert_eq!(completed_at, None); + assert_eq!(owner_pid, Some(expected_pid)); + assert!( + heartbeat_at + .as_deref() + .is_some_and(|value| value.ends_with('Z')), + "resume should refresh heartbeat_at: {heartbeat_at:?}" + ); + } + + send::<()>(&tx, |ack| WriterCmd::CommitRun { + run_id: "run-heartbeat".into(), + status: RunStatus::Completed, + completed_at: now_iso(), + stats_json: "{}".into(), + ack, + }) + .await + .unwrap(); + + drop(tx); + drop(writer); + 
handle.await.unwrap().unwrap(); +} + +#[test] +fn stale_running_repair_fails_pre_migration_rows_with_null_heartbeat() { + let dir = tempfile::tempdir().unwrap(); + let path = prepared_db(&dir); + let conn = Connection::open(path).unwrap(); + conn.execute( + "INSERT INTO runs ( \ + id, started_at, completed_at, config, stats, status, owner_pid, heartbeat_at \ + ) VALUES ( \ + 'run-null-heartbeat', '2026-02-04T00:00:00.000Z', NULL, '{}', '{}', \ + 'running', 999999, NULL \ + )", + [], + ) + .expect("insert upgraded pre-heartbeat running row"); + + let changed = mark_stale_running_runs_failed(&conn).expect("repair stale runs"); + assert_eq!(changed, 1); + + let (status, owner_pid, completed_at, stats_json): ( + String, + Option<i64>, + Option<String>, + String, + ) = conn + .query_row( + "SELECT status, owner_pid, completed_at, stats \ + FROM runs WHERE id = 'run-null-heartbeat'", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)), + ) + .expect("read repaired run"); + assert_eq!(status, "failed"); + assert_eq!(owner_pid, None); + assert!( + completed_at + .as_deref() + .is_some_and(|value| value.ends_with('Z')), + "repair should stamp completed_at: {completed_at:?}" + ); + let repair_stats: serde_json::Value = serde_json::from_str(&stats_json).expect("stats json"); + assert_eq!( + repair_stats["failure_reason"], + "analyze run abandoned: stale heartbeat" + ); +} + #[tokio::test(flavor = "multi_thread", worker_threads = 2)] async fn non_core_plugin_cannot_insert_reserved_entity_kind() { let dir = tempfile::tempdir().unwrap(); @@ -1209,25 +1429,42 @@ async fn entity_source_file_id_rejects_non_source_anchor_entity() { } #[tokio::test(flavor = "multi_thread", worker_threads = 2)] -async fn module_entity_may_reference_itself_as_source_file_id() { +async fn entity_source_file_id_accepts_core_file_anchor() { let dir = tempfile::tempdir().unwrap(); let path = prepared_db(&dir); let (writer, handle) = Writer::spawn(path.clone(), 50, 256).unwrap(); let tx = 
writer.sender(); - begin_demo_run(&tx, "run-source-self").await; + begin_demo_run(&tx, "run-source-file-anchor").await; + send::<()>(&tx, |ack| WriterCmd::InsertEntity { + entity: Box::new(make_file_entity("core:file:demo.py")), + ack, + }) + .await + .unwrap(); let mut module = make_module_entity("python:module:demo"); - module.source_file_id = Some("python:module:demo".to_owned()); + module.parent_id = Some("core:file:demo.py".to_owned()); + module.source_file_id = Some("core:file:demo.py".to_owned()); send::<()>(&tx, |ack| WriterCmd::InsertEntity { entity: Box::new(module), ack, }) .await - .expect("module source anchor may reference itself"); + .expect("module source anchor may reference core file entity"); + + send::<()>(&tx, |ack| WriterCmd::InsertEdge { + edge: Box::new(make_contains_edge( + "core:file:demo.py", + "python:module:demo", + )), + ack, + }) + .await + .unwrap(); send::<()>(&tx, |ack| WriterCmd::CommitRun { - run_id: "run-source-self".into(), + run_id: "run-source-file-anchor".into(), status: RunStatus::Completed, completed_at: now_iso(), stats_json: "{}".into(), @@ -1241,6 +1478,40 @@ async fn module_entity_may_reference_itself_as_source_file_id() { handle.await.unwrap().unwrap(); } +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn module_entity_rejected_as_source_file_id() { + let dir = tempfile::tempdir().unwrap(); + let path = prepared_db(&dir); + let (writer, handle) = Writer::spawn(path.clone(), 50, 256).unwrap(); + let tx = writer.sender(); + + begin_demo_run(&tx, "run-source-module-reject").await; + send::<()>(&tx, |ack| WriterCmd::InsertEntity { + entity: Box::new(make_module_entity("python:module:demo")), + ack, + }) + .await + .unwrap(); + + let mut module = make_module_entity("python:module:demo"); + module.source_file_id = Some("python:module:demo".to_owned()); + + let result = send::<()>(&tx, |ack| WriterCmd::InsertEntity { + entity: Box::new(module), + ack, + }) + .await + .expect_err("module entity must not be 
accepted as a source_file_id anchor"); + assert!( + format!("{result:?}").contains("CLA-INFRA-SOURCE-FILE-KIND-CONTRACT"), + "expected CLA-INFRA-SOURCE-FILE-KIND-CONTRACT in error; got {result:?}" + ); + + drop(tx); + drop(writer); + handle.await.unwrap().unwrap(); +} + #[test] fn python_plugin_edge_kinds_are_accepted_by_writer_contract() { let manifest = @@ -1869,7 +2140,7 @@ async fn calls_edge_without_byte_offsets_rejected_by_per_kind_contract() { to_id: "python:function:demo.callee".to_owned(), confidence: EdgeConfidence::Resolved, properties_json: None, - source_file_id: Some("python:module:demo".to_owned()), + source_file_id: None, source_byte_start: None, source_byte_end: None, }; @@ -1929,7 +2200,7 @@ async fn unknown_edge_kind_rejected_strictly() { to_id: "python:function:demo.f".to_owned(), confidence: EdgeConfidence::Resolved, properties_json: None, - source_file_id: Some("python:module:demo".to_owned()), + source_file_id: None, source_byte_start: None, source_byte_end: None, }; diff --git a/docs/README.md b/docs/README.md index d9c50eec..d39fd6b2 100644 --- a/docs/README.md +++ b/docs/README.md @@ -10,12 +10,12 @@ This `docs/` tree is organized by reader intent: - New to the suite: [suite/briefing.md](./suite/briefing.md) - Evaluating the Loom doctrine: [suite/loom.md](./suite/loom.md) -- Starting Clarion v1.0: [clarion/1.0/README.md](./clarion/1.0/README.md) +- Starting Clarion: [clarion/1.0/README.md](./clarion/1.0/README.md) - Configuring OpenRouter: [operator/openrouter.md](./operator/openrouter.md) ## Canonical vs supporting docs - Canonical suite docs live in [suite/](./suite/README.md). -- Canonical Clarion v0.1 design docs live in [clarion/v0.1/](./clarion/1.0/README.md). +- Canonical Clarion design docs live in [clarion/1.0/](./clarion/1.0/README.md). - Architecture decisions live in [clarion/adr/](./clarion/adr/README.md). 
- Supporting reviews, scope memos, sprint plans, and agent handoffs are archived under [implementation/](./implementation/README.md) — non-normative. diff --git a/docs/archive/README.md b/docs/archive/README.md new file mode 100644 index 00000000..1a34350f --- /dev/null +++ b/docs/archive/README.md @@ -0,0 +1,17 @@ +# Documentation Archive + +This directory holds retained non-normative material that is useful for audit +history but should not appear in active release-facing document paths. + +| Path | Contents | +|---|---| +| [working-notes/](./working-notes/) | Archived agent working notes and validation fragments from architecture-analysis runs. | + +## Working Notes + +The following folders were moved out of tracked `temp/` directories during the +2026-06-04 release-prep cleanup: + +- `docs/archive/working-notes/arch-analysis-2026-05-22-1924/` +- `docs/archive/working-notes/arch-analysis-2026-06-02-1522/` +- `docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/` diff --git a/docs/arch-analysis-2026-05-22-1924/temp/answer-python-engineer.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-python-engineer.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/answer-python-engineer.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-python-engineer.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/answer-quality-engineer.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-quality-engineer.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/answer-quality-engineer.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-quality-engineer.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/answer-security-engineer.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-security-engineer.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/answer-security-engineer.md rename to 
docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-security-engineer.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/answer-solution-architect.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-solution-architect.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/answer-solution-architect.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-solution-architect.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/answer-systems-thinker.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-systems-thinker.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/answer-systems-thinker.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-systems-thinker.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-cli.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-cli.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-cli.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-cli.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-core.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-core.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-core.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-core.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-mcp.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-mcp.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-mcp.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-mcp.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-plugin-fixture.md 
b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-plugin-fixture.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-plugin-fixture.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-plugin-fixture.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-scanner.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-scanner.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-scanner.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-scanner.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-storage.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-storage.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/catalog-clarion-storage.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-clarion-storage.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/catalog-python-plugin.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-python-plugin.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/catalog-python-plugin.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/catalog-python-plugin.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/task-catalog-template.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/task-catalog-template.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/task-catalog-template.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/task-catalog-template.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/task-discovery.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/task-discovery.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/task-discovery.md rename to 
docs/archive/working-notes/arch-analysis-2026-05-22-1924/task-discovery.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/validation-catalog.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/validation-catalog.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/validation-catalog.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/validation-catalog.md diff --git a/docs/arch-analysis-2026-05-22-1924/temp/validation-final-report.md b/docs/archive/working-notes/arch-analysis-2026-05-22-1924/validation-final-report.md similarity index 100% rename from docs/arch-analysis-2026-05-22-1924/temp/validation-final-report.md rename to docs/archive/working-notes/arch-analysis-2026-05-22-1924/validation-final-report.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/catalog-cli-http.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-cli-http.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/catalog-cli-http.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-cli-http.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/catalog-core.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-core.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/catalog-core.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-core.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/catalog-mcp.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-mcp.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/catalog-mcp.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-mcp.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/catalog-pipeline.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-pipeline.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/catalog-pipeline.md rename to 
docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-pipeline.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/catalog-policy-llm.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-policy-llm.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/catalog-policy-llm.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-policy-llm.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/catalog-python.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-python.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/catalog-python.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-python.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/catalog-scanner-fixture.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-scanner-fixture.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/catalog-scanner-fixture.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-scanner-fixture.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/catalog-storage.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-storage.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/catalog-storage.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/catalog-storage.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/validation-catalog.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/validation-catalog.md similarity index 100% rename from docs/arch-analysis-2026-06-02-1522/temp/validation-catalog.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/validation-catalog.md diff --git a/docs/arch-analysis-2026-06-02-1522/temp/validation-final-report.md b/docs/archive/working-notes/arch-analysis-2026-06-02-1522/validation-final-report.md similarity index 100% rename from 
docs/arch-analysis-2026-06-02-1522/temp/validation-final-report.md rename to docs/archive/working-notes/arch-analysis-2026-06-02-1522/validation-final-report.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/task-clarion-cli-scanner.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-clarion-cli-scanner.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/task-clarion-cli-scanner.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-clarion-cli-scanner.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/task-clarion-core-fixture.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-clarion-core-fixture.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/task-clarion-core-fixture.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-clarion-core-fixture.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/task-clarion-mcp.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-clarion-mcp.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/task-clarion-mcp.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-clarion-mcp.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/task-clarion-storage.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-clarion-storage.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/task-clarion-storage.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-clarion-storage.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/task-python-plugin.md 
b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-python-plugin.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/task-python-plugin.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-python-plugin.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/task-release-federation-docs.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-release-federation-docs.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/task-release-federation-docs.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/task-release-federation-docs.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/validation-02-subsystem-catalog.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/validation-02-subsystem-catalog.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/validation-02-subsystem-catalog.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/validation-02-subsystem-catalog.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/validation-03-diagrams.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/validation-03-diagrams.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/validation-03-diagrams.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/validation-03-diagrams.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/validation-04-final-report.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/validation-04-final-report.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/validation-04-final-report.md rename to 
docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/validation-04-final-report.md diff --git a/docs/implementation/arch-analysis-2026-05-20-2124/temp/validation-option-g.md b/docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/validation-option-g.md similarity index 100% rename from docs/implementation/arch-analysis-2026-05-20-2124/temp/validation-option-g.md rename to docs/archive/working-notes/implementation-arch-analysis-2026-05-20-2124/validation-option-g.md diff --git a/docs/clarion/1.0/detailed-design.md b/docs/clarion/1.0/detailed-design.md index 2c7db6b9..8b81f6c2 100644 --- a/docs/clarion/1.0/detailed-design.md +++ b/docs/clarion/1.0/detailed-design.md @@ -66,36 +66,13 @@ language_id: python kinds: function: { leaf: true, searchable: true, has_callers: true } class: { leaf: true, searchable: true, has_members: true } - protocol: { leaf: true, searchable: true, subtype_of: class } - enum: { leaf: true, searchable: true, subtype_of: class } - typed_dict: { leaf: true, searchable: true, subtype_of: class } - global: { leaf: true, searchable: true } - decorator: { leaf: true, searchable: true, subtype_of: function } - type_alias: { leaf: true, searchable: true, subtype_of: global } - module: { leaf: false, searchable: true, contains: [function, class, global, ...] 
} - package: { leaf: false, searchable: true, contains: [module, package] } + module: { leaf: false, searchable: true, contains: [function, class] } edges: contains: { cardinality: one_to_many } calls: { cardinality: many_to_many, weighted: true } - imports: { cardinality: many_to_many } - inherits_from: { cardinality: many_to_many } - implements: { cardinality: many_to_many } - decorated_by: { cardinality: many_to_many } references: { cardinality: many_to_many } - -tags: - - entry_point - - http_route - - cli_command - - data_model - - config_loader - - test_function - - test_class - - fixture - - deprecated - - has_wardline_annotation - - is_dependency_hub + imports: { cardinality: many_to_many } capabilities: parse_files: true @@ -107,17 +84,8 @@ capabilities: # approximate (name-match) for method calls; # no dynamic dispatch (see "Call graph precision" # under Python plugin specifics below) - inherits_from: { supported: true, confidence_basis: ast_match } - decorated_by: { supported: true, confidence_basis: ast_match } - annotation_detection: - # All 17 Wardline groups are decorator-based (verified against - # wardline.core.registry.REGISTRY: 42 canonical names + 1 legacy alias). - # Supplementary manifest declarations exist only for Groups 1 and 17 - # via overlay boundaries[]; those are ingested in Phase 7 and augment - # (don't replace) decorator detection. 
- wardline_groups: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] - wardline_overlay_groups: [1, 17] # boundaries[] in overlay.schema.json — additive - wardline_registry_import: "wardline.core.registry:REGISTRY" # direct Python import; see §9 + references: { supported: true, confidence_basis: name_match } + annotation_detection: false structural_findings: true factual_findings: true @@ -140,34 +108,28 @@ prompt_templates: ### Error handling posture -- Plugin crash during batch → core records which files completed (by completed `file_analyzed` messages); unfinished files logged as `CLA-INFRA-PLUGIN-CRASH`; partial run manifest written. Crash-loop circuit breaker (>3 crashes in 60s) halts the plugin permanently for the run. +- Plugin crash during batch → core records files whose `analyze_file` responses have already crossed the bounded internal file-batch channel into the writer actor; unfinished files are represented by `CLA-INFRA-PLUGIN-CRASH` / plugin-specific findings and the run row ends `failed` with partial committed rows. Crash-loop circuit breaker (>3 crashes in 60s) halts the plugin permanently for the run. - Plugin timeout on one file (default 30s, configurable per-plugin) → emit `CLA-INFRA-PLUGIN-TIMEOUT`; skip; continue. - Plugin malformed JSON or framing error → core logs, skips the message; continues. Never crashes on plugin misbehaviour. - No silent fallbacks. Every skip/error produces a finding. - **stdout hygiene**: the plugin protocol reserves stdout for framed JSON-RPC. Plugin authors must redirect logging to stderr; a stray `print()` breaks the stream. This is documented as a plugin-author requirement and enforced by the reference Python client. -### Python plugin specifics — decorator-detection table +### Python plugin specifics — decorator policy -| Case | Pattern | Policy | -|---|---|---| -| Direct named | `@validates_shape` | Exact name match against `wardline.core.registry.REGISTRY` (the 42 canonical names). 
Imported directly; see §9 prerequisite on `REGISTRY_VERSION` pinning. | -| Factory | `@validates_shape("User")` | Match on the callee name; arguments captured into `decorated_by.properties.args` for Wardline to consume | -| Stacked | `@a\n@b\ndef f():` | Emits edges in source order; ordering preserved in `decorated_by.properties.stack_index` (Wardline semantics depend on this) | -| Function decorator | `@register\ndef f():` | Fully supported — matches Wardline's own scanner coverage | -| Class decorator | `@register\nclass C:` | Recorded as decorator edges. **Clarion-side augmentation**: Wardline's own scanner does not visit class-level decoration (`scanner/discovery.py:_walk_functions` only walks FunctionDef/AsyncFunctionDef). Clarion findings derived from class-decoration carry `confidence_basis: "clarion_augmentation"` to distinguish them from Wardline-authoritative claims. | -| Aliased | `validates = validates_shape` | Detected when `validates_shape` is imported and aliased within a module; edge annotated `via_alias: true` | -| Dotted single-level | `@wardline.validates_shape` | Edge records `(module, name)` pair; matched against `REGISTRY` | -| Dotted call | `@app.route("/health")` | Edge records full call chain (`app.route`); tag lookup resolves `app.route` via Wardline's annotation descriptor (see Appendix A — deferred to v0.2) | -| Legacy alias | `@tier_transition` | Resolved via Wardline's `LEGACY_DECORATOR_ALIASES` (`core/registry.py:12-14`); canonical name (`trust_boundary`) recorded in `decorated_by.properties.canonical_name`; `via_legacy_alias: true` | - -Cases the plugin does *not* claim to handle (and emits `CLA-PY-ANNOTATION-AMBIGUOUS` for): dynamic decorator selection (`if cond: deco = a else b`), runtime-computed attribute decoration, decorators applied via `__init_subclass__`, arbitrary dotted chains (`a.b.c`), star imports (refused by Wardline itself), metaclass-based decoration, lambdas/subscripts as decorators. 
These are shared blind spots with Wardline's scanner. +Decorator semantics are deferred in v1.0. The Python extractor uses decorator +line ranges to make definition spans and source navigation cover the decorated +declaration, but it does not declare `decorated_by`, Wardline tags, Wardline +annotation metadata, decorator arguments, or alias-resolved decorator facts. +Future decorator extraction must first extend the plugin manifest and then add +fixture coverage for direct, factory, stacked, class, dotted, aliased, and +legacy Wardline decorator forms. ### Python plugin specifics — import resolution -- `sys.path` discovery: via the `python_executable` declared in `plugins.toml` (default: the pipx venv) invoked as `python -m site --user-site` plus project-local `PYTHONPATH`. Virtualenvs in `/.venv/` or `/venv/` detected automatically; user can override via `clarion.yaml:analysis.python.sys_path`. -- Unresolvable imports: emit `CLA-PY-UNRESOLVED-IMPORT` finding (kind: fact, severity: INFO); create a **stub** entity with id `python:unresolved:` and an `imports` edge to it. Stubs are reconciled against real entities if the import becomes resolvable in a later run; never promoted silently. -- Re-exports: definition site wins. `__init__.py` re-exports produce an `alias_of` edge from the re-export entity to the defining entity (not a new top-level entity). Entity IDs reference the definition site; consult tools resolve aliases transparently. -- Conditional imports: `TYPE_CHECKING` blocks are extracted as `imports` edges with `type_only: true` property — the edge exists so `goto`/search finds them, but graph algorithms (circular-import detection, coupling hotspots) filter them out. `try/except ImportError` blocks: first branch wins (the one that represents the "normal" path); fallback branches emit `CLA-FACT-CONDITIONAL-IMPORT` findings. `if sys.version_info`: all branches union. +- `imports` edges are emitted from `import` and `from ... import` statements by the AST walk. 
+- Relative imports are normalized against the current module; `__init__.py` collapses to the package module name. +- `TYPE_CHECKING` blocks are extracted as `imports` edges with `type_only: true`; function-local imports carry `scope: "function"`. Graph algorithms that need runtime imports filter those properties. +- The v1.0 plugin does not emit `alias_of` edges for `__init__.py` re-exports, does not create `python:unresolved:` stubs, and does not emit conditional-import findings for `try/except ImportError` or `sys.version_info` branches. ### Python plugin specifics — call graph precision @@ -231,11 +193,11 @@ struct Entity { - **Files**: `core:file:{qualified_name}` where `qualified_name` is the project-relative POSIX canonical path. File IDs may not contain `@`; content hashes are carried as drift metadata, not embedded in the ID. - **Subsystems**: `core:subsystem:{cluster_hash}` (from sorted member module IDs). - **Guidance sheets**: `core:guidance:{content_hash_short}`. -- **Unresolvable Python imports** (stub): `python:unresolved:{module.path}` — reconciled to real entities when resolution becomes possible. +- **Python import targets**: `imports` edges point at syntactic target module IDs such as `python:module:pkg.mod`. The v1.0 plugin does not create a separate `python:unresolved:{module.path}` stub namespace. ### Canonical-name policy (Python) -- **Definition site wins**. `TokenManager` defined in `auth/tokens.py` and re-exported from `auth/__init__.py` has ID `python:class:auth.tokens::TokenManager`. The re-export entity is an `alias_of` edge, not a new top-level entity. Consult tools resolve aliases transparently. +- **Definition site wins for emitted definitions**. `TokenManager` defined in `auth/tokens.py` has ID `python:class:auth.tokens.TokenManager`. The v1.0 plugin collapses `__init__.py` to the package module name, but it does not emit `alias_of` edges for package re-exports. - **`src.` prefix stripped**. 
Projects using `src/` layout get `src.auth.tokens` → `auth.tokens` canonicalisation; policy configurable via `clarion.yaml:analysis.python.canonical_root` (default: auto-detect from `pyproject.toml` `[tool.setuptools.packages.find]` or `[project.optional-dependencies]`). - **Test and script modules** keep their on-disk module path (no canonicalisation) because they lack a deterministic install path. @@ -914,7 +876,7 @@ CREATE INDEX ix_sei_lineage_sei ON sei_lineage(sei); - WAL mode: `PRAGMA journal_mode = WAL`, `synchronous = NORMAL`, `busy_timeout = 5000ms`, `wal_autocheckpoint = 1000` (default). - `clarion analyze` and `clarion serve` each instantiate exactly one **writer actor** task (a `tokio::task` owning a dedicated `rusqlite::Connection`). All mutations route through a bounded `mpsc::Sender` with backpressure. There is no in-process write contention; there are no cross-process writers because `clarion analyze` and `clarion serve` don't run the same DB concurrently in v0.1 (see operational posture below). -- **Transaction scope**: `clarion analyze` commits on a rolling boundary of **N files per transaction** (default `N=50`, configurable via `clarion.yaml:storage.tx_batch_size`). This keeps the WAL bounded, lets checkpointing run between batches, and makes `--resume` checkpoints meaningful. A full-batch single transaction is explicitly not used. +- **Transaction scope**: `clarion analyze` commits on a rolling boundary of **N writes per transaction** (default `N=50`). This keeps the WAL bounded and lets SQLite checkpointing run between batches. Per ADR-041, v1.x `--resume` is idempotent same-run re-emit, not durable phase/file checkpoint recovery. A full-batch single transaction is explicitly not used. 
- **Consult-mode writes** (summary cache, session state) during `clarion serve` are dispatched on the same writer actor; they interleave with analyze-time writes if a user starts `clarion analyze` against a running `clarion serve` (not recommended but survivable). Writes are applied in arrival order; no starvation because consult writes are tiny and sparse. - **Readers** (plugin processes, MCP tool calls, HTTP API handlers, the markdown renderer) open read-only `rusqlite` connections from a `deadpool-sqlite` pool (configurable max: default 16). WAL lets them read against the committed snapshot without blocking writers. - **Checkpointing**: truncate-mode checkpoint issued after each 10 analyze-transactions or after `clarion analyze` completes, whichever comes first. @@ -945,7 +907,7 @@ CREATE INDEX ix_sei_lineage_sei ON sei_lineage(sei); config.yaml # snapshot of clarion.yaml at run time log.jsonl # per-run log stats.json # run statistics - partial.json # present if run ended partial + partial.json # not part of the v1.x resume contract (ADR-041) ~/.config/clarion/ # user-level providers.toml # API keys, model tier mappings @@ -1149,6 +1111,8 @@ These rules combine signals Clarion uniquely holds — clusters from Phase 3, Wa | `CLA-FACT-TIER-SUBSYSTEM-MIXING` | WARN | heuristic | A subsystem has members declared across disagreeing tiers (e.g., 11 members `INTEGRAL`, 3 `GUARDED`). Either a misclassification Wardline can't see or a latent tier boundary worth naming. Emitted against the subsystem entity with `related_entities` listing the outliers. | | `CLA-FACT-ENTITY-DELETED` | INFO | deterministic | Entity present in the previous run's catalog is absent in this run. Compared against prior run's entity set at Phase-7 boundary. Emitted per deleted entity. Surfaces silently orphaned Filigree issues, silently-no-op guidance sheets, and persistent-until-TTL cache rows. 
| | `CLA-FACT-SUBSYSTEM-TIER-UNANIMOUS` | INFO (fact) | deterministic | Subsystem members share a uniform declared tier. Useful positive signal for tier-consistency reports; cheap companion to the mixing rule. | +| `CLA-FACT-GUIDANCE-EXPIRED` | INFO | deterministic | A guidance sheet's `expires` instant is in the past. The read path already excludes expired sheets from composition; this surfaces the state operatively (the sheet is not deleted). Emitted once per expired sheet, anchored to the sheet, on every analyze (independent of deletions/SEI). | +| `CLA-FACT-GUIDANCE-CHURN-STALE` | WARN | heuristic (0.7) | A guidance sheet covers high-churn code: the aggregate `git_churn_count` over the sheet's matched entities meets the staleness threshold (asymmetric — 20 for `pinned` sheets, 50 otherwise). Anchored to the sheet with matched entities as `related_entities`. Inert until the churn-history pipeline populates `git_churn_count` (proxy for the design's true "churn since `authored_at`/`reviewed_at`" delta). | **Why these belong in Phase 7, not the plugin's Phase-1 emission**: the rules depend on clustering output (Phase 3) and prior-run state, which are core-side concerns. Emitting them from the plugin would require the plugin to know about subsystems and prior runs — violation of Principle 3. @@ -1816,7 +1780,7 @@ The v0.1 core is Rust (locked — see §11 ADR-001). Crate choices below are rec - Content-Length framed JSON-RPC 2.0 (§1). Implementation: `tower-lsp-server`-derived framing or hand-rolled (simple enough — ~80 lines). Use `serde_json` for payload encoding; `jsonrpc-core` is over-featured for our two-endpoint protocol. - **tokio::process::Child** for plugin subprocess lifecycle. Explicit `wait()` to reap zombies. SIGPIPE handling (Unix only): ignore the signal so a dead plugin doesn't crash the core when we write to its stdin. -- **Bounded `tokio::sync::mpsc`** for the `file_analyzed` stream from plugin → core. Backpressure cap: default 100 messages. 
Prevents a runaway plugin from OOMing the core. +- **Bounded `tokio::sync::mpsc`** for the completed-file handoff from the blocking plugin worker → async writer loop. Backpressure cap: default 100 messages. Prevents a runaway plugin session from OOMing the core while preserving a simple request/response plugin wire protocol. - **Crash-loop circuit breaker**: >3 plugin crashes in 60 seconds → permanently disable the plugin for the run and emit `CLA-INFRA-PLUGIN-DISABLED-CRASH-LOOP`. - **stdout hygiene**: documented plugin-author requirement (§1 Error handling). Reference Python client redirects `logging.basicConfig(stream=sys.stderr)` at import time. @@ -1896,7 +1860,7 @@ Revision 5 (2026-04-17) restructures the single design document into a three-lay | §1 headline: suite fabric does not exist yet | Reframed v0.1 scope as "weaving the Loom fabric"; Abstract rewritten; new §11 Suite Bootstrap | System-design, §9 | | §3.1 Wire format wrong — `properties`→`metadata`, `line`→`line_start/line_end` | Finding struct and wire-format block rewritten; full example JSON emitted | §2, §7 | | §3.2 Severity enum wrong — wire values `{critical,high,medium,low,info}` | Mapping table added; internal vocabulary preserved in `metadata.clarion.internal_severity`; `warnings[]` response inspection required | §7 | -| §3.3 Wardline groups 9/12/13 are decorator-based (Rev 2 fix was wrong) | All 17 groups listed under `wardline_groups`; `wardline_overlay_groups: [1, 17]` notes supplementary overlay declarations | §1 manifest | +| §3.3 Wardline groups 9/12/13 are decorator-based (Rev 2 fix was wrong) | v1.0 no longer advertises Wardline group detection in the Python manifest; decorator/Wardline ontology is deferred until extractor support exists | §1 manifest | | §3.4 Wardline tier vocabulary uses `INTEGRAL/ASSURED/GUARDED/EXTERNAL_RAW`, not T1–T4 | Glossary and WardlineMeta struct corrected; `declared_tier: Option` | §2, Appendix B | | §3.5 Wardline has no Filigree integration today | 
SARIF→Filigree translator (Clarion-side `clarion sarif import`) specified in §7; §9.2 documents Wardline prerequisite | §7, §9.2 | | §3.6 `registry_backend` flag does not exist in Filigree | ADR-014 added; §9.1 names the schema surgery (4 NOT-NULL FKs, 3 auto-create paths, 5–8 hot files); degraded-mode fallback in System-design §11 | §7, §9.1, ADR-014 | diff --git a/docs/clarion/1.0/requirements.md b/docs/clarion/1.0/requirements.md index c929a0ae..34ac09d5 100644 --- a/docs/clarion/1.0/requirements.md +++ b/docs/clarion/1.0/requirements.md @@ -149,11 +149,24 @@ Within a single phase, LLM calls execute in parallel up to a configurable cap (d #### REQ-ANALYZE-03 — Resumable after crash or interrupt -`clarion analyze --resume ` continues from the last successful phase checkpoint, skipping already-completed LLM work via content-hash caching. - -**Rationale**: At elspeth scale a full run takes ~30-40 minutes and costs ~$15. A crash at 80% completion must not waste the prior 24 minutes / $12 of LLM spend. Checkpointing on phase transitions is coarse enough to avoid write amplification but fine enough to bound re-work. -**Verification**: Interrupt (SIGINT) `clarion analyze` during Phase 5; resume with `--resume`; assert phases 1-4 are skipped and Phase 5 picks up at the last completed entity. -**See**: System Design §6 (Analysis Pipeline, Resumability). +> **v1.x status amended by ADR-041.** `clarion analyze --resume ` +> reopens the existing run row, re-walks idempotently, and emits Filigree +> findings with `mark_unseen=false`. It does **not** promise durable +> phase/file checkpoint recovery or skipped provider calls. + +`clarion analyze --resume ` is a same-run repair path: it preserves +Filigree seen/unseen semantics while safely repeating deterministic writes. +Checkpoint recovery is deferred until a successor ADR defines the durable +checkpoint schema and provider-call accounting. 
+ +**Rationale**: The shipped run-lifecycle contract is coherent and necessary for +federated finding emission. Durable checkpoint recovery is a separate scheduler +feature with plugin-session, import-edge, and provider-side-effect ordering +requirements. +**Verification**: Resume a prior run id and assert the row is reopened, writes +are idempotent, and Filigree emission uses `mark_unseen=false`. Do not assert +phase/file skipping unless a successor checkpoint ADR lands. +**See**: ADR-041; System Design §6 (Analysis Pipeline, Resumability). #### REQ-ANALYZE-04 — Deletion detection via entity-set diff @@ -547,34 +560,51 @@ Each plugin ships a manifest declaring its entity kinds, edge kinds, tags, capab #### REQ-PLUGIN-03 — Lifecycle methods (analyze, build_prompt) -Plugins implement two phases of lifecycle calls: batch (`initialize`, `file_list`, `analyze_file(path) → stream of entities + edges + findings`) and consult (`build_prompt(entity_id, query_type, context)`). Calls are JSON-RPC methods; streams use a separate `file_analyzed` notification channel. +Plugins implement two phases of lifecycle calls: batch (`initialize`, `analyze_file(path) -> {entities[], edges[], stats}`) and consult (`build_prompt(entity_id, query_type, context)`). Calls are Content-Length framed JSON-RPC request/response methods. The plugin returns one result per file; the core streams each completed file result through a bounded internal file-batch channel to the writer actor. -**Rationale**: Splitting lifecycle into batch and consult lets plugins optimise each independently — batch is throughput-oriented; consult is latency-oriented. Streaming from `analyze_file` lets the core commit entities incrementally rather than buffering a whole file's worth. -**Verification**: Fixture plugin responds to each method; streaming behaviour commits entities as they arrive. 
+**Rationale**: Splitting lifecycle into batch and consult lets plugins optimise each independently — batch is throughput-oriented; consult is latency-oriented. Keeping the wire protocol request/response makes plugin authorship simple, while the bounded internal handoff applies writer backpressure and commits completed file output incrementally instead of buffering a whole plugin run. +**Verification**: Fixture plugin responds to each method; a plugin that emits one file successfully and crashes on a later file leaves the completed file's rows durable and marks the run failed. **See**: System Design §2 (Core / Plugin Architecture, Lifecycle). -#### REQ-PLUGIN-04 — Python plugin (v0.1) +#### REQ-PLUGIN-04 — Python plugin (v1.0) -Clarion ships a Python plugin supporting Python ≥3.11 that extracts functions, classes, protocols, globals, modules, packages, and their edges (`imports`, `calls`, `inherits_from`, `decorated_by`, `uses_type`, `alias_of`). Installation via `pipx install clarion-plugin-python` for isolation. +Clarion ships a Python plugin supporting Python >=3.11. The v1.0 plugin +declares and emits the narrower ontology that is present in +`plugins/python/plugin.toml`: `function`, `class`, and `module` entities, plus +`contains`, `calls`, `references`, and `imports` edges. The plugin is not +Wardline-aware in v1.0 and does not declare or emit `protocol`, `global`, or +`package` entity kinds, nor `inherits_from`, `decorated_by`, `uses_type`, or +`alias_of` edges. Those signals are deferred until the manifest declares them +and the extractor has fixture-backed support for them. -**Rationale**: Python is the validating first-customer language (elspeth is ~425k LOC Python). Shipping the plugin alongside the core for v0.1 establishes the plugin-authoring contract and validates the plugin protocol against a real workload; `pipx` isolation prevents venv conflicts with the analysed project. 
-**Verification**: `tests/fixtures/elspeth-slice/` runs through the Python plugin and produces expected entity/edge counts; installation via pipx succeeds. +**Rationale**: Python is the validating first-customer language (elspeth is ~425k LOC Python). Shipping the plugin alongside the core for v1.0 establishes the plugin-authoring contract and validates the plugin protocol against a real workload, while keeping the advertised ontology limited to signals that are actually emitted. `pipx` isolation prevents venv conflicts with the analysed project. +**Verification**: The plugin manifest smoke test pins the v1.0 entity/edge kinds and `wardline_aware = false`; `tests/fixtures/elspeth-slice/` runs through the Python plugin and produces expected entity/edge counts; installation via pipx succeeds. **See**: System Design §2 (Python plugin specifics). #### REQ-PLUGIN-05 — Python import resolution policy -The Python plugin resolves imports per a declared policy: `sys.path` discovered via virtualenv introspection or user-supplied `python_executable`; `src.` prefix stripped by default; `__init__.py` re-exports become `alias_of` edges (definition site wins); unresolvable imports produce `python:unresolved:{module.path}` placeholder entities; `TYPE_CHECKING` blocks excluded from runtime-import edges. +The Python plugin emits `imports` edges from `import` and `from ... import` +statements using the standard-library AST. Relative imports are normalized +against the current module, `__init__.py` collapses to the package module name, +and `TYPE_CHECKING`-guarded imports carry `properties.type_only = true` while +function-local imports carry `properties.scope = "function"` so graph algorithms +can filter non-runtime edges. The v1.0 plugin does not emit `alias_of` edges for +package re-exports and does not mint `python:unresolved:*` placeholder entities. 
-**Rationale**: Python's import model is the single hardest static-analysis problem at elspeth scale; leaving it undefined produces an entity graph that is subtly wrong in different ways on different installations. An explicit policy makes the behaviour testable and predictable. -**Verification**: Fixture with each import shape produces the documented resolution; `TYPE_CHECKING` imports don't generate spurious circular-import findings. +**Rationale**: Python's import model is the single hardest static-analysis problem at elspeth scale; leaving it undefined produces an entity graph that is subtly wrong in different ways on different installations. An explicit, narrower v1.0 policy makes the behaviour testable and predictable without advertising resolver features that are not implemented. +**Verification**: Fixture with each import shape produces the documented target and properties; `TYPE_CHECKING` and function-local imports do not generate spurious circular-import findings. **See**: System Design §2 (Python plugin specifics, Import resolution). #### REQ-PLUGIN-06 — Decorator detection policy -The Python plugin detects decorators including factory invocations (`@app.route("/health")`), stacked decorators (preserving order — matters for Wardline semantics), class decorators, and aliases (`validates = validates_shape`). Each decoration produces a `decorated_by` edge with optional `properties` capturing decorator arguments. +Decorator semantics are deferred for the Python plugin until the ontology grows. +In v1.0 the extractor preserves decorator source spans in entity definition +metadata so source navigation covers the decorated declaration, but it does not +declare or emit `decorated_by` edges, decorator tags, Wardline annotation +metadata, decorator arguments, or alias-resolved decorator facts. -**Rationale**: Decorator-as-DSL is widespread in Python (FastAPI, Pydantic, Wardline itself). 
Naive direct-name matching misses most decorator usage; explicit handling makes the entity metadata faithful to what the code actually declares. -**Verification**: Fixture with each decorator shape produces the expected `decorated_by` edges with preserved argument metadata. +**Rationale**: Decorator-as-DSL is widespread in Python (FastAPI, Pydantic, Wardline itself). The v1.0 plugin keeps source spans honest while avoiding a false ontology claim. A later implementation must add the edge kind to the manifest and fixture-backed extraction for direct, factory, stacked, class, and aliased decorators before advertising decorator semantics. +**Verification**: Current fixtures assert decorated entity spans include decorator lines and that the manifest does not advertise Wardline semantics. Future decorator extraction must add fixtures for direct, factory, stacked, class, and aliased decorators and assert emitted `decorated_by` edges with preserved order/arguments. **See**: System Design §2 (Python plugin specifics, Decorator detection). --- @@ -609,7 +639,8 @@ For Filigree `registry_backend: clarion`, Clarion's HTTP read API is loopback-only by default. Non-loopback binds require **both** `serve.http.allow_non_loopback: true` **and** a resolved authentication secret — either HMAC identity via `serve.http.identity_token_env` (preferred per -ADR-034) or a legacy bearer token via `serve.http.token_env`. A non-loopback +ADR-034, hardened with timestamp/nonce replay protection by ADR-042) or a +legacy bearer token via `serve.http.token_env`. A non-loopback bind with the opt-in but no resolved secret is refused at startup with `CLA-CONFIG-HTTP-NO-AUTH`. The loopback-without-token mode remains unauthenticated and emits a startup warning that any local process can read the @@ -628,6 +659,8 @@ the non-loopback HMAC-required path (line 1579), and the non-loopback legacy-bearer path (line 1614). The loopback startup-warning surface is covered by config-layer tests. 
**See**: System Design §9 (Integrations, HTTP Read API), ADR-014, ADR-034. +For the exact HMAC header and canonical-message shape, use +`docs/federation/contracts.md` §Authentication. #### REQ-HTTP-04 — ETag-style response caching @@ -999,10 +1032,18 @@ The dry-run cost estimate is within ±50% of actual spend on representative proj #### NFR-RELIABILITY-01 — Crash-surviving store -`.clarion/clarion.db` survives unclean shutdown (SIGKILL during analyze) without corruption. Subsequent `clarion analyze --resume ` continues from the last checkpoint. - -**Rationale**: SQLite WAL + writer-actor per-N-files transactions + checkpoint discipline produce crash-safe semantics when configured correctly. Getting this right in v0.1 is non-negotiable — a corrupt store costs the user everything. -**Verification**: Test harness: `clarion analyze` → `kill -9` mid-run → next invocation loads the DB cleanly → `--resume` continues. +> **v1.x status amended by ADR-041.** `.clarion/clarion.db` must survive +> unclean shutdown (SIGKILL during analyze) without corruption. Subsequent +> `clarion analyze --resume ` safely reopens and re-walks the same run +> id; it does not continue from a phase/file checkpoint. + +**Rationale**: SQLite WAL + writer-actor transactions protect the store from +corruption. Same-run re-emit protects federated finding lifecycle semantics; +skipping already completed work is deferred checkpoint-recovery behavior, not +the v1.x guarantee. +**Verification**: Test harness: `clarion analyze` → `kill -9` mid-run → next +invocation opens the DB cleanly → `--resume ` reopens the row and can +complete or fail deterministically without Filigree unseen flapping. **See**: System Design §4 (Storage, Concurrency). 
#### NFR-RELIABILITY-02 — Degraded modes for missing siblings diff --git a/docs/clarion/1.0/system-design.md b/docs/clarion/1.0/system-design.md index ed7854ec..3775c2a7 100644 --- a/docs/clarion/1.0/system-design.md +++ b/docs/clarion/1.0/system-design.md @@ -190,9 +190,9 @@ Each plugin declares its ontology at startup. The manifest is the contract betwe Key fields: - `plugin_id` — the namespace for this plugin's emissions (e.g., `python`, `java`, `core`) -- `kinds` — every entity kind the plugin can emit (`function`, `class`, `protocol`, `global`, `module`, `package`, ...) -- `edge_kinds` — plugin-defined edge kinds (`imports`, `calls`, `inherits_from`, `decorated_by`, `uses_type`, `alias_of`, ...). Core reserves `contains`, `guides`, `emits_finding`, `in_subsystem`. -- `tags` — declared tag vocabulary (including Wardline annotation names for the Python plugin) +- `kinds` — every entity kind the plugin can emit; the v1.0 Python plugin declares only `function`, `class`, and `module` +- `edge_kinds` — plugin-defined edge kinds; the v1.0 Python plugin declares only `contains`, `calls`, `references`, and `imports`. Core reserves `guides`, `emits_finding`, `in_subsystem`, and core-owned containment/source anchors. +- `tags` — declared tag vocabulary - `capabilities` — boolean flags per capability (`calls`, `imports`, `inherits_from`, ...) with `confidence_basis` per capability (`ast_match`, `name_match`, `clarion_augmentation`, ...) - `supported_rule_ids` — rule IDs this plugin may emit, namespaced by prefix - `prompt_templates` — list of named templates (`python:class:v1`, `python:module:v1`, ...) with per-segment slot specifications @@ -210,30 +210,21 @@ hash-pinning are deferred (NG-16). The Python plugin is the v0.1 validating plugin and the reference implementation of the manifest contract. -**Parser dispatch**. The plugin parses with the standard-library `ast` module — there is **no tree-sitter and no LibCST dependency**. 
Structural extraction (`function` / `class` / `module` entities, qualnames, `imports` and decorator edges) walks the `ast` tree directly. Call-graph and reference resolution is delegated to a managed **pyright** subprocess session (`PyrightSession`, recycled every 25 files), which serves as both the call resolver and the reference resolver. There is no `CLA-PY-PARTIAL-PARSE` fallback path. +**Parser dispatch**. The plugin parses with the standard-library `ast` module — there is **no tree-sitter and no LibCST dependency**. Structural extraction (`function` / `class` / `module` entities, qualnames, and `imports` edges) walks the `ast` tree directly. Decorator source ranges are retained in entity definition metadata, but decorator semantics are not emitted as edges in v1.0. Call-graph and reference resolution is delegated to a managed **pyright** subprocess session (`PyrightSession`, recycled every 25 files), which serves as both the call resolver and the reference resolver. There is no `CLA-PY-PARTIAL-PARSE` fallback path. -**Import resolution** (REQ-PLUGIN-05). Resolution uses an explicit policy: -- `sys.path` discovered via virtualenv introspection (`python -m site` against the supplied `python_executable` from `plugins.toml`) or, if none supplied, the ambient Python -- `src.` prefix stripped by default; configurable via `clarion.yaml:analysis.python.canonical_root` -- `__init__.py` re-exports resolve to the definition site (which wins for the canonical ID). The plugin does **not** emit a separate `alias_of` edge in v1.0 — the kind is reserved in the manifest but unused. -- Calls and references that pyright cannot resolve are recorded as **unresolved call / reference sites** (counted in run stats; unresolved *call* sites persist to `entity_unresolved_call_sites` for query-time inferred dispatch) rather than dropped. The plugin does **not** mint `python:unresolved:*` placeholder entities. 
-- `imports` edges are emitted from `import` / `from … import` statements via the `ast` walk; `TYPE_CHECKING`-guarded imports are **not** excluded in v1.0 (the type-only-import filter is not implemented). +**Import extraction** (REQ-PLUGIN-05). `imports` edges are emitted from `import` / `from ... import` statements via the `ast` walk. Relative imports are normalized against the current module, `__init__.py` collapses to the package module name, `TYPE_CHECKING`-guarded imports carry `properties.type_only = true`, and function-local imports carry `properties.scope = "function"`. Graph algorithms filter those properties when they need runtime-only imports. Calls and references that pyright cannot resolve are recorded as **unresolved call / reference sites** (counted in run stats; unresolved *call* sites persist to `entity_unresolved_call_sites` for query-time inferred dispatch) rather than dropped. The plugin does **not** mint `python:unresolved:*` placeholder entities and does not emit `alias_of` edges for package re-exports in v1.0. -**Decorator detection** (REQ-PLUGIN-06). Direct-name match is insufficient — real Python code uses decorator factories (`@app.route("/health")`), stacked decorators (order matters), class decorators, and aliases (`validates = validates_shape`). Detection handles: -- Factory invocations: record the decorator name + call arguments as edge properties -- Stacked decorators: preserve order in a `decoration_order` edge property (matters for Wardline semantics) -- Aliases: follow name resolution; an alias to a Wardline-registered decorator counts as that decorator -- Legacy renames: consult Wardline's `LEGACY_DECORATOR_ALIASES` (from the directly-imported REGISTRY) to avoid double-counting old-name + new-name +**Decorator detection** (REQ-PLUGIN-06). Decorator semantics are deferred in v1.0. 
The extractor includes decorator lines in function/class source spans for navigation, but the manifest does not declare `decorated_by`, Wardline tags, or Wardline-aware capability, and the plugin emits no decorator edges, decorator arguments, or alias-resolved decorator facts. **Serial-or-parallel posture**. v0.1 is serial within the plugin (one file at a time). Parallelism happens at the core level (multiple plugins, multiple LLM calls). Parallelism inside the plugin is deferred to v0.2. ### Observe-vs-enforce boundary (Principle 5) -The Python plugin detects *that* a Wardline annotation is present on a function (e.g., `@validates_shape`); Wardline's enforcer determines *whether* the function actually validates what it claims. Clarion tags the entity with `wardline.groups` / `wardline.annotations`; Wardline's findings (surfaced via Filigree) fill in the "does it actually comply?" answer. +The Python plugin is not Wardline-aware in v1.0. Future Wardline-aware extraction may detect *that* a Wardline annotation is present on a function (e.g., `@validates_shape`), while Wardline's enforcer remains responsible for deciding *whether* the function actually validates what it claims. Clarion must not tag entities with `wardline.groups` / `wardline.annotations` until the manifest advertises that capability and the extractor emits the corresponding signals. 
This boundary is preserved by: -- Clarion importing Wardline's REGISTRY directly rather than redefining the decorator vocabulary (`CON-WARDLINE-01`) -- Clarion's plugin declaring Wardline decorator names in its manifest `tags`, but not re-implementing Wardline's rule logic +- Clarion not redefining Wardline's decorator vocabulary in v1.0 +- Clarion's plugin keeping `wardline_aware = false` until it emits usable Wardline-derived signals - `CLA-FACT-TIER-SUBSYSTEM-MIXING` (structural observation) being core-emitted (uses clustering) — Clarion is flagging that tiers disagree; Wardline would be the tool that decides *which* tier is correct --- @@ -438,7 +429,12 @@ flowchart LR ### Crash safety -SQLite WAL + per-N-files transactions + explicit `PRAGMA synchronous=NORMAL` give crash-safe semantics: a SIGKILL during analyze loses at most the last N-files batch. `clarion analyze --resume ` reads the run's checkpoint file (`runs//checkpoints.jsonl`) and continues from the last clean phase boundary. +SQLite WAL + writer-actor transactions + explicit `PRAGMA synchronous=NORMAL` +give crash-safe storage semantics: a SIGKILL during analyze must not corrupt +`.clarion/clarion.db`, and committed rows survive. Per ADR-041, v1.x +`clarion analyze --resume ` reopens the existing run id and re-walks +idempotently; it does not read `checkpoints.jsonl` or continue from a phase/file +checkpoint. 
### Git-friendly storage @@ -591,18 +587,21 @@ flowchart TB Phase0 --> Phase1 --> Phase15 --> Phase2 --> Phase3 --> Phase4 --> Phase5 --> Phase6 --> Phase7 --> Phase8 - Phase0 -.->|"checkpoint"| CheckPt0([checkpoint]) - Phase1 -.->|"checkpoint"| CheckPt1([checkpoint]) - Phase15 -.->|"checkpoint"| CheckPt15([checkpoint]) - Phase2 -.->|"checkpoint"| CheckPt2([checkpoint]) - Phase3 -.->|"checkpoint"| CheckPt3([checkpoint]) - Phase4 -.->|"checkpoint"| CheckPt4([checkpoint]) - Phase5 -.->|"checkpoint"| CheckPt5([checkpoint]) - Phase6 -.->|"checkpoint"| CheckPt6([checkpoint]) - Phase7 -.->|"checkpoint"| CheckPt7([checkpoint]) + Phase0 -.->|"ADR-041: no v1.x checkpoint resume"| CheckPt0([same-run re-emit]) + Phase1 -.->|"idempotent writes"| CheckPt1([same-run re-emit]) + Phase15 -.->|"idempotent writes"| CheckPt15([same-run re-emit]) + Phase2 -.->|"idempotent writes"| CheckPt2([same-run re-emit]) + Phase3 -.->|"idempotent writes"| CheckPt3([same-run re-emit]) + Phase4 -.->|"cache may avoid work"| CheckPt4([same-run re-emit]) + Phase5 -.->|"cache may avoid work"| CheckPt5([same-run re-emit]) + Phase6 -.->|"cache may avoid work"| CheckPt6([same-run re-emit]) + Phase7 -.->|"idempotent writes"| CheckPt7([same-run re-emit]) ``` -Each phase transition writes a checkpoint to `runs//checkpoints.jsonl`. `clarion analyze --resume ` reads the checkpoint file and resumes from the last clean phase boundary. Content-hash caching means resumed runs naturally skip unchanged-file work. +Per ADR-041, v1.x phase transitions do not write a durable checkpoint file. +`clarion analyze --resume ` reuses the run id, re-walks safely, and +relies on existing caches where they independently apply. A future checkpoint +ADR may reintroduce phase/file skipping with explicit provider-call accounting. 
### Parallelism @@ -1014,14 +1013,17 @@ Why this exists: every sibling tool consuming Clarion should ask in *their* nati 404 behaviour: returns 200 with `resolution_confidence: "none"` and empty `entity_id` — distinguishes "Clarion doesn't know this" from "Clarion is down." -#### Authentication — ADR-014 registry-backend read API +#### Authentication — ADR-014 / ADR-034 / ADR-042 registry-backend read API ADR-014 supersedes ADR-012 for the Filigree `registry_backend: clarion` -HTTP read surface. The registry-backend API is unauthenticated and -loopback-only by default. It refuses non-loopback binds unless -`serve.http.allow_non_loopback: true`; that opt-in requires an -operator-managed authenticated reverse proxy or equivalent access-control -layer in front of Clarion. +HTTP read surface. The registry-backend API is loopback-only by default and may +run unauthenticated only in that local sidecar posture. ADR-034 closes the +non-loopback gap: a non-loopback bind requires both +`serve.http.allow_non_loopback: true` and a resolved authentication secret +(preferred HMAC identity via `serve.http.identity_token_env`, or legacy bearer +via `serve.http.token_env`). ADR-042 hardens the HMAC form with timestamp and +nonce freshness, while `docs/federation/contracts.md` remains the authoritative +wire surface. ADR-012's UDS/token design is retained as historical context for the earlier broad v0.1 HTTP API proposal, but it is not the implementation contract for the @@ -1029,9 +1031,9 @@ registry-backend file-resolution endpoint. Loopback is not a complete security boundary on modern dev hosts (shared containers, devcontainers, and other local processes all sit on 127.0.0.1). -The ADR-014 stance accepts that local-read exposure for the bounded +The current stance accepts that local-read exposure for the bounded registry-backend API and prevents accidental network exposure through the -non-loopback guard. +non-loopback guard plus mandatory authentication. 
**Default — loopback only**: - Binds only to loopback addresses. diff --git a/docs/clarion/adr/ADR-005-clarion-dir-tracking.md b/docs/clarion/adr/ADR-005-clarion-dir-tracking.md index 5750a2b5..fb813d21 100644 --- a/docs/clarion/adr/ADR-005-clarion-dir-tracking.md +++ b/docs/clarion/adr/ADR-005-clarion-dir-tracking.md @@ -1,6 +1,6 @@ # ADR-005: `.clarion/` Directory Git-Tracking Policy -**Status**: Accepted +**Status**: Accepted; amended by ADR-041 **Date**: 2026-04-18 **Deciders**: qacona@gmail.com **Context**: `clarion install` must write a `.gitignore` inside `.clarion/` that diff --git a/docs/clarion/adr/ADR-011-writer-actor-concurrency.md b/docs/clarion/adr/ADR-011-writer-actor-concurrency.md index 03932b1b..2d132935 100644 --- a/docs/clarion/adr/ADR-011-writer-actor-concurrency.md +++ b/docs/clarion/adr/ADR-011-writer-actor-concurrency.md @@ -1,6 +1,6 @@ # ADR-011: Writer-Actor Concurrency Model with Per-N-Files Transactions -**Status**: Accepted +**Status**: Accepted; amended by ADR-041 **Date**: 2026-04-18 **Deciders**: qacona@gmail.com **Context**: SQLite concurrency model for `clarion analyze` + `clarion serve` against a shared `.clarion/clarion.db`; design-review `§2.2` flagged the original single-transaction posture as CRITICAL diff --git a/docs/clarion/adr/ADR-034-federation-http-read-api-hardening.md b/docs/clarion/adr/ADR-034-federation-http-read-api-hardening.md index 47e7343a..ab6e39cf 100644 --- a/docs/clarion/adr/ADR-034-federation-http-read-api-hardening.md +++ b/docs/clarion/adr/ADR-034-federation-http-read-api-hardening.md @@ -1,6 +1,6 @@ # ADR-034: Federation HTTP Read API Hardening — Identity Auth, Batch Resolution, `BRIEFING_BLOCKED`, Instance ID -**Status**: Accepted +**Status**: Accepted; HMAC freshness amended by [ADR-042](./ADR-042-hmac-freshness-and-replay-window.md) **Date**: 2026-05-19 **Deciders**: qacona@gmail.com **Context**: Sprint 3 Loom federation hardening (see 
[`docs/implementation/sprint-3/2026-05-19-loom-federation-hardening-tasking.md`](../../implementation/sprint-3/2026-05-19-loom-federation-hardening-tasking.md)); extends ADR-014's read-API §"Security Posture" and §"Error Envelope" diff --git a/docs/clarion/adr/ADR-035-operational-tuning-discipline.md b/docs/clarion/adr/ADR-035-operational-tuning-discipline.md index 6db08f3b..f80013ba 100644 --- a/docs/clarion/adr/ADR-035-operational-tuning-discipline.md +++ b/docs/clarion/adr/ADR-035-operational-tuning-discipline.md @@ -28,7 +28,7 @@ The 2026-05-22 architecture analysis (`docs/arch-analysis-2026-05-22-1924/04-fin ### The roundtable's diagnosis -Five SME reports (`docs/arch-analysis-2026-05-22-1924/temp/answer-{solution-architect,systems-thinker,python-engineer,quality-engineer,security-engineer}.md`) converged on a single root cause. The systems thinker named it most directly: +Five SME reports (archived under `docs/archive/working-notes/arch-analysis-2026-05-22-1924/answer-{solution-architect,systems-thinker,python-engineer,quality-engineer,security-engineer}.md`) converged on a single root cause. The systems thinker named it most directly: > "The five questions look like five concerns. They are one. Each is a missing-feedback-loop symptom: a place where operational reality has no path back to the artifact that would change behavior. They wear 'parameter' clothing (Level 12) but live at Level 6 (information flows) and Level 5 (rules)." — `answer-systems-thinker.md` @@ -341,11 +341,11 @@ The four-axis declaration is the floor — additional discipline can be layered ### SME roundtable (2026-05-23) -- Solution architect: [`temp/answer-solution-architect.md`](../../arch-analysis-2026-05-22-1924/temp/answer-solution-architect.md) — WP6 home triangulation; the "ship 1.0 with limits hardcoded; land config surface in WP6 as one ADR-021-aligned change" frame; the per-file split-trigger table. 
-- Systems thinker: [`temp/answer-systems-thinker.md`](../../arch-analysis-2026-05-22-1924/temp/answer-systems-thinker.md) — the level-5 (rules) leverage argument; the drift-to-low-performance archetype; the `analyze.rs:74` and `breaker.rs:7` smoking-gun tells (line numbers as recorded at analysis time; current `analyze.rs` `#[allow(clippy::too_many_lines)]` site has shifted to line 65 with two additional sites at 650 and 1190; the rule applies to all three). -- Python engineer: [`temp/answer-python-engineer.md`](../../arch-analysis-2026-05-22-1924/temp/answer-python-engineer.md) — the wire-contract-pinned vs. operational-tunable Python constant classification; the `MAX_PYRIGHT_RESTARTS_PER_RUN` "per run" name vs. per-instance implementation interaction failure; the basis for the `Coupling` axis. -- Quality engineer: [`temp/answer-quality-engineer.md`](../../arch-analysis-2026-05-22-1924/temp/answer-quality-engineer.md) — the per-constant test-coverage matrix; the `DEFAULT_MAX_RSS_MIB`/`DEFAULT_MAX_NOFILE`/`DEFAULT_MAX_NPROC` security-enforcement cluster as the highest-risk untested area. -- Security engineer: [`temp/answer-security-engineer.md`](../../arch-analysis-2026-05-22-1924/temp/answer-security-engineer.md) — the STRIDE-D/STRIDE-E framing of recompile-to-tune as a security-posture stance; the `clarion-llm` extraction as STRIDE-T/STRIDE-I defense-in-depth; the security-uniformity argument for keeping some constants `Override = recompile`. +- Solution architect: [`answer-solution-architect.md`](../../archive/working-notes/arch-analysis-2026-05-22-1924/answer-solution-architect.md) — WP6 home triangulation; the "ship 1.0 with limits hardcoded; land config surface in WP6 as one ADR-021-aligned change" frame; the per-file split-trigger table. 
+- Systems thinker: [`answer-systems-thinker.md`](../../archive/working-notes/arch-analysis-2026-05-22-1924/answer-systems-thinker.md) — the level-5 (rules) leverage argument; the drift-to-low-performance archetype; the `analyze.rs:74` and `breaker.rs:7` smoking-gun tells (line numbers as recorded at analysis time; current `analyze.rs` `#[allow(clippy::too_many_lines)]` site has shifted to line 65 with two additional sites at 650 and 1190; the rule applies to all three). +- Python engineer: [`answer-python-engineer.md`](../../archive/working-notes/arch-analysis-2026-05-22-1924/answer-python-engineer.md) — the wire-contract-pinned vs. operational-tunable Python constant classification; the `MAX_PYRIGHT_RESTARTS_PER_RUN` "per run" name vs. per-instance implementation interaction failure; the basis for the `Coupling` axis. +- Quality engineer: [`answer-quality-engineer.md`](../../archive/working-notes/arch-analysis-2026-05-22-1924/answer-quality-engineer.md) — the per-constant test-coverage matrix; the `DEFAULT_MAX_RSS_MIB`/`DEFAULT_MAX_NOFILE`/`DEFAULT_MAX_NPROC` security-enforcement cluster as the highest-risk untested area. +- Security engineer: [`answer-security-engineer.md`](../../archive/working-notes/arch-analysis-2026-05-22-1924/answer-security-engineer.md) — the STRIDE-D/STRIDE-E framing of recompile-to-tune as a security-posture stance; the `clarion-llm` extraction as STRIDE-T/STRIDE-I defense-in-depth; the security-uniformity argument for keeping some constants `Override = recompile`. 
### Source-of-truth code locations diff --git a/docs/clarion/adr/ADR-040-semantic-search-embeddings.md b/docs/clarion/adr/ADR-040-semantic-search-embeddings.md index d651894e..204e24a7 100644 --- a/docs/clarion/adr/ADR-040-semantic-search-embeddings.md +++ b/docs/clarion/adr/ADR-040-semantic-search-embeddings.md @@ -38,12 +38,13 @@ Ship `search_semantic` as an opt-in tool behind the `EmbeddingProvider` trait, w - The API-endpoint default requires an external embedding service + key when enabled; the local-model alternative (no network) is not yet shipped. - Two storage files (`clarion.db` + `embeddings.db`) to manage operationally; the sidecar is rebuildable, so loss is non-fatal. -## Status of delivery (2026-06-02) +## Status of delivery (2026-06-04) -Shipped and tested at acceptance: the `EmbeddingProvider` trait + `RecordingEmbeddingProvider` + `ApiEmbeddingProvider` (clarion-core), `semantic_search:` config (off by default), the `.clarion/embeddings.db` sidecar (`clarion-storage::embeddings`), the `search_semantic` MCP tool (honest-degrade + bounded cosine + content-hash freshness), `serve` provider construction (`build_embedding_provider` → `with_semantic_search`), the gitignore entry, and this ADR. The read + enable path is complete; the sidecar is populated directly in tests to prove the search path. +Shipped and tested at acceptance: the `EmbeddingProvider` trait + `RecordingEmbeddingProvider` + `ApiEmbeddingProvider` (clarion-core), `semantic_search:` config (off by default), the `.clarion/embeddings.db` sidecar (`clarion-storage::embeddings`), the `search_semantic` MCP tool (honest-degrade + bounded cosine + content-hash freshness), `serve` provider construction (`build_embedding_provider` → `with_semantic_search`), the gitignore entry, and this ADR. The read + enable path is complete. + +Delivery update: `clarion analyze` now runs an opt-in post-commit embedding population pass when `semantic_search.enabled` has a configured provider. 
It embeds content-hashed entities into `.clarion/embeddings.db`, skips fresh `(entity_id, content_hash, model_id)` rows, and enforces `semantic_search.session_token_ceiling`. ## Follow-up -- **Analyze-time embedding population** — the write path that fills the sidecar during `clarion analyze` so the tool has data on a real project. Carries a design decision the original plan did not anticipate: summaries are on-demand (ADR-030), so there is no summary text at analyze time; the recommended text to embed is `short_name + docstring`. Cost folds into the policy-engine budget. **Tracked: `clarion-610743d7bc`.** Until it lands, `search_semantic` works but returns an empty ranked set on a freshly-analyzed real project (the sidecar is unpopulated). - **Local-model `EmbeddingProvider`** (`candle`/`ort`) — the no-network alternative behind the same trait. - **ANN backend** (sqlite-vec / HNSW) — only if the exact scan misses NFR-PERF-02; logged with that trigger, never silent. diff --git a/docs/clarion/adr/ADR-041-resume-is-idempotent-reemit.md b/docs/clarion/adr/ADR-041-resume-is-idempotent-reemit.md new file mode 100644 index 00000000..52280b8a --- /dev/null +++ b/docs/clarion/adr/ADR-041-resume-is-idempotent-reemit.md @@ -0,0 +1,76 @@ +# ADR-041: Analyze Resume Is Idempotent Re-Emit, Not Checkpoint Recovery + +**Status**: Accepted +**Date**: 2026-06-04 +**Deciders**: qacona@gmail.com +**Context**: `clarion analyze --resume RUN_ID` shipped as a run-lifecycle and +finding-emission repair path, while older design text still promised +phase/file checkpoint recovery. + +## Summary + +Clarion's v1.x `--resume RUN_ID` reopens the existing `runs` row, re-walks the +analysis idempotently, and emits Filigree findings with `mark_unseen=false`. +It is not a durable phase/file checkpoint recovery mechanism. + +This ADR amends ADR-005 and ADR-011 where they describe `partial.json` or +restart-at-first-uncommitted-file behavior. 
The SQLite WAL and writer-actor +contract remains: an unclean shutdown must not corrupt `.clarion/clarion.db`, +and committed rows survive. What changes is the resume promise: after a crash, +the operator can safely re-run the same run id without Filigree seen/unseen +flapping, but Clarion does not guarantee it will skip already completed +phases/files from that interrupted run. + +## Decision + +`--resume RUN_ID` has three responsibilities: + +1. Reopen the existing run row through `WriterCmd::ResumeRun`. +2. Re-emit deterministic graph/finding writes under the same run id. +3. Preserve Filigree lifecycle semantics by posting with `mark_unseen=false`. + +Checkpoint recovery is deferred until a future ADR defines a durable +phase/file checkpoint schema, recovery protocol, and tests that prove provider +calls are not repeated after interruption. That future design must account for +plugin-session ordering, import-edge filtering across files, summary-cache +side effects, and post-commit enrich-only phases. + +## Rationale + +The shipped implementation already has a coherent resume contract for the +federated finding lifecycle. Reusing the same run id prevents resumed partial +runs from marking not-yet-revisited Filigree findings as unseen. Entity and +finding writes are upsert/idempotent enough for a safe re-walk. + +Durable checkpoint recovery is a different feature. It requires a first-class +checkpoint table or run-file, per-phase completion markers, file-level provider +call accounting, and careful ordering for edges that need whole-plugin context. +Adding those implicitly would turn a federation lifecycle option into a hidden +scheduler subsystem. + +## Consequences + +- Crash recovery remains database-safe: WAL and writer transactions preserve + committed data and prevent corruption. +- Resume may repeat structural extraction, plugin work, and provider calls + unless existing content-hash caches independently avoid work. 
+- Operators must not rely on `.clarion/runs//partial.json` or + `checkpoints.jsonl`; neither is part of the v1.x contract. +- Future checkpoint recovery can be added as a separate capability without + changing Filigree's `scan_run_id` semantics. + +## Verification + +- Existing resume tests must assert run-row reopening and idempotent re-emit + behavior. +- A crash-safety test may assert database integrity and safe same-run re-walk + after interruption. +- Tests must not assert that completed phases/files are skipped unless a + successor ADR introduces checkpoint recovery. + +## Amends + +- ADR-005: removes `partial.json` as a v1.x resume material. +- ADR-011: narrows `--resume` from checkpoint recovery to idempotent re-emit. +- Requirements REQ-ANALYZE-03 and NFR-RELIABILITY-01: clarifies the active + v1.x behavior. diff --git a/docs/clarion/adr/ADR-042-hmac-freshness-and-replay-window.md b/docs/clarion/adr/ADR-042-hmac-freshness-and-replay-window.md new file mode 100644 index 00000000..4337316a --- /dev/null +++ b/docs/clarion/adr/ADR-042-hmac-freshness-and-replay-window.md @@ -0,0 +1,64 @@ +# ADR-042: HMAC Freshness and Replay Window + +**Status**: Accepted +**Date**: 2026-06-04 +**Deciders**: qacona@gmail.com +**Context**: Comprehensive security audit M9 found ADR-034's HMAC identity +authenticated request bytes but did not bind freshness, so a captured signed +request could be replayed inside the same deployment. +**Amends**: [ADR-034](./ADR-034-federation-http-read-api-hardening.md) HMAC +identity message shape. + +## Summary + +Clarion's protected HTTP routes keep ADR-034's preferred Loom component HMAC +mode, but every signed request now carries `X-Loom-Timestamp` and +`X-Loom-Nonce`. 
The HMAC canonical message is: + +```text +METHOD +PATH_AND_QUERY +SHA256_HEX_OF_REQUEST_BODY +X_LOOM_TIMESTAMP +X_LOOM_NONCE +``` + +Clarion accepts timestamps inside a five-minute skew window and rejects reuse of +the same nonce inside the process-local replay cache for that window. Missing, +malformed, stale, replayed, or wrongly signed requests all return the existing +`401 UNAUTHENTICATED` envelope. + +## Decision + +- Replace the local HMAC implementation with the `hmac` crate over `Sha256`. +- Replace local byte-loop equality with `subtle::ConstantTimeEq`. +- Require `X-Loom-Timestamp` as Unix seconds and `X-Loom-Nonce` as a non-empty + opaque string up to 128 bytes whenever `identity_token_env` is active. +- Include timestamp and nonce in the canonical HMAC message after the body hash. +- Maintain an in-memory, process-local nonce cache with a five-minute freshness + window. A server restart clears the cache; the timestamp bound still limits + replay usefulness across restarts. +- Preserve the legacy bearer-token mode when `identity_token_env` is absent. + +## Consequences + +- A captured HMAC request is no longer replayable during a running server + process unless the attacker can also mint a fresh signature. +- Sibling clients must update their signing helper to add timestamp and nonce + headers. This is an intentional hardening change to the authenticated wire + shape; the authoritative contract is `docs/federation/contracts.md` + §Authentication. +- The five-minute window is a local-federation compromise: wide enough for modest + clock skew, narrow enough to bound captured-request utility. Wider skew or key + rotation needs a successor ADR rather than an environment knob. +- The cache is per process, not durable. Durable nonce storage would add write + pressure to a read path and is unnecessary for the local-first threat model. 
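
The freshness scheme above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not Clarion's implementation: the names `canonical_message`, `sign`, and `ReplayGuard` are hypothetical, the newline separator between the five canonical fields is assumed from the layout of the message block, and how the signature is carried in `X-Loom-Component` is left to `docs/federation/contracts.md`. The field order, the five-minute window, the 128-byte nonce limit, and the process-local cache come from the text.

```python
import hashlib
import hmac
import secrets
import time

WINDOW_SECONDS = 300  # five-minute skew window described above


def canonical_message(method, path_and_query, body, timestamp, nonce):
    # Five fields in the documented order; the "\n" separator is an assumption.
    body_hash = hashlib.sha256(body).hexdigest()
    fields = [method, path_and_query, body_hash, timestamp, nonce]
    return "\n".join(fields).encode("utf-8")


def sign(secret, method, path_and_query, body, timestamp=None, nonce=None):
    # Returns (lowercase hex signature, timestamp, nonce) for one request.
    timestamp = timestamp or str(int(time.time()))
    nonce = nonce or secrets.token_hex(16)  # high-entropy, unique per request
    message = canonical_message(method, path_and_query, body, timestamp, nonce)
    return hmac.new(secret, message, hashlib.sha256).hexdigest(), timestamp, nonce


class ReplayGuard:
    """Hypothetical process-local nonce cache; a restart starts it empty."""

    def __init__(self):
        self._seen = {}  # nonce -> request timestamp (Unix seconds)

    def verify(self, secret, method, path_and_query, body,
               signature, timestamp, nonce, now=None):
        now = int(time.time()) if now is None else now
        # Malformed or missing freshness fields are rejected outright.
        if not (timestamp.isascii() and timestamp.isdigit()):
            return False
        if not nonce or len(nonce.encode("utf-8")) > 128:
            return False
        stamp = int(timestamp)
        if abs(now - stamp) > WINDOW_SECONDS:
            return False  # stale or too far in the future
        message = canonical_message(method, path_and_query, body, timestamp, nonce)
        expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
        # Constant-time comparison, checked before the nonce is recorded so
        # unsigned traffic cannot fill the cache.
        if not hmac.compare_digest(expected.encode(), signature.encode()):
            return False
        # Drop nonces whose timestamps have aged out of the window.
        self._seen = {n: t for n, t in self._seen.items()
                      if now - t <= WINDOW_SECONDS}
        if nonce in self._seen:
            return False  # replay inside the window
        self._seen[nonce] = stamp
        return True
```

Every rejection path returns the same `False`, mirroring the single `401 UNAUTHENTICATED` envelope. Cache entries are evicted by request timestamp rather than arrival time, so a nonce stays cached for as long as its timestamp would still be accepted.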
+ +## Related Decisions + +- [ADR-034](./ADR-034-federation-http-read-api-hardening.md) — introduces the + preferred HMAC identity mode this ADR amends. +- [ADR-036](./ADR-036-wardline-taint-fact-store.md) — `/api/wardline/*` routes + inherit this HMAC gate, including the larger body-read limit. +- [ADR-037](./ADR-037-shared-error-vocabulary.md) — no new HTTP error code is + introduced; freshness failures reuse `UNAUTHENTICATED`. diff --git a/docs/clarion/adr/README.md b/docs/clarion/adr/README.md index 2d859675..c9e54d6e 100644 --- a/docs/clarion/adr/README.md +++ b/docs/clarion/adr/README.md @@ -10,10 +10,10 @@ This folder is the canonical home for authored Clarion architecture decision rec | [ADR-002](./ADR-002-plugin-transport-json-rpc.md) | Plugin transport: Content-Length framed JSON-RPC subprocess | Accepted | | [ADR-003](./ADR-003-entity-id-scheme.md) | Entity ID scheme: symbolic canonical names | Accepted | | [ADR-004](./ADR-004-finding-exchange-format.md) | Finding-exchange format: Filigree-native intake | Accepted | -| [ADR-005](./ADR-005-clarion-dir-tracking.md) | `.clarion/` git-committable by default; DB included, run logs excluded | Accepted | +| [ADR-005](./ADR-005-clarion-dir-tracking.md) | `.clarion/` git-committable by default; DB included, run logs excluded | Accepted; amended by ADR-041 | | [ADR-006](./ADR-006-clustering-algorithm.md) | Clustering algorithm — Leiden on imports+calls subgraph; fallback amended by ADR-032 | Accepted; amended | | [ADR-007](./ADR-007-summary-cache-key.md) | Summary cache key — 5-part composite with TTL backstop and churn-eager invalidation | Accepted | -| [ADR-011](./ADR-011-writer-actor-concurrency.md) | Writer-actor concurrency with per-N-files transactions; `--shadow-db` opt-in | Accepted | +| [ADR-011](./ADR-011-writer-actor-concurrency.md) | Writer-actor concurrency with per-N-files transactions; `--shadow-db` opt-in | Accepted; amended by ADR-041 | | [ADR-012](./ADR-012-http-auth-default.md) | HTTP read-API 
authentication — UDS default with token fallback | Superseded for ADR-014 registry-backend API | | [ADR-013](./ADR-013-pre-ingest-secret-scanner.md) | Pre-ingest secret scanner with LLM-dispatch block | Accepted | | [ADR-014](./ADR-014-filigree-registry-backend.md) | Filigree `registry_backend` flag and pluggable `RegistryProtocol` | Accepted; partially extended by ADR-034 | @@ -34,17 +34,19 @@ This folder is the canonical home for authored Clarion architecture decision rec | [ADR-031](./ADR-031-schema-validation-policy.md) | Schema-validation policy — CHECK on closed core-owned vocabularies (`findings.{kind,severity,status}`, `runs.status`); writer-actor + manifest are the only enforcement layer for plugin-extensible vocabularies (`entities.kind`, `edges.kind`) | Accepted | | [ADR-032](./ADR-032-weighted-components-clustering-fallback.md) | Weighted-components clustering fallback naming | Accepted | | [ADR-033](./ADR-033-v1.0-distribution.md) | v1.0 distribution via GitHub Releases (binary matrix + Python sdist; promote to crates.io/PyPI at v2.0) | Accepted | -| [ADR-034](./ADR-034-federation-http-read-api-hardening.md) | Federation HTTP read API hardening — bearer auth, batch resolution, `BRIEFING_BLOCKED`, instance ID | Accepted | +| [ADR-034](./ADR-034-federation-http-read-api-hardening.md) | Federation HTTP read API hardening — bearer auth, batch resolution, `BRIEFING_BLOCKED`, instance ID | Accepted; amended by ADR-042 | | [ADR-035](./ADR-035-operational-tuning-discipline.md) | Operational tuning discipline — declared basis / override surface / retune trigger / coupling per constant; file-LOC + crate-boundary budgets; CI lint gate | Accepted | | [ADR-036](./ADR-036-wardline-taint-fact-store.md) | Clarion as Wardline taint-fact store — `wardline_taint_facts` table + `/api/wardline/*` routes; first read+write HTTP surface (optional writer-actor, default off); passes loom.md §3–§5 (ADR, not asterisk) | Accepted | | [ADR-037](./ADR-037-shared-error-vocabulary.md) 
| Shared error vocabulary (`clarion-core::errors`) — two typed enums (`HttpErrorCode`, `McpErrorCode`) as single source of truth; wire spelling unchanged on both surfaces; relates to ADR-034 | Accepted | | [ADR-038](./ADR-038-sei-token-and-signature.md) | SEI token scheme (`clarion:eid:`), signature schema (plugin-declared versioned JSON), and identity persistence (`sei_bindings` table, not an `entities` column); reserves the `clarion:eid:` locator namespace; resolves SEI-standard REQ-C-01/REQ-C-02; demotes ADR-003 id to *locator* | Accepted | | [ADR-039](./ADR-039-llm-provider-pivot-openrouter-cli.md) | LLM provider pivot — OpenRouter (live HTTP) + Codex/Claude CLI bridges + recording provider; `CachingModel::OpenAiChatCompletions` (not Anthropic four-`cache_control`-breakpoint); supersedes CON-ANTHROPIC-01 | Accepted | | [ADR-040](./ADR-040-semantic-search-embeddings.md) | Semantic search (`search_semantic`) — opt-in `EmbeddingProvider` trait (recording + API-endpoint impls), git-ignored `.clarion/embeddings.db` sidecar keyed `(entity_id, content_hash, model_id)` (extends ADR-005's gitignore list), bounded exact cosine scan, policy-engine cost governance | Accepted | +| [ADR-041](./ADR-041-resume-is-idempotent-reemit.md) | Analyze resume is idempotent re-emit, not checkpoint recovery; amends ADR-005/ADR-011 resume language | Accepted | +| [ADR-042](./ADR-042-hmac-freshness-and-replay-window.md) | HMAC freshness and replay window — timestamp + nonce headers, crate-backed HMAC, process-local replay cache | Accepted | ## Backlog still tracked in the detailed design -The following decisions are still backlog items rather than authored ADR files. Their current summaries live in [../v0.1/detailed-design.md](../v0.1/detailed-design.md) §11 and [../v0.1/system-design.md](../v0.1/system-design.md) §12. +The following decisions are still backlog items rather than authored ADR files. 
Their current summaries live in [../1.0/detailed-design.md](../1.0/detailed-design.md) §11 and [../1.0/system-design.md](../1.0/system-design.md) §12. | ADR | Title | Current state | |---|---|---| diff --git a/docs/federation/contracts.md b/docs/federation/contracts.md index 9e4dc9d2..4a6b9da9 100644 --- a/docs/federation/contracts.md +++ b/docs/federation/contracts.md @@ -41,7 +41,8 @@ pool; its read paths still use the reader pool. ### Authentication The `/api/v1/files`-family endpoints require -`X-Loom-Component: clarion:` when Clarion has resolved +`X-Loom-Component: clarion:`, `X-Loom-Timestamp: `, and +`X-Loom-Nonce: ` when Clarion has resolved `serve.http.identity_token_env` at startup. The HMAC is lowercase hex HMAC-SHA256 over the canonical message: @@ -49,8 +50,17 @@ HMAC-SHA256 over the canonical message: + + ``` +Clarion accepts timestamps within a five-minute skew window and rejects nonce +reuse inside that same process-local window. Nonces are scoped to one Clarion +server process and one shared secret; clients should use high-entropy unique +nonces for every signed request. Replays, stale timestamps, missing freshness +headers, malformed freshness headers, and wrong signatures all return the same +`401 UNAUTHENTICATED` envelope. + `/api/v1/_capabilities` is **always** unauthenticated so siblings can probe the API surface pre-auth. Clarion still accepts the older `Authorization: Bearer ` path when `token_env` resolves and @@ -69,7 +79,7 @@ startup, before binding): | Non-loopback | unset | unset | **Refuse to start** with `CLA-CONFIG-HTTP-NO-AUTH`. | Authentication rejection (header absent, wrong scheme/prefix, wrong token or -signature, blank token or signature) returns: +signature, blank token or signature, stale timestamp, or reused nonce) returns: ```http HTTP/1.1 401 Unauthorized @@ -836,9 +846,10 @@ directory name returns `403` with `code: "PROJECT_MISMATCH"`. 
(Reference: ### Sub-router framing, auth, and limits The `/api/wardline/*` routes sit behind the **same identity middleware** as the -protected `/api/v1/*` routes (HMAC `X-Loom-Component: clarion:` preferred per -ADR-034, legacy `Authorization: Bearer` accepted as fallback, loopback-unauth -allowed; see [Authentication](#authentication)). The only difference is the body +protected `/api/v1/*` routes (HMAC `X-Loom-Component: clarion:` plus +timestamp/nonce freshness preferred per ADR-034/ADR-042, legacy +`Authorization: Bearer` accepted as fallback, loopback-unauth allowed; see +[Authentication](#authentication)). The only difference is the body limit used while reading the request to verify the HMAC signature: the wardline guard reads up to **4 MiB** (`WARDLINE_BODY_LIMIT_BYTES`) rather than the `/api/v1/*` 16 KiB, because batched resolves/writes carry thousands of qualnames. diff --git a/docs/implementation/README.md b/docs/implementation/README.md index 5963899e..044944ea 100644 --- a/docs/implementation/README.md +++ b/docs/implementation/README.md @@ -1,6 +1,6 @@ # Implementation Archive -This folder is the consolidated archive of Clarion's implementation and planning history. It is **not** part of the release-facing doc surface — readers entering via [`docs/README.md`](../README.md) and the [Clarion v0.1 docset](../clarion/v0.1/README.md) are not expected to need anything here. +This folder is the consolidated archive of Clarion's implementation and planning history. It is **not** part of the release-facing doc surface — readers entering via [`docs/README.md`](../README.md) and the [Clarion 1.0 docset](../clarion/1.0/README.md) are not expected to need anything here. Material is kept rather than deleted because the [ADRs](../clarion/adr/README.md) cite it for historical context (panel reviews, the v0.1 scope-commitment memo, sprint plans, and agent handoffs that motivated specific decisions). 
@@ -21,7 +21,7 @@ Material is kept rather than deleted because the [ADRs](../clarion/adr/README.md ## Relationship to release-facing docs -- **Authoritative design**: [`../clarion/v0.1/system-design.md`](../clarion/v0.1/system-design.md) and [`../clarion/v0.1/detailed-design.md`](../clarion/v0.1/detailed-design.md). Each work package under this folder names the sections it implements. +- **Authoritative design**: [`../clarion/1.0/system-design.md`](../clarion/1.0/system-design.md) and [`../clarion/1.0/detailed-design.md`](../clarion/1.0/detailed-design.md). Each work package under this folder names the sections it implements. - **Decisions**: [`../clarion/adr/README.md`](../clarion/adr/README.md). Each work package names the accepted ADRs it depends on and any backlog ADRs it is expected to surface. - **Scope and commitments**: [`v0.1-scope-plans/v0.1-scope-commitments.md`](./v0.1-scope-plans/v0.1-scope-commitments.md). That memo locks *what* v0.1 ships; the work-package plans describe *how* the build proceeds. diff --git a/docs/implementation/comprehensive-readonly-audit-2026-06-04.md b/docs/implementation/comprehensive-readonly-audit-2026-06-04.md new file mode 100644 index 00000000..c79f0fcf --- /dev/null +++ b/docs/implementation/comprehensive-readonly-audit-2026-06-04.md @@ -0,0 +1,794 @@ +# Clarion Comprehensive Read-Only Audit + +Date: 2026-06-04 +Repository: `/home/john/clarion` +Branch observed: `ws6-guidance-maturity` ahead of `origin/ws6-guidance-maturity` by 2 +Dirty state observed: untracked `.clarion/clarion.lock` + +## Method + +This was a static, source-and-document audit. Session startup checks were run with +`filigree session-context` and `git status --short --branch`. + +The requested subagent API did not expose literal `enable_write_tools=false` or +`enable_mcp_tools=false` fields. 
Each subagent prompt carried those constraints +explicitly, including strict no-edit/no-MCP instructions and the instruction not +to use escaped double quotes in tool arguments. + +Subagent execution: + +| Role | Status | +| --- | --- | +| Architecture Critic | Completed | +| Systems Thinker | Completed | +| Rust and Python Implementation Engineer | Completed | +| Quality Engineer | Completed | +| Static Tools Analyst | Completed | +| Security Architect | Blocked by external platform usage limit before report | +| MCP and CLI Specialist | Blocked by external platform usage limit before report | + +Because two reports were externally blocked, the coordinator performed a local +read-only security and MCP/CLI synthesis from the same source evidence. No build, +test, formatter, analyzer, or server command was run, because the audit was +constrained to read-only investigation. + +## Critical Findings + +None found in the completed read-only audit. + +## High Findings + +### H1. Combined plugin item cap excludes edges and findings + +Severity: High + +Locations: +- [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:978), lines 978-1087 +- [limits.rs](/home/john/clarion/crates/clarion-core/src/plugin/limits.rs:109), lines 109-130 +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:3478), lines 3478-3569 + +Evidence: `EntityCountCap` documents ADR-021 as a combined cap for entities, +edges, and findings. The host calls `try_admit(1)` only in the entity loop, then +`process_edges` explicitly says edges do not participate in the cap. The CLI +accumulates accepted edges before writing them. + +Impact: A buggy or hostile plugin can emit very large valid edge/finding sets +bounded mainly by frame size and memory, violating the documented resource +contract and risking memory growth or storage spam. 
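
To make the documented contract concrete, the following toy model shows what a single run-level counter shared by entities, edges, and findings would look like. It is a sketch only: `ItemCap` and `accept_batches` are hypothetical names, the batch shape is invented for illustration, and only the `try_admit` name is borrowed from the evidence above. It does not reflect the actual Rust types in `limits.rs` or `host.rs`.

```python
class ItemCap:
    """Hypothetical run-level counter charged by every accepted item."""

    def __init__(self, limit):
        self.limit = limit
        self.admitted = 0

    def try_admit(self, n):
        # Admit n items only if the whole batch fits under the limit.
        if self.admitted + n > self.limit:
            return False
        self.admitted += n
        return True


def accept_batches(cap, batches):
    """Accept (kind, items) batches until the shared cap refuses one.

    Returns the accepted items and the kind of the first refused batch,
    or None if every batch was admitted.
    """
    accepted = []
    for kind, items in batches:
        # Admission happens before the batch is accepted, for every kind.
        if not cap.try_admit(len(items)):
            return accepted, kind
        accepted.extend((kind, item) for item in items)
    return accepted, None
```

With a limit of 3, one entity followed by two edges exhausts the cap, so a later findings batch is refused instead of being buffered without bound.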
+ +Remediation: Rename the cap to an item cap or keep the current name with updated +semantics, then charge accepted entities, accepted edges, and retained findings +against the same run-level counter. Apply admission before accepting each batch. + +Acceptance test: Configure a tiny cap, emit one entity and enough valid edges or +findings to exceed it, and assert the host emits the cap finding and stops before +persisting the excess items. + +### H2. macOS plugin limit cfgs appear inconsistent + +Severity: High + +Locations: +- [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:48), lines 48-59 +- [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:594), lines 594-610 +- [limits.rs](/home/john/clarion/crates/clarion-core/src/plugin/limits.rs:301), lines 301-326 + +Evidence: The `pre_exec` block is compiled for Linux or macOS, but `host.rs` +imports several limit symbols only on Linux and `effective_max_nproc` is compiled +only for Linux/tests. The macOS path therefore appears to reference symbols that +are not imported or defined. + +Impact: Clarion can fail to compile on macOS targets despite macOS being named +in the resource-limit path and release governance history. + +Remediation: Align cfgs. Either make the `pre_exec` block Linux-only or import +and define macOS-safe helpers, splitting Linux-only `nproc` behavior from the +portable address-space/file-descriptor limits. + +Acceptance test: Run `cargo check --workspace --all-targets --target x86_64-apple-darwin` +or restore an equivalent macOS CI leg and prove `clarion-core` builds. + +### H3. 
File entities are not yet the canonical graph/source anchor + +Severity: High + +Locations: +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:3514), lines 3514-3540 +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:3853), lines 3853-3933 +- [writer.rs](/home/john/clarion/crates/clarion-storage/src/writer.rs:571), lines 571-573 + +Evidence: Analyze now creates `core:file:*` records, but plugin entities still +derive `source_file_id` from the module entity. Storage still permits both +`file` and `module` as source anchors. + +Impact: Consumers see split semantics: file entities exist, while navigation, +source anchoring, and identity still attach through module entities. + +Remediation: Make `core:file:*` the canonical `source_file_id`, parent module +entities under the file entity, emit file-to-module containment, populate file +metadata, and then tighten storage validation to file anchors only. + +Acceptance test: Analyze a one-file fixture and assert the file entity exists, +the module parent is the file, the function parent chain resolves to the file, +`source_file_id` is the file id, and required file metadata is present. + +### H4. Resume is a re-walk, not checkpoint recovery + +Severity: High + +Locations: +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:321), lines 321-328 +- [commands.rs](/home/john/clarion/crates/clarion-storage/src/commands.rs:157), lines 157-164 +- [writer.rs](/home/john/clarion/crates/clarion-storage/src/writer.rs:439), lines 439-447 + +Evidence: Storage comments state `--resume` is a re-emit-without-flip path, not +incremental checkpoint recovery. No durable phase/file checkpoint path was found. + +Impact: A killed or failed large run repeats completed work instead of resuming +from the last successful phase/file. + +Remediation: Add durable phase/file checkpoints and make `--resume` consult them +to skip completed work. 
If this narrower behavior is intentional, update the +requirements/design through an ADR instead of leaving the contract ambiguous. + +Acceptance test: Kill a run after early phases complete, resume it, and assert +completed phases/files and provider calls are not repeated. + +### H5. Analyze buffers whole plugin output before writer backpressure applies + +Severity: High + +Locations: +- [requirements.md](/home/john/clarion/docs/clarion/1.0/requirements.md:548), lines 548-554 +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:668), lines 668-700 +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:785), lines 785-884 +- [writer.rs](/home/john/clarion/crates/clarion-storage/src/writer.rs:974), lines 974-989 + +Evidence: Requirements describe streaming so the core can commit entities +incrementally. Implementation runs blocking plugin work, collects all entities, +edges, unresolved sites, and stats, then sends records to the writer afterward. + +Impact: Large repositories can consume excessive memory, and a late plugin +failure loses completed file output for that plugin despite the writer-actor +design. + +Remediation: Stream per-file or per-batch plugin results through a bounded +channel to the writer. Preserve per-file progress and apply backpressure during +extraction instead of after full collection. + +Acceptance test: Use a fake plugin that emits many files then fails; assert +earlier batches are durable, the run records failure, and memory stays bounded. + +### H6. 
Python plugin advertises Wardline awareness while semantic extraction is absent + +Severity: High + +Locations: +- [plugin.toml](/home/john/clarion/plugins/python/plugin.toml:22), lines 22-40 +- [wardline_probe.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/wardline_probe.py:38), lines 38-84 +- [server.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/server.py:144), lines 144-155 +- [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:179), lines 179-194 +- [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:903), lines 903-930 + +Evidence: The manifest sets `wardline_aware = true`, but the ontology exposes +only `function`, `class`, `module` and `contains`, `calls`, `references`, +`imports`. The probe reports package availability/version; decorator handling +extends entity spans but does not emit Wardline tags, groups, annotations, or +decorator edges. + +Impact: Downstream guidance/federation consumers can infer Wardline enrichment is +enabled when no usable semantic signal is emitted. + +Remediation: Implement Wardline/decorator extraction, or downgrade the manifest +capability claim until the signal is actually produced. + +Acceptance test: Analyze fixtures with direct, factory, stacked, and aliased +Wardline decorators and assert emitted annotation metadata and ordered +decorator edges; also assert explicit degraded behavior when the Wardline +vocabulary is unavailable. + +### H7. 
Stale analyze run rows can persist and leak into project status + +Severity: High + +Locations: +- [analyze_runs.rs](/home/john/clarion/crates/clarion-mcp/src/analyze_runs.rs:11), lines 11-16 +- [analyze_runs.rs](/home/john/clarion/crates/clarion-mcp/src/analyze_runs.rs:166), lines 166-202 +- [lib.rs](/home/john/clarion/crates/clarion-mcp/src/lib.rs:2005), lines 2005-2028 +- [writer.rs](/home/john/clarion/crates/clarion-storage/src/writer.rs:344), lines 344-363 + +Evidence: MCP cancel explicitly defers supervising-process crash reconciliation +to future `owner_pid`/`heartbeat_at` work. `project_status` reports the latest +raw `runs.status`, and writer cleanup marks failure only on normal writer +shutdown paths. + +Impact: Operators and agents can see `running` after the owner process is gone, +making index freshness and recovery decisions unreliable. + +Remediation: Add durable `runs.owner_pid` and `heartbeat_at`, reconcile stale +running rows on analyze startup/status/project-status reads, and mark abandoned +rows terminal with reason and completion time. + +Acceptance test: Seed a `running` row with a dead owner PID and stale heartbeat; +`project_status_get` and `analyze_status` should both report an abandoned/failed +terminal state and update the row. + +### H8. 
Guidance invalidates summaries, but summary generation ignores guidance + +Severity: High + +Locations: +- [ADR-007-summary-cache-key.md](/home/john/clarion/docs/clarion/adr/ADR-007-summary-cache-key.md:28), lines 28-40 +- [ADR-030-on-demand-summary-scope.md](/home/john/clarion/docs/clarion/adr/ADR-030-on-demand-summary-scope.md:58), lines 58-63 +- [summary.rs](/home/john/clarion/crates/clarion-mcp/src/tools/summary.rs:425), lines 425-454 +- [summary.rs](/home/john/clarion/crates/clarion-mcp/src/tools/summary.rs:510), lines 510-532 +- [guidance.rs](/home/john/clarion/crates/clarion-storage/src/guidance.rs:443), lines 443-452 +- [guidance.rs](/home/john/clarion/crates/clarion-cli/src/guidance.rs:272), lines 272-282 + +Evidence: ADRs require `guidance_fingerprint` because summaries are +guidance-conditioned. The summary read path hard-codes `guidance-empty`, and the +prompt builder receives only entity/source fields. Guidance writes eagerly +invalidate matching summaries. + +Impact: Summaries can appear fresh under the cache contract while being +generated without the institutional guidance that supposedly affects them. + +Remediation: Compose applicable guidance during summary input construction, +hash it into the cache key, include it in the prompt, and make guidance mutation +plus affected-summary invalidation atomic or persist pending invalidation. + +Acceptance test: Cache a summary, create/edit a matching guidance sheet, then +request the summary with a recording LLM provider. The request should be a cache +miss with a changed fingerprint and guidance content in the prompt. + +### H9. 
Runtime-scope-blind imports can fabricate circular-import and clustering facts + +Severity: High + +Locations: +- [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:454), lines 454-526 +- [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:104), lines 104-132 +- [requirements.md](/home/john/clarion/docs/clarion/1.0/requirements.md:564), lines 564-569 + +Evidence: `_ImportEdgeCollector` visits the whole AST and emits every import as +a module-level resolved `imports` edge. It does not track `if TYPE_CHECKING`, +function-local imports, or a type-only/scope property. Circular-import SCCs use +all import edges without filtering. + +Impact: Type-only and function-local imports can be treated as runtime imports, +fabricating SCCs, inflating coupling, and misleading subsystem clustering. + +Remediation: Track import context in AST traversal. Suppress type-only/function +local imports from runtime algorithms, or emit `type_only` and `scope` +properties and filter them in circular-import/coupling/clustering queries. + +Acceptance test: A fixture where `b.py` imports `a.py` only under +`if TYPE_CHECKING:` must not produce a circular-import cycle. + +### H10. Dead-code reachability ignores ambiguous call candidates beyond `to_id` + +Severity: High + +Locations: +- [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:432), lines 432-448 +- [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:324), lines 324-331 +- [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:738), lines 738-757 + +Evidence: Ambiguous calls store only `candidate_ids[0]` in `to_id`; the full set +is stored in `properties.candidates`. Dead-code reachability selects only +`from_id, to_id`, so candidates beyond the first are invisible. 
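
The gap described in this evidence can be reproduced with a small in-memory model. The sketch below is illustrative only: the edge dictionaries, `adjacency`, and `reachable` are hypothetical stand-ins for the storage rows and the dead-code query, and the only details taken from the text are the `to_id` column and the `properties.candidates` list on `calls` edges.

```python
import json
from collections import deque


def adjacency(edges, expand_candidates):
    """Build from_id -> {targets}; optionally add ambiguous candidates."""
    adj = {}
    for edge in edges:
        targets = {edge["to_id"]}
        if expand_candidates and edge["kind"] == "calls":
            # Assumed shape: properties is a JSON object with a
            # "candidates" array of entity ids.
            props = json.loads(edge.get("properties") or "{}")
            targets.update(props.get("candidates", []))
        adj.setdefault(edge["from_id"], set()).update(targets)
    return adj


def reachable(adj, roots):
    """Breadth-first reachability from the given root ids."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = [{
    "from_id": "main",
    "to_id": "maybe_a",  # only the first candidate lands in to_id
    "kind": "calls",
    "properties": json.dumps({"candidates": ["maybe_a", "maybe_b"]}),
}]
live_without = reachable(adjacency(edges, expand_candidates=False), ["main"])
live_with = reachable(adjacency(edges, expand_candidates=True), ["main"])
```

In this model `maybe_b` is missing from `live_without` and present in `live_with`, which is the difference between reporting it as dead and treating it as conservatively live.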
+ +Impact: A target that is known to be reachable as an ambiguous candidate can be +reported as dead, contradicting the conservative fail-toward-live policy. + +Remediation: Reuse existing ambiguous-candidate expansion in storage helpers, or +parse `properties.candidates` in dead-code adjacency for `calls` edges. + +Acceptance test: Seed an ambiguous call with candidates `maybe_a` and `maybe_b`; +both must be excluded from `entity_dead_list`. + +### H11. Duplicate-definition dedup is not shared by call/reference resolution + +Severity: High + +Locations: +- [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:356), lines 356-365 +- [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:861), lines 861-889 +- [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:970), lines 970-982 +- [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:1008), lines 1008-1038 + +Evidence: Entity extraction first-wins duplicate definitions and suppresses +dropped duplicate bodies. Pyright indexing separately collects all definitions +and builds `by_id` with a dict comprehension, allowing later duplicates to +overwrite earlier ones. + +Impact: Calls/references from a dropped duplicate body can be attributed to the +surviving entity id with source ranges outside the stored entity span. + +Remediation: Centralize duplicate-disposition logic and share it across entity, +call-site, and reference-site collection, or switch to a consistently documented +last-wins model with explicit duplicate confidence. + +Acceptance test: Two same-name functions where only the dropped duplicate calls +`callee()` must not produce a `calls` edge from the surviving entity to `callee`. + +### H12. 
Non-authoritative unresolved-call results can leave stale inferred-edge anchors + +Severity: High + +Locations: +- [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:247), lines 247-276 +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:4141), lines 4141-4164 +- [unresolved.rs](/home/john/clarion/crates/clarion-storage/src/unresolved.rs:20), lines 20-29 +- [query.rs](/home/john/clarion/crates/clarion-storage/src/query.rs:961), lines 961-975 +- [summary.rs](/home/john/clarion/crates/clarion-mcp/src/tools/summary.rs:188), lines 188-199 + +Evidence: On Pyright unavailable/timeout/crash, `resolve_calls` reports +unresolved totals but returns an empty unresolved-site list. The analyzer only +clears all callers when the listed sites are authoritative. Reads fetch +unresolved rows by caller id only, while inference keys cache entries by current +caller content hash. + +Impact: Old unresolved-site rows can survive a changed caller body and feed a +new inferred-edge prompt/cache key. + +Remediation: Filter unresolved-site reads by current `caller_content_hash`, and +clear or mark stale caller rows when call resolution is non-authoritative. MCP +inference should reject rows whose stored hash differs from the current entity +hash. + +Acceptance test: Analyze with one unresolved site, change the caller body, +simulate Pyright unavailable, then request inferred dispatch. It must not prompt +on or materialize the stale site. + +### H13. Release verify no longer mirrors CI static guards + +Severity: High + +Locations: +- [ci.yml](/home/john/clarion/.github/workflows/ci.yml:48), lines 48-74 +- [release.yml](/home/john/clarion/.github/workflows/release.yml:26), lines 26-29 +- [release.yml](/home/john/clarion/.github/workflows/release.yml:62), lines 62-117 + +Evidence: `release.yml` says the verify job must mirror CI. 
CI includes release +governance static guard, pyright pin lockstep, Wardline version bounds, and +entity-cap lockstep checks that are absent from the release verify job. + +Impact: A tag/manual release path can build artifacts from a commit that would +fail CI-only release-safety checks. + +Remediation: Add the missing guard steps to `release.yml` or centralize verify +logic into a shared script/reusable workflow used by both CI and release. + +Acceptance test: Intentionally break the pyright pin or entity-cap ADR/code +lockstep; both CI and release verify should fail before artifact build/publish. + +## Medium Findings + +### M1. Python call resolution mixes AST byte offsets with LSP UTF-16 positions + +Severity: Medium + +Locations: +- [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:370), lines 370-463 +- [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:1202), lines 1202-1218 +- [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:1384), lines 1384-1389 + +Evidence: AST `col_offset`/`end_col_offset` are byte offsets. LSP ranges are +UTF-16 character offsets. The code compares and converts these as if they were +the same coordinate system. + +Impact: Non-ASCII text before a call on the same line can produce unresolved or +incorrectly anchored call edges. + +Remediation: Normalize call-site matching to a single coordinate system, +preferably LSP UTF-16 positions for Pyright matching plus a UTF-16-aware +position-to-byte converter. + +Acceptance test: Add a plugin test with non-ASCII text before a call; assert the +call resolves and the byte span slices exactly to the callee expression. + +### M2. 
Closure-local references can become false module references + +Severity: Medium + +Locations: +- [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:632), lines 632-668 +- [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:602), lines 602-624 + +Evidence: The reference collector suppresses only names bound in the current +scope. Names bound in an enclosing function can be collected in an inner +function; if Pyright resolves them to a local assignment instead of an indexed +entity, target fallback can become the module entity. + +Impact: Common closure patterns can create misleading `references` edges to a +module, polluting neighborhoods and dependency interpretation. + +Remediation: Track enclosing-scope bindings and suppress non-entity local +references, or map them to the containing function if that is the intended graph +model. Reserve module fallback for true module-level definitions/imports. + +Acceptance test: For `outer -> inner -> return token`, where `token` is an outer +local, assert no `references` edge targets the module for `token`. + +### M3. Finding-list cap is silent after 5,000 rows and applied before filters + +Severity: Medium + +Locations: +- [lib.rs](/home/john/clarion/crates/clarion-mcp/src/lib.rs:345), lines 345-347 +- [inspection.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/inspection.rs:30), lines 30-31 +- [inspection.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/inspection.rs:179), lines 179-235 + +Evidence: `entity_finding_list` fetches `LIMIT 5000`, then checks whether the +already-capped vector reached 5,000. The 5,001st row is never fetched, so +`scan_truncated` cannot become true. Filters run after this cap. + +Impact: Older critical findings beyond the newest 5,000 can be missed while the +tool reports an apparently complete filtered result. 
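Illustration for M3: a minimal sketch of why a capped fetch cannot detect its own truncation, and how fetching one extra row after filtering can. It uses Python's `sqlite3`, not the project's Rust code. The `findings` table layout and the `list_findings` helper are hypothetical; only the `FINDINGS_SCAN_CAP` name and the 5,000 value come from this finding.

```python
import sqlite3

FINDINGS_SCAN_CAP = 5000  # value quoted in the Evidence text


def list_findings(conn, entity_id, severity=None, cap=FINDINGS_SCAN_CAP):
    """Return (rows, scan_truncated) with the filter applied before the cap.

    Hypothetical helper: the table and column names are assumptions.
    """
    sql = "SELECT id, severity FROM findings WHERE entity_id = ?"
    params = [entity_id]
    if severity is not None:
        # Filter in SQL so the cap counts only matching rows.
        sql += " AND severity = ?"
        params.append(severity)
    # Newest first; fetch cap + 1 so the extra row proves more rows exist.
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(cap + 1)
    rows = conn.execute(sql, params).fetchall()
    scan_truncated = len(rows) > cap
    return rows[:cap], scan_truncated


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE findings (id INTEGER PRIMARY KEY, entity_id TEXT, severity TEXT)"
)
# Row 1 is the oldest finding and the only critical one.
conn.execute("INSERT INTO findings (entity_id, severity) VALUES (?, ?)", ("e1", "critical"))
# 5,000 newer low-severity rows push it past the cap.
conn.executemany(
    "INSERT INTO findings (entity_id, severity) VALUES (?, ?)",
    [("e1", "low")] * FINDINGS_SCAN_CAP,
)
```

With this shape, an unfiltered call reports truncation instead of silently dropping the oldest row, and a `severity="critical"` call still reaches the old finding because the filter runs before the cap.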
+ +Remediation: Push filters and pagination into SQL with exact count/has-more, or +fetch `FINDINGS_SCAN_CAP + 1` before filtering and report truncation honestly. + +Acceptance test: Seed 5,001 findings for an entity, including an older critical +finding beyond the cap; a severity filter should either return it or report +truncation. + +### M4. Analyze heartbeat can falsely look wedged during one long file + +Severity: Medium + +Locations: +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:3484), lines 3484-3494 +- [lib.rs](/home/john/clarion/crates/clarion-mcp/src/lib.rs:2796), lines 2796-2798 +- [lib.rs](/home/john/clarion/crates/clarion-mcp/src/lib.rs:2846), lines 2846-2850 +- [analyze.rs](/home/john/clarion/crates/clarion-mcp/src/tools/analyze.rs:136), lines 136-160 + +Evidence: Progress heartbeat is written at phase/file boundaries, but not while +`host.analyze_file(file)` is in flight. MCP treats heartbeat age over 30 seconds +as unobserved while file timeout can be longer. + +Impact: A healthy slow file can look like a wedged plugin, prompting premature +operator cancellation. + +Remediation: Add periodic heartbeat while a file is in flight, or make +staleness relative to configured file timeout and expose a clearer +`working_on_file` state. + +Acceptance test: A fixture plugin sleeps longer than 30 seconds but less than +the file timeout; `analyze_status` must not report stale/wedged during valid +work. + +### M5. Normative federation batch fixture is not exercised + +Severity: Medium + +Locations: +- [contracts.md](/home/john/clarion/docs/federation/contracts.md:236), lines 236-310 +- [serve.rs](/home/john/clarion/crates/clarion-cli/tests/serve.rs:195), lines 195-234 + +Evidence: The contract marks `fixtures/post-api-v1-files-batch.json` as +normative. The conformance test loads other fixtures but not that batch fixture. 
+ +Impact: Sibling tools can rely on a documented wire contract that drifts from +implementation without tests noticing. + +Remediation: Add `post-api-v1-files-batch.json` to fixture-driven conformance. + +Acceptance test: Change only the batch fixture’s expected response shape; the +fixture-conformance test should fail. + +### M6. Wardline qualname fixture is copied into tests instead of executed + +Severity: Medium + +Locations: +- [contracts.md](/home/john/clarion/docs/federation/contracts.md:791), lines 791-805 +- [wardline_taint.rs](/home/john/clarion/crates/clarion-storage/src/wardline_taint.rs:346), lines 346-377 + +Evidence: The test states expected values were copied from +`wardline-qualname-normalization.json`, then hard-codes them. New fixture +vectors would not automatically run. + +Impact: Normative fixture drift can be missed. + +Remediation: Parse the fixture JSON directly in storage/MCP reconciliation +tests. + +Acceptance test: Add a new trap vector to the fixture; tests should fail until +implementation supports it. + +### M7. Entity git provenance columns are never populated + +Severity: Medium + +Locations: +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:3908), lines 3908-3933 +- [writer.rs](/home/john/clarion/crates/clarion-storage/src/writer.rs:487), lines 487-528 + +Evidence: Core file records set `first_seen_commit` and `last_seen_commit` to +`None`, and the writer preserves incoming values rather than repairing them. + +Impact: Catalog history/churn questions cannot be answered even though columns +exist. + +Remediation: Thread the analyzed commit into entity construction and writer +update semantics for first/last seen values. + +Acceptance test: Run over two commits and assert new, unchanged, and changed +entities have correct first/last commit values. + +### M8. 
Recoverable source-walk failures are log-only + +Severity: Medium + +Location: +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:4270), lines 4270-4321 + +Evidence: `collect_source_files` warns and increments a local skipped counter +for walk errors, but the return value is only a file list. The skipped count and +paths do not become durable stats or findings. + +Impact: Analysis can be incomplete while durable outputs look clean. + +Remediation: Return source-walk error counts/details into run stats and persist +a finding anchored to the project or path. + +Acceptance test: Analyze a fixture with an unreadable/skipped path and assert +durable stats/findings report the skipped source-walk failure. + +### M9. Federation HMAC is hand-rolled and has no freshness component + +Severity: Medium + +Locations: +- [auth.rs](/home/john/clarion/crates/clarion-cli/src/http_read/auth.rs:71), lines 71-110 +- [auth.rs](/home/john/clarion/crates/clarion-cli/src/http_read/auth.rs:122), lines 122-185 + +Evidence: HMAC-SHA256 and constant-time comparison are implemented locally. The +canonical message signs method, path/query, and body hash only; it has no +timestamp, nonce, replay cache, or expiry window. + +Impact: Local crypto primitives increase review burden, and a captured signed +request remains valid for the lifetime of the shared secret. + +Remediation: Replace local primitives with `hmac` and `subtle`, normalize +signature decoding before comparison, add timestamp/nonce fields, enforce a +bounded skew window, and store recent nonces per component identity. + +Acceptance test: Valid signatures, same-length wrong signatures, wrong-length +signatures, malformed hex, and missing headers all return the same envelope +class. Replaying the same signed request/nonce should fail. + +### M10. 
Decorator, inheritance, globals, protocol, and package ontology remains absent + +Severity: Medium + +Locations: +- [plugin.toml](/home/john/clarion/plugins/python/plugin.toml:31), lines 31-47 +- [requirements.md](/home/john/clarion/docs/clarion/1.0/requirements.md:556), lines 556-577 + +Evidence: Requirements name protocols, globals, modules, packages, and edges +such as `inherits_from`, `decorated_by`, `uses_type`, and `alias_of`. The live +plugin ontology declares only `function`, `class`, `module`, and +`contains`/`calls`/`references`/`imports`. + +Impact: Python framework and Wardline semantics encoded in decorators, bases, +types, and package exports are absent or approximated. + +Remediation: Implement the missing ontology or amend v1.0 requirements to make +the limitation explicit and honest to consumers. + +Acceptance test: Fixtures for `@app.route`, stacked decorators, class +decorators, `class Child(Base)`, module globals, and package re-exports emit the +documented shapes or return explicit missing-signal notes. + +## Low Findings + +### L1. Plugin handshake-failure test does not prove zombie reaping + +Severity: Low + +Locations: +- [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:656), lines 656-666 +- [host_subprocess.rs](/home/john/clarion/crates/clarion-core/tests/host_subprocess.rs:185), lines 185-233 + +Evidence: Production code kills/waits on handshake failure, but the test comment +states it verifies only prompt error return and non-hanging behavior, not zombie +reaping. + +Remediation: Add a Unix-only test seam or controlled `/proc` assertion for the +reap behavior. + +Acceptance test: Removing `child.wait()` from the handshake-failure path should +fail the new test. + +### L2. 
Python protocol reader lets malformed non-ASCII headers escape `ProtocolError` + +Severity: Low + +Locations: +- [server.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/server.py:74), lines 74-117 +- [server.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/server.py:294), lines 294-300 + +Evidence: `read_frame` decodes headers with `line.decode("ascii")`; a +non-ASCII byte raises `UnicodeDecodeError`, while `main` catches only +`ProtocolError`. + +Remediation: Catch `UnicodeDecodeError` during header decoding and convert it to +`ProtocolError`. + +Acceptance test: Feed a malformed non-ASCII header to the server entrypoint and +assert it returns the protocol-error exit path without corrupting stdout. + +### L3. Guidance create can race and overwrite a concurrent sheet + +Severity: Low + +Locations: +- [guidance.rs](/home/john/clarion/crates/clarion-cli/src/guidance.rs:228), lines 228-234 +- [guidance.rs](/home/john/clarion/crates/clarion-storage/src/guidance.rs:156), lines 156-204 + +Evidence: CLI create performs a non-atomic existence check before a low-level +upsert. The source comment acknowledges a concurrent create can overwrite the +earlier sheet. + +Remediation: Add an insert-only create primitive and reserve upsert for edit or +import paths. + +Acceptance test: Race two creates with the same computed id; one succeeds and +the other fails without modifying the first row. + +### L4. Capped graph scans use unordered `LIMIT` + +Severity: Low + +Locations: +- [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:104), lines 104-119 +- [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:741), lines 741-744 + +Evidence: Circular-import and dead-code adjacency scans use `LIMIT ?1` without +`ORDER BY`. + +Remediation: Add deterministic ordering such as +`ORDER BY from_id, to_id, kind, source_byte_start, source_byte_end`. 
For +dead-code, consider returning unavailable/degraded when edge scan truncates. + +Acceptance test: Seed more edges than the cap in randomized insertion order and +assert repeated runs return identical truncated output. + +### L5. MCP crate owns non-MCP helper surfaces + +Severity: Low + +Locations: +- [lib.rs](/home/john/clarion/crates/clarion-mcp/src/lib.rs:149), lines 149-490 +- [lib.rs](/home/john/clarion/crates/clarion-mcp/src/lib.rs:648), lines 648-970 +- [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:38), lines 38-44 + +Evidence: `clarion-mcp` carries tool registry, server state, dispatch, resources, +diagnostics, analyze registry, and utility clients; CLI analyze imports helpers +from `clarion_mcp`. + +Remediation: Move shared federation/config/scan-result helpers to a narrower +crate or CLI-owned module and split MCP registry/state/dispatch/resource code +into focused modules. + +Acceptance test: CLI analyze no longer depends on `clarion_mcp` for non-MCP +helpers, and MCP dispatch delegates to focused modules with behavior preserved. + +## Recommended Remediation Order + +1. Fix resource-limit correctness first: H1, H2, and H13. These are release and + platform safety issues with clear acceptance tests. +2. Fix graph correctness issues that can actively mislead agents: H9, H10, H11, + H12, M1, and M2. +3. Reconcile the design/implementation contracts: H3, H4, H5, H6, M10. +4. Repair stale-state and cache semantics: H7, H8, M3, M4, L3. +5. Harden security and conformance coverage: M5, M6, M8, M9, L1, L2, L4. +6. Treat L5 as a refactoring follow-up after the behavioral issues are tracked. 
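Illustration for the L4 item in the order above: a minimal sketch showing that a capped scan becomes repeatable once `ORDER BY` precedes `LIMIT`. It uses Python's `sqlite3`, not the project's Rust code. The column list mirrors the ordering suggested in L4; the table definition, the `capped_scan` helper, and the sample edges are assumptions made for the example.

```python
import random
import sqlite3

# Ordering taken from the L4 remediation text; the SELECT list is illustrative.
ORDERED_SCAN = (
    "SELECT from_id, to_id, kind FROM edges "
    "ORDER BY from_id, to_id, kind, source_byte_start, source_byte_end "
    "LIMIT ?"
)


def capped_scan(rows, cap, seed):
    """Insert rows in a seed-dependent order, then run the capped scan."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges (from_id TEXT, to_id TEXT, kind TEXT, "
        "source_byte_start INTEGER, source_byte_end INTEGER)"
    )
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # simulate arbitrary insertion order
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?, ?)", shuffled)
    return conn.execute(ORDERED_SCAN, (cap,)).fetchall()


# Hypothetical import chain with more edges than the cap used below.
EDGES = [(f"mod{i}", f"mod{i + 1}", "imports", 0, 10) for i in range(20)]
```

Calling `capped_scan(EDGES, 5, seed)` with different seeds should return the same five rows, which is the property the L4 acceptance test asks for.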
+ +## Verification Not Run + +No dynamic gates were run due to the strict read-only audit constraint: + +- `cargo fmt --all -- --check` +- `cargo clippy --workspace --all-targets --all-features -- -D warnings` +- `cargo build --workspace --bins` +- `cargo nextest run --workspace --all-features` +- `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps --all-features` +- `cargo deny check` +- Python ruff/mypy/pytest gates +- E2E scripts +- macOS target checks + +## Residual Risk + +This audit is broad but static. Some findings are contract-confirmed by source +and docs; others, especially macOS compile behavior and dynamic Pyright/LSP edge +cases, should be reproduced with targeted tests before implementation planning. +Tracker deduplication was not performed, so several findings may correspond to +already-open Filigree work. + +## Remediation Addendum + +Date: 2026-06-04 + +This addendum was added after the read-only audit moved into implementation. +The original findings above remain unchanged as the audit trail. The current +worktree contains remediations for the listed findings, with targeted +regressions and broad gates run afterward. + +### Remediated Findings + +| Finding | Status | Primary implementation points | Regression evidence | +| --- | --- | --- | --- | +| H1 | Resolved | Combined entity, edge, and finding admission is enforced in [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:982) and [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:1174); cap semantics remain documented in [limits.rs](/home/john/clarion/crates/clarion-core/src/plugin/limits.rs:128). | Host cap tests in [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:2440) and [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:2524). 
| +| H2 | Resolved | macOS/Linux resource-limit cfgs now align in [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:52), [host.rs](/home/john/clarion/crates/clarion-core/src/plugin/host.rs:107), [limits.rs](/home/john/clarion/crates/clarion-core/src/plugin/limits.rs:301), and [limits.rs](/home/john/clarion/crates/clarion-core/src/plugin/limits.rs:320). | Workspace check/build/clippy gates below compile the local targets; macOS target CI remains the stronger remote proof. | +| H3 | Resolved | Analyze maps plugin output to canonical `core:file:*` anchors in [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:4137) and [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:4395); storage rejects module/function anchors in [writer.rs](/home/john/clarion/crates/clarion-storage/src/writer.rs:590). | Anchor tests in [writer_actor.rs](/home/john/clarion/crates/clarion-storage/tests/writer_actor.rs:1324) and [writer_actor.rs](/home/john/clarion/crates/clarion-storage/tests/writer_actor.rs:1374). | +| H4 | Resolved by contract reconciliation | ADR-041 makes v1.x resume an idempotent re-emit rather than checkpoint recovery in [ADR-041-resume-is-idempotent-reemit.md](/home/john/clarion/docs/clarion/adr/ADR-041-resume-is-idempotent-reemit.md:12), and amends ADR-005/ADR-011 status lines. | Resume behavior test in [analyze.rs](/home/john/clarion/crates/clarion-cli/tests/analyze.rs:2261). | +| H5 | Resolved | Plugin file output streams through a bounded channel in [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:765), [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:776), and [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:3673); cross-file edges are queued until both endpoints are inserted in [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:797) and [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:3593). 
| Failure-mode coverage in [analyze_failure_modes.rs](/home/john/clarion/crates/clarion-cli/tests/analyze_failure_modes.rs:439), plus full workspace and Phase 3 e2e tests. | +| H6 | Resolved by honest capability contract | Python Wardline capability claims and v1 ontology docs were reconciled in [plugin.toml](/home/john/clarion/plugins/python/plugin.toml:1), [requirements.md](/home/john/clarion/docs/clarion/1.0/requirements.md:1), [system-design.md](/home/john/clarion/docs/clarion/1.0/system-design.md:1), and [detailed-design.md](/home/john/clarion/docs/clarion/1.0/detailed-design.md:1). | Ontology test in [test_package.py](/home/john/clarion/plugins/python/tests/test_package.py:1). | +| H7 | Resolved | Runs now persist `owner_pid` and `heartbeat_at`; stale running rows are repaired by [runs.rs](/home/john/clarion/crates/clarion-storage/src/runs.rs:15), analyze startup in [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:371), MCP status in [status.rs](/home/john/clarion/crates/clarion-mcp/src/tools/status.rs:184), and analyze status in [analyze.rs](/home/john/clarion/crates/clarion-mcp/src/tools/analyze.rs:261). | Tests in [storage_tools.rs](/home/john/clarion/crates/clarion-mcp/tests/storage_tools.rs:4292) and [analyze_lifecycle.rs](/home/john/clarion/crates/clarion-mcp/tests/analyze_lifecycle.rs:265). | +| H8 | Resolved | Summary inputs compose applicable guidance and hash it into the cache key in [summary.rs](/home/john/clarion/crates/clarion-mcp/src/tools/summary.rs:39), [summary.rs](/home/john/clarion/crates/clarion-mcp/src/tools/summary.rs:77), and [summary.rs](/home/john/clarion/crates/clarion-mcp/src/tools/summary.rs:500). | Prompt/cache regression in [storage_tools.rs](/home/john/clarion/crates/clarion-mcp/tests/storage_tools.rs:1421). 
| +| H9 | Resolved | Python import extraction now records `type_only` and `scope`, and MCP runtime import algorithms filter them in [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:456), [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:550), and [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:60). | Python package tests plus MCP shortcut tests. | +| H10 | Resolved | Dead-code reachability expands ambiguous call candidates from edge properties in [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:823) and [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:829). | Ambiguous candidate tests in [catalogue_tools.rs](/home/john/clarion/crates/clarion-mcp/tests/catalogue_tools.rs:1205). | +| H11 | Resolved | Duplicate-definition disposition now suppresses dropped duplicate bodies before call/reference collection in [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:896). | Regression in [test_extractor.py](/home/john/clarion/plugins/python/tests/test_extractor.py:565). | +| H12 | Resolved | Unresolved-call reads require current caller content hashes in [query.rs](/home/john/clarion/crates/clarion-storage/src/query.rs:961) and [query.rs](/home/john/clarion/crates/clarion-storage/src/query.rs:984); analyze writes hash-scoped replacements in [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:4419). | Tests in [query_helpers.rs](/home/john/clarion/crates/clarion-storage/tests/query_helpers.rs:438) and [storage_tools.rs](/home/john/clarion/crates/clarion-mcp/tests/storage_tools.rs:3132). 
| +| H13 | Resolved | Release verify now mirrors CI static guards in [release.yml](/home/john/clarion/.github/workflows/release.yml:70), [release.yml](/home/john/clarion/.github/workflows/release.yml:78), [release.yml](/home/john/clarion/.github/workflows/release.yml:85), and [release.yml](/home/john/clarion/.github/workflows/release.yml:88). | Guard scripts are covered by self-tests in the release verify job and cargo deny/build gates below. | +| M1 | Resolved | Pyright/LSP matching now converts AST byte columns and LSP UTF-16 positions explicitly in [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:1451), [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:1461), and [pyright_session.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/pyright_session.py:1471). | Regression in [test_pyright_session.py](/home/john/clarion/plugins/python/tests/test_pyright_session.py:180). | +| M2 | Resolved | Reference collection tracks enclosing local bindings in [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:650) and [extractor.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/extractor.py:717). | Python extractor tests. | +| M3 | Resolved | Finding filters and pagination are pushed into SQL before the cap in [inspection.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/inspection.rs:200) and [inspection.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/inspection.rs:216). | Regression in [catalogue_tools.rs](/home/john/clarion/crates/clarion-mcp/tests/catalogue_tools.rs:315). 
| +| M4 | Resolved | Analyze progress refreshes `heartbeat_at` through live progress snapshots and writer heartbeats in [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:126), [writer.rs](/home/john/clarion/crates/clarion-storage/src/writer.rs:1013), and [analyze.rs](/home/john/clarion/crates/clarion-mcp/src/tools/analyze.rs:147). | Analyze status and stale-run tests listed under H7. | +| M5 | Resolved | The normative batch fixture is exercised by [serve.rs](/home/john/clarion/crates/clarion-cli/tests/serve.rs:1). | `cargo test -p clarion-cli --test serve serve_http_responses_match_federation_fixture_contracts -- --nocapture`. | +| M6 | Resolved | Wardline qualname fixture vectors are loaded directly by storage/Python tests in [wardline_taint.rs](/home/john/clarion/crates/clarion-storage/src/wardline_taint.rs:1). | Full storage and Python gates. | +| M7 | Resolved | Analyze stamps entity git provenance in [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:3540), with writer preserving first-seen and refreshing last-seen in [writer.rs](/home/john/clarion/crates/clarion-storage/src/writer.rs:501). | Regression in [analyze.rs](/home/john/clarion/crates/clarion-cli/tests/analyze.rs:3100). | +| M8 | Resolved | Source-walk failures now persist stats/findings in [analyze.rs](/home/john/clarion/crates/clarion-cli/src/analyze.rs:4270). | `cargo test -p clarion-cli analyze::tests::source_walk -- --nocapture`. | +| M9 | Resolved | Federation auth uses `hmac`/`subtle` plus timestamp and nonce replay checks in [auth.rs](/home/john/clarion/crates/clarion-cli/src/http_read/auth.rs:1), with ADR-042 documenting the contract. | `cargo test -p clarion-cli http_read::auth::tests::hmac -- --nocapture` and `cargo test -p clarion-cli --test serve hmac_identity -- --nocapture`. 
| +| M10 | Resolved by contract reconciliation | v1.0 Python ontology now explicitly limits emitted kinds/edges and defers the absent ontology in [requirements.md](/home/john/clarion/docs/clarion/1.0/requirements.md:1), [system-design.md](/home/john/clarion/docs/clarion/1.0/system-design.md:1), and [detailed-design.md](/home/john/clarion/docs/clarion/1.0/detailed-design.md:1). | [test_package.py](/home/john/clarion/plugins/python/tests/test_package.py:1). | +| L1 | Resolved | Handshake-failure zombie reaping is asserted on Linux in [host_subprocess.rs](/home/john/clarion/crates/clarion-core/tests/host_subprocess.rs:1). | `cargo test -p clarion-core --test host_subprocess t9 -- --nocapture`. | +| L2 | Resolved | Python protocol header decoding converts non-ASCII malformed headers to `ProtocolError` in [server.py](/home/john/clarion/plugins/python/src/clarion_plugin_python/server.py:1). | [test_server.py](/home/john/clarion/plugins/python/tests/test_server.py:1). | +| L3 | Resolved | Guidance create is now insert-only and atomic in [guidance.rs](/home/john/clarion/crates/clarion-storage/src/guidance.rs:1) and [guidance.rs](/home/john/clarion/crates/clarion-cli/src/guidance.rs:1). | [guidance_write.rs](/home/john/clarion/crates/clarion-storage/tests/guidance_write.rs:1). | +| L4 | Resolved | Capped graph scans use deterministic ordering before `LIMIT` in [shortcuts.rs](/home/john/clarion/crates/clarion-mcp/src/catalogue/shortcuts.rs:1). | `cargo test -p clarion-mcp scan_truncates -- --nocapture`. | +| L5 | Resolved | Shared federation/config/scan-result helpers moved to [crates/clarion-federation](/home/john/clarion/crates/clarion-federation/src/lib.rs:1), with MCP retaining re-export shims and CLI importing the narrower crate. | `cargo check -p clarion-federation --all-targets`, `cargo check -p clarion-cli --all-targets`, and no remaining CLI references to `clarion_mcp::config`, `clarion_mcp::filigree`, `clarion_mcp::filigree_url`, or `clarion_mcp::scan_results`. 
| + +### Verification Run + +Focused regression checks: + +- `cargo test -p clarion-core --test host_subprocess t9 -- --nocapture` +- `cargo test -p clarion-storage --test guidance_write insert_guidance_sheet_rejects_existing_id_without_overwrite -- --nocapture` +- `cargo test -p clarion-mcp scan_truncates -- --nocapture` +- `cargo test -p clarion-federation -- --nocapture` +- `cargo test -p clarion-cli http_read::auth::tests::hmac -- --nocapture` +- `cargo test -p clarion-cli http_read::tests -- --nocapture` +- `cargo test -p clarion-cli http_read::wardline::tests -- --nocapture` +- `cargo test -p clarion-cli http_read::linkages::tests -- --nocapture` +- `cargo test -p clarion-cli --test serve hmac_identity -- --nocapture` +- `cargo test -p clarion-cli --test serve serve_http_responses_match_federation_fixture_contracts -- --nocapture` +- `cargo test -p clarion-cli --test analyze analyze_stamps_entities_with_git_head_commit -- --nocapture` +- `cargo test -p clarion-cli analyze::tests::source_walk -- --nocapture` +- `cargo test -p clarion-cli --test analyze analyze_migrates_a_stale_db_instead_of_failing -- --nocapture` +- `cargo test -p clarion-cli --test install install_applies_each_migration_exactly_once -- --nocapture` +- `cargo test -p clarion-cli --test wp1_e2e wp1_walking_skeleton_end_to_end -- --nocapture` +- `cargo test -p clarion-cli --test analyze_failure_modes analyze_defers_cross_file_edges_until_target_entity_batch_arrives -- --nocapture` +- `plugins/python/.venv/bin/pytest plugins/python/tests/test_package.py -q` +- `plugins/python/.venv/bin/pytest plugins/python/tests/test_server.py::test_malformed_non_ascii_header_uses_protocol_error_exit_path -q` + +Broad gates: + +- `plugins/python/.venv/bin/ruff check plugins/python` +- `plugins/python/.venv/bin/ruff format --check plugins/python` +- `plugins/python/.venv/bin/mypy --strict plugins/python` +- `plugins/python/.venv/bin/pytest plugins/python` (160 passed, 85% coverage) +- `cargo fmt --all -- 
--check` +- `cargo check --workspace --all-targets` +- `cargo test --workspace --all-features` +- `cargo nextest run --workspace --all-features` (1073 passed, 2 skipped) +- `cargo clippy --workspace --all-targets --all-features -- -D warnings` +- `cargo build --workspace --bins` +- `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps --all-features` +- `cargo deny check` (passed with duplicate-crate/license-allowance warnings) +- `bash tests/e2e/sprint_1_walking_skeleton.sh` +- `bash tests/e2e/sprint_2_mcp_surface.sh` +- `bash tests/e2e/phase3_subsystems.sh` diff --git a/docs/implementation/sprint-1/README.md b/docs/implementation/sprint-1/README.md index 3a0c2c9d..77972fa4 100644 --- a/docs/implementation/sprint-1/README.md +++ b/docs/implementation/sprint-1/README.md @@ -88,7 +88,7 @@ in its owning WP doc. | # | Lock-in | Owning WP | Canonical section | `↗` cross-product touch | |---|---|---|---|---| -| L1 | SQLite schema shape per [detailed-design §3](../../clarion/v0.1/detailed-design.md#3-storage-implementation) — tables `entities`, `entity_tags`, `edges`, `findings`, `summary_cache`, `runs`, `schema_migrations`; `entity_fts` FTS5 virtual table + triggers; generated columns + indexes; `guidance_sheets` view _(locked on 2026-04-18)_ | WP1 | [`wp1-scaffold.md#l1--sqlite-schema-shape`](./wp1-scaffold.md#l1--sqlite-schema-shape) | `↗` Filigree `registry_backend: clarion` (WP10) reads via entity-ID columns | +| L1 | SQLite schema shape per [detailed-design §3](../../clarion/1.0/detailed-design.md#3-storage-implementation) — tables `entities`, `entity_tags`, `edges`, `findings`, `summary_cache`, `runs`, `schema_migrations`; `entity_fts` FTS5 virtual table + triggers; generated columns + indexes; `guidance_sheets` view _(locked on 2026-04-18)_ | WP1 | [`wp1-scaffold.md#l1--sqlite-schema-shape`](./wp1-scaffold.md#l1--sqlite-schema-shape) | `↗` Filigree `registry_backend: clarion` (WP10) reads via entity-ID columns | | L2 | Entity-ID 3-segment format 
`{plugin_id}:{kind}:{canonical_qualified_name}` per ADR-003 + ADR-022 _(locked on 2026-04-18)_ | WP1 + WP3 | [`wp1-scaffold.md#l2--entity-id-canonical-name-format`](./wp1-scaffold.md#l2--entity-id-canonical-name-format) | `↗` Wardline qualname reconciliation (ADR-018) uses the third segment as its Clarion-side join key | | L3 | Writer-actor command protocol (`tokio::task` + bounded `mpsc` + per-N commit) per ADR-011 _(locked on 2026-04-18)_ | WP1 | [`wp1-scaffold.md#l3--writer-actor-command-protocol`](./wp1-scaffold.md#l3--writer-actor-command-protocol) | — | | L4 | JSON-RPC method set + Content-Length framing per ADR-002 _(locked on 2026-04-24)_ | WP2 | [`wp2-plugin-host.md#l4--json-rpc-method-set--content-length-framing`](./wp2-plugin-host.md#l4--json-rpc-method-set--content-length-framing) | — | @@ -173,8 +173,8 @@ and pointed at [`signoffs.md`](./signoffs.md) Tier A. ## 9. References - [Clarion v0.1 high-level implementation plan](../v0.1-plan.md) -- [Clarion v0.1 system design](../../clarion/v0.1/system-design.md) — §2 (core/plugin), §4 (storage) -- [Clarion v0.1 detailed design](../../clarion/v0.1/detailed-design.md) — §1 (plugin transport), §3 (storage impl) +- [Clarion system design](../../clarion/1.0/system-design.md) — §2 (core/plugin), §4 (storage) +- [Clarion detailed design](../../clarion/1.0/detailed-design.md) — §1 (plugin transport), §3 (storage impl) - [ADR-001 Rust for core](../../clarion/adr/ADR-001-rust-for-core.md) - [ADR-002 Plugin transport JSON-RPC](../../clarion/adr/ADR-002-plugin-transport-json-rpc.md) - [ADR-003 Entity ID scheme](../../clarion/adr/ADR-003-entity-id-scheme.md) diff --git a/docs/implementation/v1.0-tag-cut/README.md b/docs/implementation/v1.0-tag-cut/README.md index cb95c781..a31ce33b 100644 --- a/docs/implementation/v1.0-tag-cut/README.md +++ b/docs/implementation/v1.0-tag-cut/README.md @@ -1,17 +1,18 @@ -# Clarion v1.0.0 — Tag-Cut Readiness +# Clarion v1.0.0 — Tag-Cut Readiness Archive -**Status**: RC1 hardening — 
`v1.0.0` tag held pending closure of the gap register. +**Status**: Historical archive. `v1.0.0` was tagged on 2026-05-19, but the +release did not publish artifacts; `v1.0.1` became the first published build. -This directory holds the canonical pre-tag-cut artifacts for Clarion v1.0.0. +This directory preserves the canonical pre-tag-cut artifacts for Clarion v1.0.0. It supersedes `docs/implementation/v0.1-publish/` (which was renamed in intent -when the v0.1 → v1.0 rebrand landed but never moved on disk) as the current -program-of-work surface for the tag. +when the v0.1 → v1.0 rebrand landed but never moved on disk) for that historical +tag-cut program. ## Documents | File | Purpose | |------|---------| -| [`gap-register.md`](gap-register.md) | Single source of truth for every gap between the current RC1 commit and a defensible `v1.0.0` tag. 24 gaps in 7 categories with evidence, fix, and effort. | +| [`gap-register.md`](gap-register.md) | Historical source of truth for every gap between the RC1 commit and a defensible `v1.0.0` tag. 24 gaps in 7 categories with evidence, fix, and effort. | | [`execution-plan.md`](execution-plan.md) | Day-by-day sequenced execution plan, with parallel-execution markers, operator-vs-engineering split, and the exit criteria for each day. | | [`filigree-issue-bodies.md`](filigree-issue-bodies.md) | Pre-drafted bodies for the Filigree issues that track each gap. Reference for issue creation; the live issues are authoritative once created. | diff --git a/docs/operator/README.md b/docs/operator/README.md index 828c9fea..065c5e4e 100644 --- a/docs/operator/README.md +++ b/docs/operator/README.md @@ -13,7 +13,10 @@ Practical notes for configuring and running Clarion. `clarion analyze` concurrency against one `.clarion/clarion.db`. - [Secret scanning](./secret-scanning.md) — pre-ingest scanner behavior, baseline false-positive workflow, override confirmation, and audit queries. 
-- [v1.0 release governance](./v1.0-release-governance.md) — maintainer steps +- [Guidance](./guidance.md) — authoring guidance sheets with the `clarion + guidance` CLI, `--match`/`--scope-level`/`--expires` semantics, staleness + findings, and the export/import team-sharing workflow. +- [Release governance](./v1.0-release-governance.md) — maintainer steps for GitHub branch/ruleset enforcement, Actions policy, release dry run, and final tag gating. - [Federation contracts](../federation/contracts.md) — read-side HTTP diff --git a/docs/operator/clarion-http-read-api.md b/docs/operator/clarion-http-read-api.md index a51ffc58..d333b068 100644 --- a/docs/operator/clarion-http-read-api.md +++ b/docs/operator/clarion-http-read-api.md @@ -23,18 +23,23 @@ serve: When `identity_token_env` is configured, Clarion refuses to start unless the env var is present and non-empty. Protected `/api/v1/files` routes then require -`X-Loom-Component: clarion:`. The HMAC is lowercase hex HMAC-SHA256 over: +`X-Loom-Component: clarion:`, `X-Loom-Timestamp: `, and +`X-Loom-Nonce: `. The HMAC is lowercase hex HMAC-SHA256 over: ```text + + ``` For example, a GET of `/api/v1/files?path=demo.py&language=python` signs the method `GET`, that exact path-and-query string, and the SHA-256 hash of an empty -body. `GET /api/v1/_capabilities` stays unauthenticated so siblings can probe -the API surface before sending protected reads. +body, followed by the timestamp and nonce header values. Clarion accepts a +five-minute timestamp skew and rejects reuse of the same nonce inside that +process-local window. `GET /api/v1/_capabilities` stays unauthenticated so +siblings can probe the API surface before sending protected reads. Clarion still accepts the older `serve.http.token_env` bearer-token path for compatibility. Prefer `identity_token_env` for new deployments. 
diff --git a/docs/operator/getting-started.md b/docs/operator/getting-started.md index 248fc028..0fe6349b 100644 --- a/docs/operator/getting-started.md +++ b/docs/operator/getting-started.md @@ -25,7 +25,7 @@ If a step fails, see [Troubleshooting](#troubleshooting) at the end. | An MCP client | any MCP-speaking client | see [§3](#3-serve) | The Python plugin will fail at runtime if `pyright-langserver` is not on -`$PATH` at the pinned version (1.1.409 in v1.0). Install via +`$PATH` at the pinned version (currently 1.1.409). Install via `npm install -g pyright@1.1.409` or `pipx install pyright==1.1.409`. ### Required environment variables @@ -36,31 +36,26 @@ For step 4's `summary` question you need an OpenRouter API key: export OPENROUTER_API_KEY=sk-or-v1-... ``` -`clarion analyze` (step 2) and the structural MCP tools (`entity_at`, -`find_entity`, `callers_of`, `execution_paths_from`, `issues_for`, -`neighborhood`, `subsystem_members`, `subsystem_of`, `project_status`, -`summary_preview_cost`, `source_for_entity`, `call_sites`, `orientation_pack`, -`analyze_start`, `analyze_status`, `analyze_cancel`, `index_diff`) work without -any LLM credentials — seventeen of the eighteen MCP tools are credential-free. -The key is only consulted when an MCP client calls `summary(id)` against an entity that does not -yet have a cached summary. +`clarion analyze` (step 2) and the structural MCP tools work without any LLM +credentials. The key is only consulted when an MCP client calls `summary(id)` +against an entity that does not yet have a cached summary. ## 1. Install -Tagged v1.0 releases ship a platform archive for the Rust binary and a Python -sdist for the language plugin via GitHub Releases (per -[ADR-033](../clarion/adr/ADR-033-v1.0-distribution.md)). Until the first tag -fires, use the source-install fallback below. 
+Tagged releases ship a platform archive for the Rust binary and a Python sdist +for the language plugin via GitHub Releases (per +[ADR-033](../clarion/adr/ADR-033-v1.0-distribution.md)). Use the source-install +fallback below only when testing unreleased commits. ```bash -TAG=v1.0.0 +TAG=v1.2.0 curl -L -o clarion-x86_64-unknown-linux-gnu.tar.gz \ "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-x86_64-unknown-linux-gnu.tar.gz" tar xzf clarion-x86_64-unknown-linux-gnu.tar.gz install clarion-x86_64-unknown-linux-gnu/clarion ~/.local/bin/ pipx install \ - "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-plugin-python-1.0.0.tar.gz" + "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-plugin-python-1.2.0.tar.gz" ``` Source-install fallback: @@ -100,7 +95,7 @@ slsa-verifier verify-artifact \ clarion-x86_64-unknown-linux-gnu.tar.gz ``` -The v1.0 release deliberately does not publish to PyPI or crates.io. GitHub +The current 1.x release line deliberately does not publish to PyPI or crates.io. GitHub Release assets are the source of truth until public registries are introduced by a later ADR. diff --git a/docs/operator/guidance.md b/docs/operator/guidance.md new file mode 100644 index 00000000..e9dcb23d --- /dev/null +++ b/docs/operator/guidance.md @@ -0,0 +1,219 @@ +# Guidance Operator Notes + +A **guidance sheet** is institutional knowledge attached to code: a short note +("refresh tokens are single-use", "this module owns the retry budget") that +Clarion serves to consult-mode agents alongside the entities it applies to. A +sheet is a first-class entity of `kind: guidance` (`id` form +`core:guidance:`); it carries the note text plus the rules that decide +which entities it covers, a scope level, and optional pinning / expiry +(`REQ-GUIDANCE-01`, ADR-024). + +Guidance is authored by operators via the `clarion guidance` CLI (this guide) +or proposed by agents through MCP and promoted by an operator. 
Authored and +promoted sheets reach consult agents through the `guidance_for` MCP read tool +and are also composed into auto-generated `summary` prompts with a real +`guidance_fingerprint` cache key. + +All subcommands operate on `.clarion/clarion.db`, so **run `clarion analyze` +first** — the CLI errors if the database is absent. + +## Authoring workflow (`REQ-GUIDANCE-03`) + +### `create` + +```bash +clarion guidance create \ + --match path:src/auth/** \ + --match tag:auth \ + --scope-level module \ + --name auth-tokens \ + --content "Refresh tokens are single-use; rotate on every refresh." +``` + +Omit `--content` to author the note in `$EDITOR`/`$VISUAL` (or pipe it on +stdin). Useful flags: + +- `--pinned` — mark the sheet preserved under token-budget pressure during + composition. +- `--expires ` — see [Expiry](#expiry-semantics) below. +- `--name ` — the `core:guidance:` id segment. Defaults to a slug + derived from the first `--match` rule. + +`create` refuses to overwrite an existing id; use `edit` to change a sheet. + +### `edit ` + +Opens the sheet's **content** in `$EDITOR`/`$VISUAL`. Only `content` changes; +every other property — including `authored_at` (the staleness baseline), +`provenance`, `pinned`, `expires`, `scope_level`, and `match_rules` — is +preserved. + +### `show ` / `list` + +```bash +clarion guidance show core:guidance:auth-tokens +clarion guidance list +clarion guidance list --for-entity python:function:auth.tokens.refresh +``` + +`list` is ordered by `scope_rank` (project → function). `--for-entity` filters +to sheets whose `match_rules` apply to that entity id. See +[Staleness](#staleness) for `--stale` / `--expired`. + +### `delete ` + +Removes the sheet. Its matched entities' cached summaries are invalidated (see +[Cache behaviour](#cache-behaviour)). 
+ +### `promote ` + +```bash +clarion guidance promote clarion-obs-abc123 +``` + +Promotes a reviewed Filigree observation produced by MCP `propose_guidance` +into a local guidance sheet (`provenance: filigree_promotion`). Arbitrary +observations are rejected: the observation detail must contain Clarion's +guidance-proposal payload. This is the anti-poisoning boundary (`NFR-SEC-02`): +an agent proposal is inert until an operator promotes it. + +MCP also exposes the same lifecycle: + +- `propose_guidance(entity_id, content, scope_level?, match_rules?, name?, + pinned?, expires?)` creates a Filigree observation, not a sheet. +- `promote_guidance(observation_id)` consumes a reviewed observation and writes + the local sheet. + +## `--match` rules + +Each `--match` value is `:`, split on the **first** colon only +(subsystem and entity values contain colons of their own). Repeat `--match` to +add several rules; a sheet matches an entity if any rule matches. + +| Rule | Matches | +|---|---| +| `path:` | entities whose source path matches the glob (e.g. `path:src/auth/**`) | +| `tag:` | entities carrying the categorisation tag | +| `kind:` | entities of a kind (`function`, `class`, `module`, …) | +| `subsystem:` | members of a subsystem (e.g. `subsystem:core:subsystem:abcd`) | +| `entity:` | one specific entity (e.g. `entity:python:function:auth.tokens.refresh`) | + +## `--scope-level` + +One of `project | subsystem | package | module | class | function` (ADR-024). +Scope level drives the **composition order**: when several sheets apply to one +entity, `guidance_for` ranks them by `scope_rank` ascending — project-scoped +sheets first, function-scoped last — so narrower, more specific guidance is +ordered after (and can override) broader guidance. Within a scope level, ties +break by `authored_at` then id. 
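
The `--match` splitting rule and the scope-level composition order described above can be sketched like this. It is an illustrative model only: `parse_match_rule`, `Sheet`, and `composition_order` are hypothetical names, the rule kinds are taken from the table above, and treating `authored_at` as ascending within a scope level is an assumption (the text states the tie-break keys, not their direction).

```python
from dataclasses import dataclass

# Scope levels in rank order, broadest first (from the --scope-level section).
SCOPE_LEVELS = ("project", "subsystem", "package", "module", "class", "function")
# Rule kinds listed in the --match table.
RULE_KINDS = {"path", "tag", "kind", "subsystem", "entity"}


def parse_match_rule(raw: str) -> tuple[str, str]:
    """Split on the first colon only, so values may contain colons."""
    kind, sep, value = raw.partition(":")
    if not sep or kind not in RULE_KINDS or not value:
        raise ValueError(f"invalid --match rule: {raw!r}")
    return kind, value


@dataclass(frozen=True)
class Sheet:
    id: str
    scope_level: str
    authored_at: str  # assumed normalized UTC ISO-8601, so string order is time order


def composition_order(sheets: list[Sheet]) -> list[Sheet]:
    """Rank by scope (project first, function last), then authored_at, then id."""
    return sorted(
        sheets,
        key=lambda s: (SCOPE_LEVELS.index(s.scope_level), s.authored_at, s.id),
    )


rule = parse_match_rule("entity:python:function:auth.tokens.refresh")
# rule == ("entity", "python:function:auth.tokens.refresh")
```

Using `str.partition` rather than `str.split(":")` is what keeps `subsystem:core:subsystem:abcd` intact as a single value. Sorting on a tuple key gives the narrower sheets a later position, which is what lets them override broader guidance during composition.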
+ +## Expiry semantics + +`--expires` accepts: + +- a full ISO-8601 instant (`2026-12-31T23:59:59Z`); +- an offset form (`2026-06-03T12:00:00+02:00`), converted to UTC; or +- a bare date (`2026-12-31`), taken as **start-of-day UTC** + (`2026-12-31T00:00:00.000Z`). + +The value is normalized to a full UTC instant before storage so the read path's +lexical expiry compare is correct. Unparseable input is rejected at create time. +The read path excludes expired sheets from composition; analyze also surfaces +them as a finding (below). + +## Wardline-derived guidance (`REQ-GUIDANCE-04`) + +When `wardline.yaml` is present, `clarion analyze` generates deterministic, +pinned guidance sheets from the Wardline bundle: + +- `core:guidance:wardline-tier-` +- `core:guidance:wardline-boundary-` +- `core:guidance:wardline-annotation-group-` + +The parser accepts real Wardline output (`tiers: [...]`, `module_tiers: [...]`, +`wardline.fingerprint.json`, `wardline.exceptions.json`, and +`**/wardline.overlay.yaml`) plus the earlier guidance-map shape with `paths`, +optional `content`, optional `scope_level`, and optional explicit +`match_rules`. The bundle hash folds in the root manifest, fingerprint baseline, +exceptions register, and overlay boundary files, so drift in any governance +artifact can make preserved overrides reviewable. Generated sheets carry +`provenance: wardline_derived`, `pinned: true`, `wardline_manifest_hash`, +artifact hash/count metadata, and a generated-signature guard. + +If an operator edits a generated sheet, the next analyze preserves the edit and +marks the sheet `provenance: wardline_derived_overridden`. If the Wardline +bundle changes while an override remains in place, analyze emits +`CLA-FACT-GUIDANCE-STALE` so the override can be reviewed instead of silently +overwritten. + +## Staleness + +Two independent staleness signals exist, and they are **not** the same thing: + +1. 
**Age / review cadence** — `clarion guidance list --stale [--days N]` shows + sheets not touched (the later of `reviewed_at` / `authored_at`) within `N` + days (default 90). This is a review-cadence prompt, computed at list time. +2. **Churn-based finding** — `clarion analyze` emits + `CLA-FACT-GUIDANCE-CHURN-STALE` when the code under a sheet has churned (see + the findings table). This is a separate heuristic, not the `--days` age + signal. + +`--stale` and `--expired` are independent filters that compose by intersection +(AND): `clarion guidance list --stale --expired` shows sheets that are both. + +### Staleness findings (`REQ-GUIDANCE-05`) + +`clarion analyze` persists these findings over the committed graph (anchored to +the guidance sheet). See `detailed-design.md` §5 for the canonical catalogue. + +| Rule | Severity | When | +|---|---|---| +| `CLA-FACT-GUIDANCE-ORPHAN` | WARN | The sheet's `guides` edge **or** a `match_rules` `entity:` rule points at an entity deleted between runs. The sheet's guidance is stranded. | +| `CLA-FACT-GUIDANCE-EXPIRED` | INFO | The sheet's `expires` instant is in the past. The read path already excludes it from composition; this surfaces the state operatively (the sheet is not deleted). | +| `CLA-FACT-GUIDANCE-STALE` | WARN | A Wardline-derived override carries an older `wardline_manifest_hash` than the current Wardline bundle. | +| `CLA-FACT-GUIDANCE-CHURN-STALE` | WARN (confidence 0.7) | The aggregate `git_churn_count` over the sheet's matched entities meets the staleness threshold (50; 20 for `pinned: true` sheets). | + +> **`CLA-FACT-GUIDANCE-CHURN-STALE` is currently inert.** It is emitted only +> when churn data is available, and the analyze pipeline does not yet populate +> `git_churn_count`. In production today it never fires. The other guidance +> findings are live. + +## Team sharing: export / import (`REQ-GUIDANCE-06`) + +Guidance is committable team knowledge. 
Export writes one deterministic, +sorted-key JSON file per sheet (byte-stable across runs on identical DB state, +diff-friendly): + +```bash +clarion guidance export --to ./shared/guidance # --to takes a flag +clarion guidance import ./shared/guidance # dir is positional +``` + +Import is **additive and idempotent**: each sheet is upserted by id, ids are +preserved exactly, and local sheets not present in the directory are left +untouched. Re-importing the same directory changes nothing. A malformed `*.json` +aborts the whole import naming the offending file (a silently-dropped sheet +would be data loss). + +> **Export does not prune.** A sheet deleted locally still has its file in the +> export directory, so a teammate's additive `import` will resurrect it. To +> mirror local state exactly (rather than merge into it), **clear the export +> directory before exporting**. + +## Cache behaviour + +Authoring — `create`, `edit`, `delete`, `promote`, `import`, and Wardline +regeneration — invalidates the cached summaries of the entities the affected +sheet's `match_rules` cover (ADR-007 churn-eager invalidation). Without this, +new or changed guidance would stay inert until each matched entity's code next +changed. Over-invalidation is safe; the CLI prints how many summaries it +dropped. + +## Not yet available + +These pieces of the guidance system are **deferred** and do not ship today. +Authored guidance reaches consult agents through both `guidance_for` and +auto-generated summaries, but the following are not yet wired: + +- **In-browser staleness-review UI (`NG-13`)** — deferred. Ticket: + clarion-0d7e22c6cb. diff --git a/plugins/python/README.md b/plugins/python/README.md index 69b9b53a..09d345f6 100644 --- a/plugins/python/README.md +++ b/plugins/python/README.md @@ -4,9 +4,10 @@ The Python language plugin for [Clarion](../../README.md). 
Extracts Python entities from source files and serves them to the Clarion core over the JSON-RPC protocol defined in [WP2 L4](../../docs/implementation/sprint-1/wp2-plugin-host.md#l4--json-rpc-method-set--content-length-framing). -**Status**: Sprint 1 walking-skeleton baseline. Functions only (module-level -and class methods). Classes, decorators, imports, and call graphs are -WP3-feature-complete scope. +**Status**: Python structural extractor. It emits modules, classes, functions, +`contains`, `calls`, `references`, `imports`, and versioned entity signatures +for Stable Entity Identity (SEI) matching. Wardline semantic enrichment is not +advertised until the plugin emits real Wardline-derived signals. ## Install (development) @@ -35,8 +36,8 @@ CI runs the same four gates in the `python-plugin` job. ## Design references - [WP3 plan](../../docs/implementation/sprint-1/wp3-python-plugin.md) — task - ledger, lock-ins (L7 qualname, L8 Wardline probe), UQ resolutions. -- [ADR-003](../../docs/clarion/adr/ADR-003-entity-id-format.md) — 3-segment + ledger, lock-ins, and UQ resolutions. +- [ADR-003](../../docs/clarion/adr/ADR-003-entity-id-scheme.md) — 3-segment `EntityId` format this plugin produces. - [ADR-018](../../docs/clarion/adr/ADR-018-identity-reconciliation.md) — cross-product identity join with Wardline. diff --git a/plugins/python/plugin.toml b/plugins/python/plugin.toml index ea57e6ed..a9ebcf0c 100644 --- a/plugins/python/plugin.toml +++ b/plugins/python/plugin.toml @@ -1,7 +1,7 @@ [plugin] name = "clarion-plugin-python" plugin_id = "python" -version = "1.1.0" +version = "1.2.0" protocol_version = "1.0" # Bare basename per ADR-021 §Layer 1 + WP2 scrub commit eb0a41d — the host # refuses manifests whose `executable` carries any path component. @@ -19,9 +19,10 @@ expected_max_rss_mb = 2048 # core cap (warning emission itself is deferred to Tier B — Sprint 1 # only lands the declaration). 
expected_entities_per_file = 5000 -# L8 integration point — the plugin probes wardline at initialize and -# reports the outcome in the handshake's capabilities field. -wardline_aware = true +# Wardline semantic extraction is not implemented in this plugin yet. Keep this +# false until the plugin emits usable Wardline-derived signals, not just a +# package/version probe. +wardline_aware = false # v0.1 rejects `true` at initialize with CLA-INFRA-MANIFEST-UNSUPPORTED-CAPABILITY. reads_outside_project_root = false @@ -46,15 +47,6 @@ rule_id_prefix = "CLA-PY-" # the 5-tuple organically (no edges live in the 5-tuple yet anyway). ontology_version = "0.6.0" -[integrations.wardline] -# Verified present in Wardline source (src/wardline/core/registry.py:55, -# src/wardline/__init__.py:3) at sprint close 2026-04-28; current -# Wardline version is 1.0.0. Pin range admits 1.x; 2.0.0 is an exclusive -# upper bound so a future major version triggers an explicit re-pin -# rather than silent drift. -min_version = "1.0.0" -max_version = "2.0.0" - # SEI signature declaration (ADR-038 REQ-C-01 / Wave 1). The plugin emits a # versioned `signature` object per function/class entity, stored verbatim by the # core and compared by string equality as the matcher's move-case input. 
Modules diff --git a/plugins/python/pyproject.toml b/plugins/python/pyproject.toml index 2977de03..f469278b 100644 --- a/plugins/python/pyproject.toml +++ b/plugins/python/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "clarion-plugin-python" -version = "1.1.0" +version = "1.2.0" description = "Clarion Python language plugin — v1.0 release" readme = "README.md" requires-python = ">=3.11" @@ -86,6 +86,14 @@ strict = true warn_unused_configs = true files = ["src", "tests"] +[[tool.mypy.overrides]] +module = "yaml" +ignore_missing_imports = true + +[[tool.mypy.overrides]] +module = "clarion_plugin_python.wardline_probe" +warn_unused_ignores = false + [tool.pytest.ini_options] testpaths = ["tests"] addopts = "--strict-markers --cov=clarion_plugin_python --cov-report=term-missing" diff --git a/plugins/python/src/clarion_plugin_python/__init__.py b/plugins/python/src/clarion_plugin_python/__init__.py index b680e55e..645029c3 100644 --- a/plugins/python/src/clarion_plugin_python/__init__.py +++ b/plugins/python/src/clarion_plugin_python/__init__.py @@ -1,3 +1,3 @@ """clarion-plugin-python — Python language plugin for Clarion.""" -__version__ = "1.1.0" +__version__ = "1.2.0" diff --git a/plugins/python/src/clarion_plugin_python/extractor.py b/plugins/python/src/clarion_plugin_python/extractor.py index f8cc65a4..d3dcff31 100644 --- a/plugins/python/src/clarion_plugin_python/extractor.py +++ b/plugins/python/src/clarion_plugin_python/extractor.py @@ -174,6 +174,11 @@ class RawEntity(TypedDict): # modules (the move case abstains — fail closed). Typed top-level field on # the host's RawEntity, not routed through `extra`. signature: NotRequired[FunctionSignature | ClassSignature] + # WS5b catalogue/reachability categorisations. Typed top-level because the + # core denormalises these into `entity_tags`; unknown/empty means no signal. + tags: NotRequired[list[str]] + # Short natural-language text used by analyze-time semantic embeddings. 
+ docstring: NotRequired[str] class RawEdge(TypedDict): @@ -197,6 +202,8 @@ class ImportsEdgeProperties(TypedDict): imported_name: str import_style: Literal["import", "from_import"] level: int + type_only: NotRequired[bool] + scope: NotRequired[Literal["function"]] @dataclass @@ -297,9 +304,10 @@ def _build_module_entity( dotted_module: str, file_path: str, parse_status: Literal["ok", "syntax_error"], + docstring: str | None = None, ) -> RawEntity: """Build the per-file module entity (Q1 + Q4 resolutions).""" - return { + entity: RawEntity = { "id": entity_id(_PLUGIN_ID, "module", dotted_module), "kind": "module", "qualified_name": dotted_module, @@ -309,6 +317,8 @@ def _build_module_entity( }, "parse_status": parse_status, } + _attach_optional_entity_metadata(entity, docstring=docstring, tags=[]) + return entity def extract( @@ -393,11 +403,17 @@ def extract_with_stats( ) parse_latency_ms = _elapsed_ms(parse_started_ns) - module_entity = _build_module_entity(source, dotted_module, file_path, "ok") + module_entity = _build_module_entity( + source, dotted_module, file_path, "ok", ast.get_docstring(tree) + ) entities: list[RawEntity] = [module_entity] edges: list[RawEdge] = [] function_ids: list[str] = [] - walk_state = _WalkState(seen_ids={module_entity["id"]}, file_path=file_path) + walk_state = _WalkState( + seen_ids={module_entity["id"]}, + file_path=file_path, + exported_names=_module_export_names(tree), + ) _walk( tree, [tree], @@ -465,6 +481,35 @@ def __init__( self.module_entity_id = module_entity_id self.is_package_module = is_package_module self.edges: list[RawEdge] = [] + self._function_depth = 0 + self._type_only_depth = 0 + + def visit_FunctionDef(self, node: ast.FunctionDef) -> None: + self._function_depth += 1 + try: + self.generic_visit(node) + finally: + self._function_depth -= 1 + + def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None: + self._function_depth += 1 + try: + self.generic_visit(node) + finally: + self._function_depth -= 
1 + + def visit_If(self, node: ast.If) -> None: + if _is_type_checking_guard(node.test): + self._type_only_depth += 1 + try: + for child in node.body: + self.visit(child) + finally: + self._type_only_depth -= 1 + for child in node.orelse: + self.visit(child) + return + self.generic_visit(node) def visit_Import(self, node: ast.Import) -> None: source_byte_start, source_byte_end = _node_byte_range(self.source, node) @@ -511,6 +556,15 @@ def _edge( source_range: tuple[int, int], ) -> RawEdge: source_byte_start, source_byte_end = source_range + properties: ImportsEdgeProperties = { + "imported_name": imported_name, + "import_style": import_style, + "level": level, + } + if self._type_only_depth > 0: + properties["type_only"] = True + if self._function_depth > 0: + properties["scope"] = "function" return { "kind": "imports", "from_id": self.module_entity_id, @@ -518,14 +572,20 @@ def _edge( "source_byte_start": source_byte_start, "source_byte_end": source_byte_end, "confidence": "resolved", - "properties": { - "imported_name": imported_name, - "import_style": import_style, - "level": level, - }, + "properties": properties, } +def _is_type_checking_guard(expr: ast.expr) -> bool: + if isinstance(expr, ast.Name): + return expr.id == "TYPE_CHECKING" + if isinstance(expr, ast.Attribute): + return expr.attr == "TYPE_CHECKING" + if isinstance(expr, ast.BoolOp): + return any(_is_type_checking_guard(value) for value in expr.values) + return False + + def _import_from_target( dotted_module: str, module: str | None, @@ -664,9 +724,12 @@ def visit_Call(self, node: ast.Call) -> None: self.visit(keyword.value) def visit_Name(self, node: ast.Name) -> None: - if isinstance(node.ctx, ast.Load) and node.id not in self.bound_stack[-1]: + if isinstance(node.ctx, ast.Load) and not self._is_non_entity_local(node.id): self.sites.append(self._site_for_name(node)) + def _is_non_entity_local(self, name: str) -> bool: + return any(name in scope for scope in self.bound_stack[1:]) + def 
_visit_function_signature(self, node: ast.FunctionDef | ast.AsyncFunctionDef) -> None: for arg in [ *node.args.posonlyargs, @@ -793,6 +856,7 @@ class _WalkState: seen_ids: set[str] file_path: str + exported_names: set[str] = field(default_factory=set) duplicate_entities_dropped: int = 0 @@ -856,7 +920,11 @@ def _walk( # noqa: PLR0913 - recursive walker needs both accumulators + parent if _has_overload_decorator(child): continue entity, child_id = _build_function_entity( - child, parents, dotted_module, file_path, parent_entity_id + child, + parents, + dotted_module, + parent_entity_id, + state, ) if child_id in state.seen_ids: state.duplicate_entities_dropped += 1 @@ -873,7 +941,11 @@ def _walk( # noqa: PLR0913 - recursive walker needs both accumulators + parent new_parent_id = child_id case ast.ClassDef(): entity, child_id = _build_class_entity( - child, parents, dotted_module, file_path, parent_entity_id + child, + parents, + dotted_module, + parent_entity_id, + state, ) if child_id in state.seen_ids: state.duplicate_entities_dropped += 1 @@ -939,6 +1011,114 @@ def _contains_edge(parent_id: str, child_id: str) -> RawEdge: } +_HTTP_ROUTE_DECORATOR_NAMES = { + "get", + "post", + "put", + "patch", + "delete", + "options", + "head", + "route", + "websocket", +} +_CLI_DECORATOR_NAMES = {"command", "group", "callback"} +_DATA_MODEL_BASE_NAMES = {"BaseModel", "Model", "SQLModel", "TypedDict"} + + +def _attach_optional_entity_metadata( + entity: RawEntity, + *, + docstring: str | None, + tags: set[str] | list[str], +) -> None: + if docstring: + entity["docstring"] = docstring + if tags: + entity["tags"] = sorted(tags) + + +def _module_export_names(tree: ast.Module) -> set[str]: + exported: set[str] = set() + for statement in tree.body: + if not isinstance(statement, ast.Assign): + continue + if not any( + isinstance(target, ast.Name) and target.id == "__all__" for target in statement.targets + ): + continue + match statement.value: + case ast.List(elts=elts) | 
ast.Tuple(elts=elts) | ast.Set(elts=elts): + for elt in elts: + if isinstance(elt, ast.Constant) and isinstance(elt.value, str): + exported.add(elt.value) + return exported + + +def _expr_qualified_name(expr: ast.expr) -> str | None: + match expr: + case ast.Call(func=func): + return _expr_qualified_name(func) + case ast.Name(id=name): + return name + case ast.Attribute(value=value, attr=attr): + base = _expr_qualified_name(value) + return f"{base}.{attr}" if base else attr + case _: + return None + + +def _decorator_names( + node: ast.FunctionDef | ast.AsyncFunctionDef | ast.ClassDef, +) -> list[str]: + return [name for decorator in node.decorator_list if (name := _expr_qualified_name(decorator))] + + +def _last_name(name: str) -> str: + return name.rsplit(".", 1)[-1] + + +def _is_module_level(parents: list[ast.AST]) -> bool: + return len(parents) == 1 + + +def _function_tags( + node: ast.FunctionDef | ast.AsyncFunctionDef, + parents: list[ast.AST], + exported_names: set[str], +) -> set[str]: + tags: set[str] = set() + if _is_module_level(parents) and node.name == "main": + tags.add("entry-point") + if _is_module_level(parents) and node.name in exported_names: + tags.add("exported-api") + if node.name.startswith("test_") or any( + isinstance(parent, ast.ClassDef) and parent.name.startswith("Test") for parent in parents + ): + tags.add("test") + decorator_names = _decorator_names(node) + if any(_last_name(name) in _HTTP_ROUTE_DECORATOR_NAMES for name in decorator_names): + tags.update({"http-route", "framework-handler"}) + if any(_last_name(name) in _CLI_DECORATOR_NAMES for name in decorator_names): + tags.update({"cli-command", "framework-handler"}) + return tags + + +def _class_tags(node: ast.ClassDef, parents: list[ast.AST], exported_names: set[str]) -> set[str]: + tags: set[str] = set() + if _is_module_level(parents) and node.name in exported_names: + tags.add("exported-api") + if node.name.startswith("Test"): + tags.add("test") + decorator_names = 
_decorator_names(node) + base_names = [_expr_qualified_name(base) for base in node.bases] + if any(_last_name(name) == "dataclass" for name in decorator_names) or any( + name is not None and _last_name(name) in _DATA_MODEL_BASE_NAMES for name in base_names + ): + tags.add("data-model") + return tags + + def _annotation_str(node: ast.expr | None) -> str | None: """Unparse an annotation/expression node to its canonical source text, or ``None`` when absent. ``ast.unparse`` is deterministic for a given AST.""" @@ -983,8 +1163,8 @@ def _build_function_entity( node: ast.FunctionDef | ast.AsyncFunctionDef, parents: list[ast.AST], dotted_module: str, - file_path: str, parent_entity_id: str, + state: _WalkState, ) -> tuple[RawEntity, str]: python_qualname = reconstruct_qualname(node, parents) qualified_name = f"{dotted_module}.{python_qualname}" if dotted_module else python_qualname @@ -997,7 +1177,7 @@ def _build_function_entity( "kind": "function", "qualified_name": qualified_name, "source": { - "file_path": file_path, + "file_path": state.file_path, "source_range": { "start_line": start_line, "start_col": start_col, @@ -1009,6 +1189,11 @@ def _build_function_entity( "definition": definition, "signature": _function_signature(node), } + _attach_optional_entity_metadata( + entity, + docstring=ast.get_docstring(node), + tags=_function_tags(node, parents, state.exported_names), + ) return entity, child_id @@ -1016,8 +1201,8 @@ def _build_class_entity( node: ast.ClassDef, parents: list[ast.AST], dotted_module: str, - file_path: str, parent_entity_id: str, + state: _WalkState, ) -> tuple[RawEntity, str]: """Build a class entity. Uses real ast.end_lineno/end_col_offset (not the module sentinel). 
@@ -1037,7 +1222,7 @@ def _build_class_entity( "kind": "class", "qualified_name": qualified_name, "source": { - "file_path": file_path, + "file_path": state.file_path, "source_range": { "start_line": start_line, "start_col": start_col, @@ -1049,4 +1234,9 @@ def _build_class_entity( "definition": definition, "signature": _class_signature(node), } + _attach_optional_entity_metadata( + entity, + docstring=ast.get_docstring(node), + tags=_class_tags(node, parents, state.exported_names), + ) return entity, child_id diff --git a/plugins/python/src/clarion_plugin_python/pyright_session.py b/plugins/python/src/clarion_plugin_python/pyright_session.py index 3e6ca295..e8f50f49 100644 --- a/plugins/python/src/clarion_plugin_python/pyright_session.py +++ b/plugins/python/src/clarion_plugin_python/pyright_session.py @@ -1,18 +1,21 @@ from __future__ import annotations import ast +import ctypes +import ctypes.util import json import math import os import select import shutil +import signal import subprocess import sys import threading import time from dataclasses import dataclass from pathlib import Path -from typing import IO, TYPE_CHECKING, Any, Self +from typing import IO, TYPE_CHECKING, Any, Literal, Self from urllib.parse import unquote, urlparse from clarion_plugin_python import __version__ @@ -134,6 +137,7 @@ class _FunctionIndex: functions: tuple[_FunctionInfo, ...] entities: tuple[_EntityInfo, ...] 
tree: ast.Module + parse_status: Literal["ok", "syntax_error"] = "ok" @dataclass @@ -224,6 +228,12 @@ def resolve_calls( ) -> CallResolutionResult: path = Path(file_path).resolve() index = self._function_index_for_path(path) + if index.parse_status == "syntax_error": + return CallResolutionResult( + unresolved_call_sites_total=len(function_ids), + pyright_index_parse_latency_ms=self._pop_index_parse_latencies(), + findings=self._pop_findings(), + ) requested = [ index.by_id[function_id] for function_id in function_ids if function_id in index.by_id ] @@ -283,6 +293,13 @@ def resolve_references( path = Path(file_path).resolve() index = self._function_index_for_path(path) reference_sites_total = len(sites) + if index.parse_status == "syntax_error": + return ReferenceResolutionResult( + reference_sites_total=reference_sites_total, + unresolved_reference_sites_total=reference_sites_total, + pyright_index_parse_latency_ms=self._pop_index_parse_latencies(), + findings=self._pop_findings(), + ) if not sites: return ReferenceResolutionResult( pyright_index_parse_latency_ms=self._pop_index_parse_latencies(), @@ -404,7 +421,7 @@ def _resolve_with_pyright( continue for from_range in from_ranges: key = _range_key(from_range) - if key is not None: + if key is not None and _range_within_function(key, function): grouped.setdefault(key, set()).add(to_id) for range_key, candidates in _ambiguous_dict_dispatches(index, function).items(): @@ -599,6 +616,8 @@ def _target_id_from_location(self, location: object) -> tuple[str | None, bool]: if not self._is_internal_project_path(target_path): return None, True target_index = self._function_index_for_path(target_path) + if target_index.parse_status == "syntax_error": + return None, False key = _range_start_key(raw_range) if key is not None and key in target_index.entity_by_name_position: return target_index.entity_by_name_position[key], False @@ -654,6 +673,29 @@ def _start_process(self) -> bool: ) return False + preexec_fn = None + if 
sys.platform == "linux": + libc_name = ctypes.util.find_library("c") + libc = None + if libc_name is not None: + try: # noqa: SIM105 + libc = ctypes.CDLL(libc_name, use_errno=True) + except Exception: # noqa: BLE001, S110 + pass + + if libc is not None: + + def set_pdeathsig() -> None: + try: + # PR_SET_PDEATHSIG is 1 + libc.prctl(1, signal.SIGTERM, 0, 0, 0) + if os.getppid() == 1: + os._exit(0) + except Exception: # noqa: BLE001, S110 + pass + + preexec_fn = set_pdeathsig + try: process = subprocess.Popen( # noqa: S603 - executable path comes from manifest/PATH. [executable, "--stdio"], @@ -662,6 +704,7 @@ def _start_process(self) -> bool: stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, + preexec_fn=preexec_fn, # noqa: PLW1509 ) except OSError as exc: self._run_state.disabled = True @@ -841,11 +884,11 @@ def _read_message(self, timeout_secs: float) -> dict[str, Any]: line = _read_line(fd, deadline) if line in (b"\r\n", b"\n"): break - if b":" not in line: - message = f"malformed LSP header: {line!r}" - raise LspTransportClosedError(message) - name, value = line.decode("ascii").strip().split(":", 1) - headers[name.lower()] = value.strip() + decoded_line = line.decode("ascii", errors="ignore").strip() + name, sep, value = decoded_line.partition(":") + if not sep: + continue + headers[name.strip().lower()] = value.strip() if "content-length" not in headers: message = f"missing LSP Content-Length header: {headers!r}" raise LspTransportClosedError(message) @@ -868,6 +911,8 @@ def _target_id_from_call(self, call: dict[object, object]) -> str | None: if not self._is_internal_project_path(target_path): return None index = self._function_index_for_path(target_path) + if index.parse_status == "syntax_error": + return None key = _range_start_key(raw_selection) if key is not None and key in index.by_name_position: return index.by_name_position[key].entity_id @@ -915,12 +960,17 @@ def _build_function_index(project_root: Path, path: Path, source: str) -> 
_Funct relative = path.relative_to(project_root) if path.is_relative_to(project_root) else path dotted_module = module_dotted_name(relative.as_posix()) parse_started = time.perf_counter() - tree = ast.parse(source) + parse_status: Literal["ok", "syntax_error"] = "ok" + try: + tree = ast.parse(source) + except SyntaxError: + tree = ast.Module(body=[], type_ignores=[]) + parse_status = "syntax_error" parse_latency_ms = max(1, math.ceil((time.perf_counter() - parse_started) * 1000)) functions: list[_FunctionInfo] = [] entities: list[_EntityInfo] = [] source_lines = source.splitlines() - _collect_entities(tree, [tree], dotted_module, source_lines, functions, entities) + _collect_entities(tree, [tree], dotted_module, source_lines, functions, entities, set()) line_starts = _line_starts(source) module_id = entity_id("python", "module", dotted_module) by_id = {function.entity_id: function for function in functions} @@ -943,6 +993,7 @@ def _build_function_index(project_root: Path, path: Path, source: str) -> _Funct functions=tuple(functions), entities=tuple(entities), tree=tree, + parse_status=parse_status, ) @@ -953,20 +1004,28 @@ def _collect_entities( # noqa: PLR0913 - keeps function/class indexes in one tr source_lines: list[str], out: list[_FunctionInfo], out_entities: list[_EntityInfo], + seen_ids: set[str], ) -> None: for child in ast.iter_child_nodes(node): match child: case ast.FunctionDef() | ast.AsyncFunctionDef(): python_qualname = reconstruct_qualname(child, parents) qualified_name = f"{dotted_module}.{python_qualname}" + child_id = entity_id("python", "function", qualified_name) + if child_id in seen_ids: + continue + seen_ids.add(child_id) line_text = ( source_lines[child.lineno - 1] if child.lineno <= len(source_lines) else "" ) - character = line_text.find(child.name) - if character < 0: - character = child.col_offset + name_character = line_text.find(child.name) + character = ( + _codepoint_col_to_utf16(line_text, name_character) + if name_character >= 0 + 
else _byte_col_to_utf16(line_text, child.col_offset) + ) entity = _EntityInfo( - entity_id=entity_id("python", "function", qualified_name), + entity_id=child_id, line=child.lineno - 1, character=character, ) @@ -979,8 +1038,12 @@ def _collect_entities( # noqa: PLR0913 - keeps function/class indexes in one tr line=child.lineno - 1, character=character, end_line=(child.end_lineno or child.lineno) - 1, - end_character=child.end_col_offset or child.col_offset, - call_sites=tuple(_function_call_sites(child)), + end_character=_ast_position_to_lsp( + source_lines, + (child.end_lineno or child.lineno) - 1, + child.end_col_offset or child.col_offset, + ), + call_sites=tuple(_function_call_sites(child, source_lines)), node=child, ), ) @@ -991,19 +1054,27 @@ def _collect_entities( # noqa: PLR0913 - keeps function/class indexes in one tr source_lines, out, out_entities, + seen_ids, ) case ast.ClassDef(): python_qualname = reconstruct_qualname(child, parents) qualified_name = f"{dotted_module}.{python_qualname}" + child_id = entity_id("python", "class", qualified_name) + if child_id in seen_ids: + continue + seen_ids.add(child_id) line_text = ( source_lines[child.lineno - 1] if child.lineno <= len(source_lines) else "" ) - character = line_text.find(child.name) - if character < 0: - character = child.col_offset + name_character = line_text.find(child.name) + character = ( + _codepoint_col_to_utf16(line_text, name_character) + if name_character >= 0 + else _byte_col_to_utf16(line_text, child.col_offset) + ) out_entities.append( _EntityInfo( - entity_id=entity_id("python", "class", qualified_name), + entity_id=child_id, line=child.lineno - 1, character=character, ), @@ -1015,6 +1086,7 @@ def _collect_entities( # noqa: PLR0913 - keeps function/class indexes in one tr source_lines, out, out_entities, + seen_ids, ) case _: _collect_entities( @@ -1024,6 +1096,7 @@ def _collect_entities( # noqa: PLR0913 - keeps function/class indexes in one tr source_lines, out, out_entities, + 
seen_ids, ) @@ -1093,8 +1166,11 @@ def _reference_accumulator_to_edge( return edge -def _function_call_sites(node: ast.FunctionDef | ast.AsyncFunctionDef) -> list[_CallSite]: - visitor = _CallSiteVisitor() +def _function_call_sites( + node: ast.FunctionDef | ast.AsyncFunctionDef, + source_lines: Sequence[str], +) -> list[_CallSite]: + visitor = _CallSiteVisitor(source_lines) for statement in node.body: visitor.visit(statement) return visitor.call_sites @@ -1149,7 +1225,8 @@ def _unresolved_call_sites_for_function( class _CallSiteVisitor(ast.NodeVisitor): - def __init__(self) -> None: + def __init__(self, source_lines: Sequence[str]) -> None: + self.source_lines = source_lines self.call_sites: list[_CallSite] = [] def visit_Call(self, node: ast.Call) -> None: @@ -1158,9 +1235,17 @@ def visit_Call(self, node: ast.Call) -> None: self.call_sites.append( _CallSite( func.lineno - 1, - func.col_offset, + _ast_position_to_lsp( + self.source_lines, + func.lineno - 1, + func.col_offset, + ), (func.end_lineno or func.lineno) - 1, - func.end_col_offset or func.col_offset, + _ast_position_to_lsp( + self.source_lines, + (func.end_lineno or func.lineno) - 1, + func.end_col_offset or func.col_offset, + ), callee_expr, ), ) @@ -1183,7 +1268,7 @@ def _ambiguous_dict_dispatches( candidate_maps = _callable_dict_maps(index, function.node) if not candidate_maps: return {} - visitor = _DictDispatchVisitor(candidate_maps) + visitor = _DictDispatchVisitor(candidate_maps, index.source.splitlines()) for statement in function.node.body: visitor.visit(statement) return visitor.dispatches @@ -1195,7 +1280,10 @@ def _dunder_call_dispatches( ) -> dict[tuple[int, int, int, int], set[str]]: if not index.dunder_call_by_class: return {} - visitor = _DunderCallDispatchVisitor(index.dunder_call_by_class) + visitor = _DunderCallDispatchVisitor( + index.dunder_call_by_class, + index.source.splitlines(), + ) for statement in function.node.body: visitor.visit(statement) return visitor.dispatches @@ -1250,8 
+1338,13 @@ def _callable_dict_assignment( class _DictDispatchVisitor(ast.NodeVisitor): - def __init__(self, candidate_maps: dict[str, set[str]]) -> None: + def __init__( + self, + candidate_maps: dict[str, set[str]], + source_lines: Sequence[str], + ) -> None: self.candidate_maps = candidate_maps + self.source_lines = source_lines self.dispatches: dict[tuple[int, int, int, int], set[str]] = {} def visit_Call(self, node: ast.Call) -> None: @@ -1263,9 +1356,17 @@ def visit_Call(self, node: ast.Call) -> None: ): key = ( func.lineno - 1, - func.col_offset, + _ast_position_to_lsp( + self.source_lines, + func.lineno - 1, + func.col_offset, + ), (func.end_lineno or func.lineno) - 1, - func.end_col_offset or func.col_offset, + _ast_position_to_lsp( + self.source_lines, + (func.end_lineno or func.lineno) - 1, + func.end_col_offset or func.col_offset, + ), ) self.dispatches[key] = set(self.candidate_maps[func.value.id]) self.generic_visit(node) @@ -1281,8 +1382,13 @@ def visit_ClassDef(self, node: ast.ClassDef) -> None: class _DunderCallDispatchVisitor(ast.NodeVisitor): - def __init__(self, dunder_call_by_class: dict[str, str]) -> None: + def __init__( + self, + dunder_call_by_class: dict[str, str], + source_lines: Sequence[str], + ) -> None: self.dunder_call_by_class = dunder_call_by_class + self.source_lines = source_lines self.instance_targets: dict[str, str] = {} self.dispatches: dict[tuple[int, int, int, int], set[str]] = {} @@ -1304,9 +1410,17 @@ def visit_Call(self, node: ast.Call) -> None: if isinstance(func, ast.Name) and func.id in self.instance_targets: key = ( func.lineno - 1, - func.col_offset, + _ast_position_to_lsp( + self.source_lines, + func.lineno - 1, + func.col_offset, + ), (func.end_lineno or func.lineno) - 1, - func.end_col_offset or func.col_offset, + _ast_position_to_lsp( + self.source_lines, + (func.end_lineno or func.lineno) - 1, + func.end_col_offset or func.col_offset, + ), ) self.dispatches[key] = {self.instance_targets[func.id]} 
self.generic_visit(node) @@ -1330,12 +1444,51 @@ def _line_starts(source: str) -> tuple[int, ...]: return tuple(starts) +def _utf16_units(text: str) -> int: + return len(text.encode("utf-16-le")) // 2 + + +def _byte_col_to_utf16(line_text: str, byte_col: int) -> int: + line_bytes = line_text.encode("utf-8") + prefix = line_bytes[: max(0, min(byte_col, len(line_bytes)))] + return _utf16_units(prefix.decode("utf-8", errors="ignore")) + + +def _codepoint_col_to_utf16(line_text: str, codepoint_col: int) -> int: + return _utf16_units(line_text[: max(0, codepoint_col)]) + + +def _ast_position_to_lsp( + source_lines: Sequence[str], + line: int, + byte_col: int, +) -> int: + if line < 0 or line >= len(source_lines): + return 0 + return _byte_col_to_utf16(source_lines[line], byte_col) + + +def _utf16_col_to_byte(line_text: str, utf16_col: int) -> int: + target = max(0, utf16_col) + units = 0 + byte_count = 0 + for char in line_text: + char_units = _utf16_units(char) + if units + char_units > target: + break + units += char_units + byte_count += len(char.encode("utf-8")) + if units == target: + break + return byte_count + + def _position_to_byte(index: _FunctionIndex, line: int, character: int) -> int: if line >= len(index.line_starts): return len(index.source.encode("utf-8")) line_start = index.line_starts[line] line_text = index.source.splitlines(keepends=True)[line] if index.source else "" - return line_start + len(line_text[:character].encode("utf-8")) + return line_start + _utf16_col_to_byte(line_text, character) def _range_key(raw_range: object) -> tuple[int, int, int, int] | None: @@ -1360,6 +1513,18 @@ def _range_key(raw_range: object) -> tuple[int, int, int, int] | None: return (start_line, start_character, end_line, end_character) +def _range_within_function( + range_key: tuple[int, int, int, int], + function: _FunctionInfo, +) -> bool: + start_line, start_character, end_line, end_character = range_key + if start_line < function.line or end_line > 
function.end_line: + return False + if start_line == function.line and start_character < function.character: + return False + return not (end_line == function.end_line and end_character > function.end_character) + + def _range_start_key(raw_range: dict[object, object]) -> tuple[int, int] | None: start = raw_range.get("start") if not isinstance(start, dict): diff --git a/plugins/python/src/clarion_plugin_python/server.py b/plugins/python/src/clarion_plugin_python/server.py index b24d4f4c..811e2c8e 100644 --- a/plugins/python/src/clarion_plugin_python/server.py +++ b/plugins/python/src/clarion_plugin_python/server.py @@ -13,9 +13,9 @@ - ``shutdown`` → ``{}`` (empty ``ShutdownResult`` struct — *not* ``null``). - ``initialized`` / ``exit`` — notifications, no response. -Task 2 ships the dispatch skeleton with ``analyze_file`` returning an empty -entity list. Task 6 wires the Wardline probe result into ``capabilities``; -Task 7 replaces ``handle_analyze_file`` with the extractor. +Task 2 shipped the dispatch skeleton with ``analyze_file`` returning an empty +entity list. The current plugin does not advertise Wardline capabilities until +it emits real Wardline-derived semantic signals. """ from __future__ import annotations @@ -31,17 +31,9 @@ from clarion_plugin_python.extractor import extract_with_stats from clarion_plugin_python.pyright_session import PyrightRunState, PyrightSession from clarion_plugin_python.stdout_guard import install_stdio -from clarion_plugin_python.wardline_probe import probe as wardline_probe ONTOLOGY_VERSION = "0.6.0" -# Sprint 1 defaults for the Wardline version pin (WP3 L8 + plugin.toml -# `[integrations.wardline]`). Kept as module constants so Task 7's -# manifest values match by inspection; a future sprint can flow these -# through from the parsed manifest on demand. -WARDLINE_MIN_VERSION = "1.0.0" -WARDLINE_MAX_VERSION = "2.0.0" - # Plugin-side Content-Length sanity cap. 
Matches the host's ADR-021 §2b # default (8 MiB) so the plugin never emits a frame the host would kill us # for. Oversize outbound payloads trip this before reaching the wire. @@ -80,7 +72,11 @@ def read_frame(stream: IO[bytes]) -> dict[str, Any] | None: return None if line in (b"\r\n", b"\n"): break - decoded = line.decode("ascii").rstrip("\r\n") + try: + decoded = line.decode("ascii").rstrip("\r\n") + except UnicodeDecodeError as exc: + msg = "malformed non-ASCII header line" + raise ProtocolError(msg) from exc if ":" not in decoded: msg = f"malformed header line: {decoded!r}" raise ProtocolError(msg) @@ -150,9 +146,7 @@ def handle_initialize(params: dict[str, Any], state: ServerState) -> dict[str, A "name": "clarion-plugin-python", "version": __version__, "ontology_version": ONTOLOGY_VERSION, - "capabilities": { - "wardline": wardline_probe(WARDLINE_MIN_VERSION, WARDLINE_MAX_VERSION), - }, + "capabilities": {}, } diff --git a/plugins/python/src/clarion_plugin_python/wardline_probe.py b/plugins/python/src/clarion_plugin_python/wardline_probe.py index ea4fa5ba..85e8a266 100644 --- a/plugins/python/src/clarion_plugin_python/wardline_probe.py +++ b/plugins/python/src/clarion_plugin_python/wardline_probe.py @@ -25,6 +25,9 @@ from __future__ import annotations import importlib +import importlib.metadata +import importlib.util +from pathlib import Path from typing import Any from packaging.version import InvalidVersion, Version @@ -35,22 +38,50 @@ def probe(min_version: str, max_version: str) -> dict[str, Any]: """Probe the Wardline package for presence and version compatibility.""" try: - importlib.import_module("wardline.core.registry") - wardline = importlib.import_module("wardline") - except ImportError: - return _ABSENT + # Locate the wardline package + spec = importlib.util.find_spec("wardline") + if spec is None or not spec.submodule_search_locations: + return _ABSENT - raw_version = getattr(wardline, "__version__", None) - if not isinstance(raw_version, str): - 
return _ABSENT + # Check if core/vocabulary.yaml exists and is readable + package_dir = Path(spec.submodule_search_locations[0]) + vocab_path = package_dir / "core" / "vocabulary.yaml" + if not vocab_path.is_file(): + return _ABSENT - try: - version = Version(raw_version) - low = Version(min_version) - high = Version(max_version) - except InvalidVersion: - return _ABSENT + # Verify it can be loaded as valid YAML + try: + import yaml # type: ignore[import-untyped] # noqa: PLC0415 + + with vocab_path.open(encoding="utf-8") as f: + yaml.safe_load(f) + except (ImportError, OSError, yaml.YAMLError): + return _ABSENT - if low <= version < high: - return {"status": "enabled", "version": raw_version} - return {"status": "version_out_of_range", "version": raw_version} + # Extract the version + raw_version = None + try: + raw_version = importlib.metadata.version("wardline") + except importlib.metadata.PackageNotFoundError: + try: + wardline = importlib.import_module("wardline") + raw_version = getattr(wardline, "__version__", None) + except ImportError: + return _ABSENT + + if not isinstance(raw_version, str): + return _ABSENT + + try: + version = Version(raw_version) + low = Version(min_version) + high = Version(max_version) + except InvalidVersion: + return _ABSENT + + if low <= version < high: + return {"status": "enabled", "version": raw_version} + return {"status": "version_out_of_range", "version": raw_version} # noqa: TRY300 + + except Exception: # noqa: BLE001 + return _ABSENT diff --git a/plugins/python/tests/test_extractor.py b/plugins/python/tests/test_extractor.py index c326d8e6..6dd646b6 100644 --- a/plugins/python/tests/test_extractor.py +++ b/plugins/python/tests/test_extractor.py @@ -2,6 +2,7 @@ from __future__ import annotations +import json import shutil import sys import textwrap @@ -25,6 +26,11 @@ if TYPE_CHECKING: from collections.abc import Sequence +WARDLINE_QUALNAME_FIXTURE = ( + Path(__file__).resolve().parents[3] + / 
"docs/federation/fixtures/wardline-qualname-normalization.json" +) + class FakeCallResolver: def resolve_calls( @@ -464,6 +470,32 @@ def test_level_three_relative_import_from_deep_package_init_targets_root_sibling ] +def test_type_checking_and_function_local_imports_carry_runtime_scope() -> None: + source = ( + "from typing import TYPE_CHECKING\n" + "if TYPE_CHECKING:\n" + " import pkg.types\n" + "\n" + "def load():\n" + " import pkg.local\n" + ) + _entities, edges = extract(source, "consumer.py") + imports_by_target = {edge["to_id"]: edge for edge in _import_edges(edges)} + + assert imports_by_target["python:module:pkg.types"]["properties"] == { + "imported_name": "pkg.types", + "import_style": "import", + "level": 0, + "type_only": True, + } + assert imports_by_target["python:module:pkg.local"]["properties"] == { + "imported_name": "pkg.local", + "import_style": "import", + "level": 0, + "scope": "function", + } + + def test_import_edges_have_source_byte_range_and_resolved_confidence() -> None: source = "é = 1\nimport pkg.service\n" _entities, edges = extract(source, "consumer.py") @@ -529,6 +561,36 @@ def caller(): assert calls[0]["source_byte_start"] < calls[0]["source_byte_end"] +@pytest.mark.pyright +def test_extractor_skips_calls_from_dropped_duplicate_definition( + tmp_path: Path, + pyright_langserver: str, +) -> None: + result = _extract_with_pyright( + tmp_path, + """ + def callee(): + pass + + def dup(): + pass + + def dup(): + callee() + """, + pyright_langserver, + ) + + calls = _call_edges(result.edges) + assert result.stats.duplicate_entities_dropped_total == 1 + assert [ + edge + for edge in calls + if edge["from_id"] == "python:function:demo.dup" + and edge["to_id"] == "python:function:demo.callee" + ] == [] + + @pytest.mark.pyright def test_extractor_emits_ambiguous_calls_with_candidates( tmp_path: Path, @@ -767,11 +829,27 @@ def test_module_prefix_path_decouples_file_path_and_dotted_prefix() -> None: def test_module_dotted_name_helper() -> None: 
- assert module_dotted_name("demo.py") == "demo" - assert module_dotted_name("src/demo.py") == "demo" - assert module_dotted_name("pkg/__init__.py") == "pkg" - assert module_dotted_name("src/pkg/mod.py") == "pkg.mod" - assert module_dotted_name("src/pkg/sub/mod.py") == "pkg.sub.mod" + fixture = json.loads(WARDLINE_QUALNAME_FIXTURE.read_text(encoding="utf-8")) + vectors = fixture["module_normalization_vectors"] + assert vectors + for vector in vectors: + assert module_dotted_name(vector["file_path"]) == vector["expected_module"], vector[ + "description" + ] + + +def test_wardline_qualname_fixture_composes_with_module_dotted_name() -> None: + fixture = json.loads(WARDLINE_QUALNAME_FIXTURE.read_text(encoding="utf-8")) + vectors = fixture["qualified_name_vectors"] + assert vectors + for vector in vectors: + module_name = module_dotted_name(vector["file_path"]) + qualname = vector["qualname"] + qualified_name = module_name if qualname is None else f"{module_name}.{qualname}" + assert qualified_name == vector["expected_qualified_name"], vector["description"] + assert f"python:{vector['kind']}:{qualified_name}" == vector["expected_entity_id"], vector[ + "description" + ] def test_source_range_end_fields_populated() -> None: @@ -839,6 +917,45 @@ def test_definition_metadata_multiple_decorators_uses_topmost() -> None: assert definition["decl_line"] == 5 +def test_categorisation_tags_and_docstrings_are_emitted() -> None: + """WS5b root categorisations are emitted by the Python plugin, not fabricated by MCP.""" + + source = """\ +import dataclasses +from fastapi import APIRouter + +router = APIRouter() + +@router.get("/health") +def health(): + \"\"\"Report service health.\"\"\" + return {"ok": True} + +def main(): + return health() + +def test_health(): + assert health()["ok"] + +@dataclasses.dataclass +class Config: + retries: int = 3 +""" + entities, _ = extract(source, "service.py") + + health = next(e for e in entities if e["id"] == "python:function:service.health") + 
main = next(e for e in entities if e["id"] == "python:function:service.main") + test = next(e for e in entities if e["id"] == "python:function:service.test_health") + config = next(e for e in entities if e["id"] == "python:class:service.Config") + + assert health["docstring"] == "Report service health." + assert "http-route" in health["tags"] + assert "framework-handler" in health["tags"] + assert "entry-point" in main["tags"] + assert "test" in test["tags"] + assert "data-model" in config["tags"] + + def test_module_source_range_no_trailing_newline() -> None: """File ending without `\\n` still produces correct end_line. @@ -1255,6 +1372,41 @@ def test_references_inside_overload_implementation_body_are_emitted() -> None: assert impl_sites[0].kind == "name" +@pytest.mark.pyright +def test_extractor_does_not_turn_closure_local_into_module_reference( + tmp_path: Path, + pyright_langserver: str, +) -> None: + source = textwrap.dedent( + """ + def outer(): + token = object() + + def inner(): + return token + + return inner + """, + ).lstrip() + path = tmp_path / "demo.py" + path.write_text(source, encoding="utf-8") + + with PyrightSession(tmp_path, executable=pyright_langserver) as resolver: + result = extract_with_stats( + source, + str(path), + module_prefix_path="demo.py", + reference_resolver=resolver, + ) + + references = [edge for edge in result.edges if edge["kind"] == "references"] + assert not any( + edge["from_id"] == "python:function:demo.outer..inner" + and edge["to_id"] == "python:module:demo" + for edge in references + ) + + def test_safety_net_drops_duplicate_non_overload_definitions( capsys: pytest.CaptureFixture[str], ) -> None: diff --git a/plugins/python/tests/test_package.py b/plugins/python/tests/test_package.py index 85c15752..95b1e8bd 100644 --- a/plugins/python/tests/test_package.py +++ b/plugins/python/tests/test_package.py @@ -16,7 +16,7 @@ def _read_toml(path: Path) -> dict[str, Any]: def test_package_version_matches_pyproject() -> None: - 
assert clarion_plugin_python.__version__ == "1.1.0" + assert clarion_plugin_python.__version__ == "1.2.0" def test_plugin_version_lockstep_across_pyproject_manifest_and_module() -> None: @@ -38,9 +38,15 @@ def test_plugin_version_lockstep_across_pyproject_manifest_and_module() -> None: ) -def test_manifest_declares_references_edge_kind() -> None: +def test_manifest_declares_current_v1_ontology_only() -> None: manifest = _read_toml(_PLUGIN_ROOT / "plugin.toml") - assert manifest["plugin"]["version"] == "1.1.0" + assert manifest["plugin"]["version"] == "1.2.0" + assert manifest["capabilities"]["runtime"]["wardline_aware"] is False assert manifest["ontology"]["ontology_version"] == "0.6.0" + assert manifest["ontology"]["entity_kinds"] == ["function", "class", "module"] assert manifest["ontology"]["edge_kinds"] == ["contains", "calls", "references", "imports"] + assert "decorated_by" not in manifest["ontology"]["edge_kinds"] + assert "inherits_from" not in manifest["ontology"]["edge_kinds"] + assert "uses_type" not in manifest["ontology"]["edge_kinds"] + assert "alias_of" not in manifest["ontology"]["edge_kinds"] diff --git a/plugins/python/tests/test_pyright_session.py b/plugins/python/tests/test_pyright_session.py index 61787a9c..f63ac0df 100644 --- a/plugins/python/tests/test_pyright_session.py +++ b/plugins/python/tests/test_pyright_session.py @@ -176,6 +176,37 @@ def caller(): assert result.unresolved_call_sites_total == 0 +@pytest.mark.pyright +def test_pyright_session_call_range_uses_utf16_lsp_positions_but_emits_bytes( + tmp_path: Path, + pyright_langserver: str, +) -> None: + source = textwrap.dedent( + """ + def callee(): + pass + + def caller(): + marker = "🐍"; callee() + """, + ).lstrip() + module = _write_module(tmp_path, source) + + with PyrightSession(tmp_path, executable=pyright_langserver) as session: + result = session.resolve_calls( + module, + ["python:function:demo.caller", "python:function:demo.callee"], + ) + + assert len(result.edges) == 1 + 
edge = result.edges[0] + assert edge["source_byte_start"] == source.encode().find( + b"callee", source.encode().find(b"marker") + ) + assert edge["source_byte_end"] == edge["source_byte_start"] + len(b"callee") + assert source.encode()[edge["source_byte_start"] : edge["source_byte_end"]] == b"callee" + + @pytest.mark.pyright def test_pyright_session_emits_unresolved_call_site_details( tmp_path: Path, diff --git a/plugins/python/tests/test_server.py b/plugins/python/tests/test_server.py index 36c73e32..65cf05e6 100644 --- a/plugins/python/tests/test_server.py +++ b/plugins/python/tests/test_server.py @@ -86,18 +86,11 @@ def test_initialize_roundtrip() -> None: assert response["id"] == 1 result = response["result"] assert result["name"] == "clarion-plugin-python" - assert result["version"] == "1.1.0" + assert result["version"] == "1.2.0" assert result["ontology_version"] == "0.6.0" - # Capabilities carry the L8 Wardline probe result. We don't pin a - # specific status here because the probe's output depends on whether - # wardline is installed in the test environment — all three legal - # states (`absent`, `enabled`, `version_out_of_range`) pass. - assert "wardline" in result["capabilities"] - assert result["capabilities"]["wardline"]["status"] in { - "absent", - "enabled", - "version_out_of_range", - } + # Wardline is not advertised until the plugin emits real Wardline + # semantic signals, not just package/version probe metadata. + assert result["capabilities"] == {} # Graceful shutdown: shutdown → ack `{}`, then exit notification. 
proc.stdin.write( @@ -157,6 +150,22 @@ def test_analyze_file_before_initialized_returns_error() -> None: proc.wait(timeout=2) +def test_malformed_non_ascii_header_uses_protocol_error_exit_path() -> None: + """Malformed header bytes exit cleanly without emitting framed stdout.""" + proc = subprocess.Popen( # noqa: S603 + _SERVER_CMD, + stdin=subprocess.PIPE, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + ) + + stdout, stderr = proc.communicate(b"Content-L\xe9ngth: 0\r\n\r\n", timeout=5) + + assert proc.returncode == 1 + assert stdout == b"" + assert b"Traceback" not in stderr + + def test_analyze_file_returns_extracted_entities(tmp_path: Path) -> None: """After initialize, analyze_file on a real .py file yields function entities.""" demo = tmp_path / "demo.py" diff --git a/plugins/python/tests/test_wardline_probe.py b/plugins/python/tests/test_wardline_probe.py index c0609d09..f46faaf0 100644 --- a/plugins/python/tests/test_wardline_probe.py +++ b/plugins/python/tests/test_wardline_probe.py @@ -1,15 +1,17 @@ """Unit tests for the L8 Wardline probe (WP3 Task 6). -Each case stubs ``importlib.import_module`` inside the probe module to -simulate the absent / in-range / out-of-range states without requiring +Each case stubs importlib and Path operations to simulate +the absent / in-range / out-of-range states without requiring the real ``wardline`` package to be present or absent in the test environment. 
""" from __future__ import annotations -from types import SimpleNamespace -from typing import TYPE_CHECKING +import importlib.metadata +import io +from pathlib import Path +from typing import TYPE_CHECKING, Any from clarion_plugin_python.wardline_probe import probe @@ -17,115 +19,192 @@ import pytest -def _install_fake_import( +def _install_fake_probe_env( # noqa: PLR0913 monkeypatch: pytest.MonkeyPatch, *, - wardline_module: object | None, - registry_module: object | None, + installed: bool, + vocab_exists: bool, + vocab_valid_yaml: bool, + version_via_metadata: str | None = None, + version_via_import: Any = None, ) -> None: - """Replace ``importlib.import_module`` as seen by wardline_probe.""" + """Mock the importlib, Path, open, and yaml behaviors.""" + # 1. Mock find_spec + def fake_find_spec(name: str) -> object | None: + if name == "wardline" and installed: + + class FakeSpec: + submodule_search_locations = ["/fake/wardline"] # noqa: RUF012 + + return FakeSpec() + return None + + monkeypatch.setattr( + "clarion_plugin_python.wardline_probe.importlib.util.find_spec", + fake_find_spec, + ) + + # 2. Mock Path.is_file + orig_is_file = Path.is_file + + def fake_is_file(self: Path) -> bool: + if str(self) == "/fake/wardline/core/vocabulary.yaml": + return vocab_exists + return orig_is_file(self) + + monkeypatch.setattr("clarion_plugin_python.wardline_probe.Path.is_file", fake_is_file) + + # 3. Mock Path.open and yaml.safe_load + orig_open = Path.open + + def fake_open(self: Path, *args: Any, **kwargs: Any) -> Any: + if str(self) == "/fake/wardline/core/vocabulary.yaml": + if vocab_valid_yaml: + return io.StringIO("version: 1.0.0\nentries: []") + return io.StringIO("unbalanced: [") + return orig_open(self, *args, **kwargs) + + monkeypatch.setattr("clarion_plugin_python.wardline_probe.Path.open", fake_open) + + # 4. 
Mock importlib.metadata.version + def fake_version(name: str) -> str: + if name == "wardline" and version_via_metadata is not None: + return version_via_metadata + raise importlib.metadata.PackageNotFoundError + + monkeypatch.setattr( + "clarion_plugin_python.wardline_probe.importlib.metadata.version", + fake_version, + ) + + # 5. Mock importlib.import_module def fake_import(name: str) -> object: - if name == "wardline.core.registry": - if registry_module is None: - msg = "no wardline.core.registry" - raise ImportError(msg) - return registry_module if name == "wardline": - if wardline_module is None: + if version_via_import is None: msg = "no wardline" raise ImportError(msg) - return wardline_module + + class FakeWardline: + __version__ = version_via_import + + return FakeWardline() msg = f"unexpected import: {name}" raise ImportError(msg) - # String target bypasses mypy's re-export check on - # `clarion_plugin_python.wardline_probe.importlib`. monkeypatch.setattr( "clarion_plugin_python.wardline_probe.importlib.import_module", fake_import, ) -def test_probe_absent_when_registry_import_fails(monkeypatch: pytest.MonkeyPatch) -> None: - _install_fake_import(monkeypatch, wardline_module=None, registry_module=None) +def test_probe_absent_when_not_installed(monkeypatch: pytest.MonkeyPatch) -> None: + _install_fake_probe_env( + monkeypatch, installed=False, vocab_exists=False, vocab_valid_yaml=False + ) + assert probe("0.1.0", "0.2.0") == {"status": "absent"} + + +def test_probe_absent_when_vocab_missing(monkeypatch: pytest.MonkeyPatch) -> None: + _install_fake_probe_env(monkeypatch, installed=True, vocab_exists=False, vocab_valid_yaml=False) + assert probe("0.1.0", "0.2.0") == {"status": "absent"} + + +def test_probe_absent_when_vocab_invalid(monkeypatch: pytest.MonkeyPatch) -> None: + _install_fake_probe_env(monkeypatch, installed=True, vocab_exists=True, vocab_valid_yaml=False) assert probe("0.1.0", "0.2.0") == {"status": "absent"} -def 
test_probe_enabled_when_version_in_range(monkeypatch: pytest.MonkeyPatch) -> None: - fake_wardline = SimpleNamespace(__version__="0.1.5") - fake_registry = SimpleNamespace(REGISTRY={}) - _install_fake_import( +def test_probe_enabled_when_version_in_range_metadata(monkeypatch: pytest.MonkeyPatch) -> None: + _install_fake_probe_env( + monkeypatch, + installed=True, + vocab_exists=True, + vocab_valid_yaml=True, + version_via_metadata="0.1.5", + ) + assert probe("0.1.0", "0.2.0") == {"status": "enabled", "version": "0.1.5"} + + +def test_probe_enabled_when_version_in_range_import_fallback( + monkeypatch: pytest.MonkeyPatch, +) -> None: + _install_fake_probe_env( monkeypatch, - wardline_module=fake_wardline, - registry_module=fake_registry, + installed=True, + vocab_exists=True, + vocab_valid_yaml=True, + version_via_metadata=None, + version_via_import="0.1.5", ) assert probe("0.1.0", "0.2.0") == {"status": "enabled", "version": "0.1.5"} def test_probe_at_lower_bound_is_enabled(monkeypatch: pytest.MonkeyPatch) -> None: """Lower bound is inclusive.""" - fake_wardline = SimpleNamespace(__version__="0.1.0") - fake_registry = SimpleNamespace(REGISTRY={}) - _install_fake_import( + _install_fake_probe_env( monkeypatch, - wardline_module=fake_wardline, - registry_module=fake_registry, + installed=True, + vocab_exists=True, + vocab_valid_yaml=True, + version_via_metadata="0.1.0", ) assert probe("0.1.0", "0.2.0") == {"status": "enabled", "version": "0.1.0"} def test_probe_at_upper_bound_is_out_of_range(monkeypatch: pytest.MonkeyPatch) -> None: """Upper bound is exclusive.""" - fake_wardline = SimpleNamespace(__version__="0.2.0") - fake_registry = SimpleNamespace(REGISTRY={}) - _install_fake_import( + _install_fake_probe_env( monkeypatch, - wardline_module=fake_wardline, - registry_module=fake_registry, + installed=True, + vocab_exists=True, + vocab_valid_yaml=True, + version_via_metadata="0.2.0", ) assert probe("0.1.0", "0.2.0") == {"status": "version_out_of_range", "version": 
"0.2.0"} def test_probe_above_upper_bound_is_out_of_range(monkeypatch: pytest.MonkeyPatch) -> None: - fake_wardline = SimpleNamespace(__version__="0.3.0") - fake_registry = SimpleNamespace(REGISTRY={}) - _install_fake_import( + _install_fake_probe_env( monkeypatch, - wardline_module=fake_wardline, - registry_module=fake_registry, + installed=True, + vocab_exists=True, + vocab_valid_yaml=True, + version_via_metadata="0.3.0", ) assert probe("0.1.0", "0.2.0") == {"status": "version_out_of_range", "version": "0.3.0"} def test_probe_absent_when_version_attribute_missing(monkeypatch: pytest.MonkeyPatch) -> None: - fake_wardline = SimpleNamespace() # no __version__ - fake_registry = SimpleNamespace(REGISTRY={}) - _install_fake_import( + _install_fake_probe_env( monkeypatch, - wardline_module=fake_wardline, - registry_module=fake_registry, + installed=True, + vocab_exists=True, + vocab_valid_yaml=True, + version_via_metadata=None, + version_via_import=None, ) assert probe("0.1.0", "0.2.0") == {"status": "absent"} def test_probe_absent_when_version_is_not_a_string(monkeypatch: pytest.MonkeyPatch) -> None: - fake_wardline = SimpleNamespace(__version__=123) - fake_registry = SimpleNamespace(REGISTRY={}) - _install_fake_import( + _install_fake_probe_env( monkeypatch, - wardline_module=fake_wardline, - registry_module=fake_registry, + installed=True, + vocab_exists=True, + vocab_valid_yaml=True, + version_via_metadata=None, + version_via_import=123, ) assert probe("0.1.0", "0.2.0") == {"status": "absent"} def test_probe_absent_when_version_is_not_valid_semver(monkeypatch: pytest.MonkeyPatch) -> None: - fake_wardline = SimpleNamespace(__version__="not-a-version") - fake_registry = SimpleNamespace(REGISTRY={}) - _install_fake_import( + _install_fake_probe_env( monkeypatch, - wardline_module=fake_wardline, - registry_module=fake_registry, + installed=True, + vocab_exists=True, + vocab_valid_yaml=True, + version_via_metadata="not-a-version", ) assert probe("0.1.0", "0.2.0") == 
{"status": "absent"} diff --git a/plugins/python/uv.lock b/plugins/python/uv.lock new file mode 100644 index 00000000..5b36c568 --- /dev/null +++ b/plugins/python/uv.lock @@ -0,0 +1,652 @@ +version = 1 +revision = 3 +requires-python = ">=3.11" +resolution-markers = [ + "python_full_version >= '3.15'", + "python_full_version < '3.15'", +] + +[[package]] +name = "ast-serialize" +version = "0.5.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/81/9d/09e27731bd5864a9ce04e3244074e674bb8936bf62b45e0357248717adac/ast_serialize-0.5.0.tar.gz", hash = "sha256:5880091bfe6f4f986f22866375c2e884843e7a0b6343ae41aeea659613d879b6", size = 61157, upload-time = "2026-05-17T17:48:29.429Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c0/9a/13dde51ba9e15f8b97957ab7cb0120d0e381524d651c6bd630b9c359227f/ast_serialize-0.5.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:8f5c14f169eb0972c0c21bada5358b23d6047c76583b005234f865b11f1fa00a", size = 1183520, upload-time = "2026-05-17T17:47:30.831Z" }, + { url = "https://files.pythonhosted.org/packages/37/de/5a7f0a9fe68944f536632a5af84676739c7d2582be42deb082634bf3a754/ast_serialize-0.5.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:7d1a2de9de5be04652f0ed60738356ef94f66db37924a9499fffe98dc491aa0b", size = 1175779, upload-time = "2026-05-17T17:47:32.551Z" }, + { url = "https://files.pythonhosted.org/packages/9c/81/0bb853e76e4f6e9a1855d569003c59e19ffac45f7079d91505d1bb212f92/ast_serialize-0.5.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:be5173fb66f9b49026d9d5a2ff0fc7c7009077107c0eb285b2d60fdf1fe10bd1", size = 1233750, upload-time = "2026-05-17T17:47:34.731Z" }, + { url = "https://files.pythonhosted.org/packages/e5/d3/4cf705beeccc08754d0bbda99aefff26110e209b9a07ac8a6b60eec48531/ast_serialize-0.5.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = 
"sha256:f8015cd071ac1339924ee2b8098c93e00e155f30a16f40ec9816fcf84f4753f6", size = 1235942, upload-time = "2026-05-17T17:47:36.287Z" }, + { url = "https://files.pythonhosted.org/packages/26/c8/ee097e437ea27dd2b8b227865c875492b585650a5802a22d82b304c8201b/ast_serialize-0.5.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:5499e8797edff2a9186aa313ed382c6b422e798e9332d9953badcee6e69a88f2", size = 1442517, upload-time = "2026-05-17T17:47:38.17Z" }, + { url = "https://files.pythonhosted.org/packages/ff/bd/68063442838f1ba68ec72b5436430bc75b3bb17a1a3c3063f09b0c05ae2b/ast_serialize-0.5.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:6848f2a093fb5548751a9a09bff8fcd229e2bbeb0e3331f391b6ae6d26cd9903", size = 1254081, upload-time = "2026-05-17T17:47:39.826Z" }, + { url = "https://files.pythonhosted.org/packages/50/e2/1e520793bc6a4e4524a6ab022391e827825eaa0c3811828bfdc6852eca26/ast_serialize-0.5.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:832d4c998e0b091fd60a6d6bceee535483c4d490de9ba85003af835225719261", size = 1259910, upload-time = "2026-05-17T17:47:41.369Z" }, + { url = "https://files.pythonhosted.org/packages/4e/e1/49b60f467979979cfe6913b43948ff25bca971ad0591d181812f163a988e/ast_serialize-0.5.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:16db7c62ec0b8efe1d7afd283a388d8f74f2605d56032e5a37747d2de8dba027", size = 1250678, upload-time = "2026-05-17T17:47:43.702Z" }, + { url = "https://files.pythonhosted.org/packages/74/ba/66ab9555de6275677566f6574e5ef6c29cb185ea866f643bc06f8280a8ee/ast_serialize-0.5.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:baf5eb061eb5bccade4128ad42da33787d72f6013809cd1b590376ece8b3c937", size = 1301603, upload-time = "2026-05-17T17:47:46.256Z" }, + { url = "https://files.pythonhosted.org/packages/66/42/6aca9b9abc710014b2be9059689e5dd1679339e78f567ffb4d255a9e2050/ast_serialize-0.5.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash 
= "sha256:104e4a35bd7c124173c41760ef9aaea17ddb3f86c65cb643671d59afbe3ee94c", size = 1410332, upload-time = "2026-05-17T17:47:47.899Z" }, + { url = "https://files.pythonhosted.org/packages/47/68/2f76594432a22581ecf878b5e75a9b8601c24b2241cf0bbeb1e21fcf370c/ast_serialize-0.5.0-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:36be371028fc1675acb38a331bde160dbab7ff907fdf00b67eb6911aa106951b", size = 1509979, upload-time = "2026-05-17T17:47:50.942Z" }, + { url = "https://files.pythonhosted.org/packages/40/ac/a93c9b58292653f6c595752f677a08e608f903b710594909e9231a389b3b/ast_serialize-0.5.0-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:061ee58bdb52341c8201a6df41182a977736bae3b7ded87ca7176ca25a8a47ab", size = 1505002, upload-time = "2026-05-17T17:47:54.093Z" }, + { url = "https://files.pythonhosted.org/packages/14/2e/b278f68c497ee2f1d1576cbbef8db5281cd4a5f2db040537592ac9c8862e/ast_serialize-0.5.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:b15219e9cdc9f53f6f4cb51c009203507228226148c05c5e8fe451c28b435eb3", size = 1456231, upload-time = "2026-05-17T17:47:56.311Z" }, + { url = "https://files.pythonhosted.org/packages/0b/43/419be1c566a4c504cd8fd60ce2f84e790f295495c0f327cfaeadf3d51012/ast_serialize-0.5.0-cp314-cp314t-win32.whl", hash = "sha256:842d1c004bb466c7df036f95fabef789570541922b10976b12f5592a69cf0b38", size = 1058668, upload-time = "2026-05-17T17:47:58.305Z" }, + { url = "https://files.pythonhosted.org/packages/03/6f/c9d4d549295ed05111aeb8853232d1afd9d0a179fddb01eeffbb3a4a6842/ast_serialize-0.5.0-cp314-cp314t-win_amd64.whl", hash = "sha256:b0c06d760909b095cc466356dfccd05a1c7233a6ca191c020dca2c6a6f16c24c", size = 1101075, upload-time = "2026-05-17T17:48:00.35Z" }, + { url = "https://files.pythonhosted.org/packages/d0/8e/d00c5ab30c58222e07d62956fca86c59d91b9ad32997e633c38b526623a3/ast_serialize-0.5.0-cp314-cp314t-win_arm64.whl", hash = "sha256:787baedb0262cc49e8ce37cc15c00ae818e46a165a3b36f5e21ed174998104cb", size = 1075347, upload-time = 
"2026-05-17T17:48:01.753Z" }, + { url = "https://files.pythonhosted.org/packages/e0/9e/dc2530acb3a60dc6e46d65abf27d1d9f86721694757906a148d90a6860de/ast_serialize-0.5.0-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:0668aa9459cfa8c9c49ddd2163ebcf43088ba045ef7492af6fe22e0098303101", size = 1191380, upload-time = "2026-05-17T17:48:03.738Z" }, + { url = "https://files.pythonhosted.org/packages/26/0a/bd3d18a582f273d6c843d16bb9e22e9e16365ff7991e92f18f798e9f1224/ast_serialize-0.5.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:bf683d6363edf2b39eed6b6d4fe22d34b6203867a67e27134d9e2a2680c4bc4a", size = 1183879, upload-time = "2026-05-17T17:48:05.463Z" }, + { url = "https://files.pythonhosted.org/packages/40/ae/1f919100f8620887af58fcc381c61a1f218cdf89c6e155f87b213e61010a/ast_serialize-0.5.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9cc22cf0c9be65e71cf88fda130af60d61eb4a79370ad4cfe7900d48a4aa2211", size = 1244529, upload-time = "2026-05-17T17:48:07.008Z" }, + { url = "https://files.pythonhosted.org/packages/c6/ca/6376559dcce707cdbc1d0d9a13c8d3baaaa501e949ce0ebdc4230cd881aa/ast_serialize-0.5.0-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f66173891548c9f2726bf27957b41cabce12fa679dc6da505ddbde4d4b3b31cf", size = 1240560, upload-time = "2026-05-17T17:48:08.46Z" }, + { url = "https://files.pythonhosted.org/packages/35/b2/a620e206b5aeb7efbf2710336df57d457cffbb3991076bbcc1147ef9abd4/ast_serialize-0.5.0-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e42d729ef2be96a14efbad355093284739e3670ece3e534f82cc8832790911d9", size = 1451172, upload-time = "2026-05-17T17:48:09.922Z" }, + { url = "https://files.pythonhosted.org/packages/fa/e0/4ad5c04c24a40481b2935ce9a0ccdb6023dc8b667167d06ae530cc3512f2/ast_serialize-0.5.0-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:b725026bafa801dbd7310eb13a75f0a2e370e7e51b2cb225f9d21fcfadf919ee", size = 1265072, upload-time = 
"2026-05-17T17:48:11.469Z" }, + { url = "https://files.pythonhosted.org/packages/b2/71/4d1d479aa56d0101c40e17720c3d6ac2af7269ea0487a80b18e7bfd1a5b7/ast_serialize-0.5.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b54f60c1d78767a53b67eaa663f0dfac3afe606aa07f1301572f588b73d64809", size = 1270488, upload-time = "2026-05-17T17:48:13.575Z" }, + { url = "https://files.pythonhosted.org/packages/6d/4f/0de1bbe06f6edef9fde4ed12ca8e7b3ec7e6e2bd4e672c5af487f7957665/ast_serialize-0.5.0-cp39-abi3-manylinux_2_31_riscv64.whl", hash = "sha256:27d51654fc240a1e87e742d353d98eb45b75f62f129086b3596ab53df2ac2a43", size = 1260702, upload-time = "2026-05-17T17:48:15.141Z" }, + { url = "https://files.pythonhosted.org/packages/75/61/e00872439cfdddcc3c1b6cdaa6e5d904ba8e26a18807c67c4e14409d0ca8/ast_serialize-0.5.0-cp39-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2782c36237c46dd1674542f2109740ea5ea485a169bf1431939ada0434e17934", size = 1311182, upload-time = "2026-05-17T17:48:16.779Z" }, + { url = "https://files.pythonhosted.org/packages/76/8e/699a5b955f7926956c95e9e1d74132acad73c2fe7a426f94da89123c20aa/ast_serialize-0.5.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:1943db345233cc7194a470f13afa9c59772c0b123dea0c9414c4d4ca54369759", size = 1421410, upload-time = "2026-05-17T17:48:18.527Z" }, + { url = "https://files.pythonhosted.org/packages/a9/ae/d5b7626874478997adc7a29ab28accf21e596fb590c944290401dfd0b29e/ast_serialize-0.5.0-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:df1c00022cbbcb064bfaa505aa9c9295362443ce5dacb459d1331d3da353f887", size = 1516587, upload-time = "2026-05-17T17:48:20.133Z" }, + { url = "https://files.pythonhosted.org/packages/0c/ce/b59e02a82d9c4244d64cde502e0b00e83e38816abe19155ceb5437402c7f/ast_serialize-0.5.0-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:cae65289fc456fde04af979a2be09302ef5d8ab92ef23e596d6746dc267ada27", size = 1515171, upload-time = "2026-05-17T17:48:21.921Z" }, + { url = 
"https://files.pythonhosted.org/packages/8b/38/d8d90042747d05aa08d4efcf1c99035a5f670a6bf4c214d31644392afbca/ast_serialize-0.5.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:239a4c354e8d676e9d94631d1d4a64edc6b266f86ff3a5a80aedd344f342c01d", size = 1464668, upload-time = "2026-05-17T17:48:23.544Z" }, + { url = "https://files.pythonhosted.org/packages/dd/51/5b840c4df7334104cecffa28f23904fe81ca89ca223d2450e288de39fd3c/ast_serialize-0.5.0-cp39-abi3-win32.whl", hash = "sha256:143a4ef63285a075871908fda3672dc21864b83a8ec3ee12304aa3e4c5387b9a", size = 1068311, upload-time = "2026-05-17T17:48:25.027Z" }, + { url = "https://files.pythonhosted.org/packages/41/11/ca5672c7d491825bc4cd6702dea106a6b60d928707712ec257c7833ae476/ast_serialize-0.5.0-cp39-abi3-win_amd64.whl", hash = "sha256:cf25572c526add400f26a4750dc6ce0c3bb93fc1f75e7ae0cad4ce4f2cd5c590", size = 1108931, upload-time = "2026-05-17T17:48:26.591Z" }, + { url = "https://files.pythonhosted.org/packages/45/19/cc8bd127d28a43da249aa955cfd164cf8fd534e79e42cea96c4854d72fd0/ast_serialize-0.5.0-cp39-abi3-win_arm64.whl", hash = "sha256:92a31c9c20d25a076edaeec76b128a3535d74a24f340b9a8a7e96c9b86dc9642", size = 1081181, upload-time = "2026-05-17T17:48:28.122Z" }, +] + +[[package]] +name = "cfgv" +version = "3.5.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/4e/b5/721b8799b04bf9afe054a3899c6cf4e880fcf8563cc71c15610242490a0c/cfgv-3.5.0.tar.gz", hash = "sha256:d5b1034354820651caa73ede66a6294d6e95c1b00acc5e9b098e917404669132", size = 7334, upload-time = "2025-11-19T20:55:51.612Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/db/3c/33bac158f8ab7f89b2e59426d5fe2e4f63f7ed25df84c036890172b412b5/cfgv-3.5.0-py2.py3-none-any.whl", hash = "sha256:a8dc6b26ad22ff227d2634a65cb388215ce6cc96bbcc5cfde7641ae87e8dacc0", size = 7445, upload-time = "2025-11-19T20:55:50.744Z" }, +] + +[[package]] +name = "clarion-plugin-python" +version = "1.2.0" +source = { editable 
= "." } +dependencies = [ + { name = "packaging" }, + { name = "pyright" }, +] + +[package.optional-dependencies] +dev = [ + { name = "mypy" }, + { name = "pre-commit" }, + { name = "pytest" }, + { name = "pytest-cov" }, + { name = "ruff" }, +] + +[package.metadata] +requires-dist = [ + { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.11" }, + { name = "packaging", specifier = ">=24" }, + { name = "pre-commit", marker = "extra == 'dev'", specifier = ">=3.8" }, + { name = "pyright", specifier = "==1.1.409" }, + { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" }, + { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=5.0" }, + { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.6" }, +] +provides-extras = ["dev"] + +[[package]] +name = "colorama" +version = "0.4.6" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697, upload-time = "2022-10-25T02:36:22.414Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" }, +] + +[[package]] +name = "coverage" +version = "7.14.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/54/fd/0ab2772530e946e1be1abd0bc09e647ec9b02e88f0867857601fefca8953/coverage-7.14.1.tar.gz", hash = "sha256:30c08f7d90415aa98b3c990385dea2939b0da55f38515e5b369b83655f8523be", size = 920132, upload-time = "2026-05-26T20:41:36.783Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/7d/d7/477ad149490e6cb849f28abea1dabb9c823cea72e7500c81b4240ce619c0/coverage-7.14.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:478b5bcd63c2e1357c5c7e16c070690df7b07f676b1c114d7b93e533c664309f", size = 219848, upload-time = "2026-05-26T20:38:38.715Z" }, + { url = "https://files.pythonhosted.org/packages/91/82/a5eb47257c50601bb7b9a9d2857c67b7a3a85ad74180eb2c98bb1fbe0ce5/coverage-7.14.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:a24a81f9715ee42ef59a316cc11611c98fe23920f7c81861315c9f3ff4a230f4", size = 220354, upload-time = "2026-05-26T20:38:40.232Z" }, + { url = "https://files.pythonhosted.org/packages/43/8b/78419b5391a5cb706b6544390507e469d83ffc9a8248b02c4011aceb9365/coverage-7.14.1-cp311-cp311-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:196a13319ad88d6d8ef5ab489ec4f44ddde2143c0c7d5b27786f6c3ffd56a7e1", size = 250771, upload-time = "2026-05-26T20:38:41.782Z" }, + { url = "https://files.pythonhosted.org/packages/77/63/e77aaacd491182210d639636b7a8bba23ffffa9b82aa3762da9431855fa9/coverage-7.14.1-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:3d452fd08b5c72c5167c93e6867b5c08500bd40f2a21e1e854a500550b6cc36f", size = 252683, upload-time = "2026-05-26T20:38:43.305Z" }, + { url = "https://files.pythonhosted.org/packages/65/1c/a022e3cfbec2ac241640003cb3a817e161d9c7f5aa9b49173756cdc03204/coverage-7.14.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:23bf7fa51ac02e07fc7c96849b82946da47ae862dc8f86d183b2a4864fc38129", size = 254791, upload-time = "2026-05-26T20:38:45.361Z" }, + { url = "https://files.pythonhosted.org/packages/61/d6/967e408aca4c1ceb88cb0cc677169110ae7f5995fb5eaf5fb1f5a1bb8f5d/coverage-7.14.1-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:bcaa50684dcaadfa599ac48f81103c756d791cfd85c97203d2217c593d48b860", size = 256748, 
upload-time = "2026-05-26T20:38:46.91Z" }, + { url = "https://files.pythonhosted.org/packages/b8/be/869188f7fe28638078ec479331ace6dc5f7b40b7153eb616f47ab79404d8/coverage-7.14.1-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:4ea1c034f95c9b056e856b794630b17f9fa3d57e4800ff1e503d3be0f9c9078c", size = 250907, upload-time = "2026-05-26T20:38:48.493Z" }, + { url = "https://files.pythonhosted.org/packages/07/aa/adb7d3b4278d690e68703abcd76ab1b948242e3668d921711551b78f9ddb/coverage-7.14.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:c7e057326434e441306226fbeb5d1aaf14a2637efe97ba668306635835f32ad7", size = 252483, upload-time = "2026-05-26T20:38:50.074Z" }, + { url = "https://files.pythonhosted.org/packages/43/61/331c74103c62dcb0c4b9b3a0de9a61aca016208b0a90f109592a9f9ecc28/coverage-7.14.1-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:59baf88468dbc8d63b1887afd92bda52e40bb1561696e5819670601403810cec", size = 250545, upload-time = "2026-05-26T20:38:51.613Z" }, + { url = "https://files.pythonhosted.org/packages/f6/b6/c5dae3c104d89be04828f61810e6b3473825482e4c288cc4ed04553e08ae/coverage-7.14.1-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:d34d75f892b3ab73ba11cab5442cce7b3e168fd64162b16f0e1e0d09c508edef", size = 254310, upload-time = "2026-05-26T20:38:53.503Z" }, + { url = "https://files.pythonhosted.org/packages/ad/a1/2b9d5863e3b83c01ad8199e3c597802fbb3a9dc90b058885804c20296d31/coverage-7.14.1-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:3a56abc20a472baf0304c455721bc601477440d28ecfde8a03dde79ede07e0df", size = 250266, upload-time = "2026-05-26T20:38:55.414Z" }, + { url = "https://files.pythonhosted.org/packages/7f/5e/0e511fbdb269359be26fe678a1c3fa1f2aa2a01573cc3f54268c8d6d4797/coverage-7.14.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:6a3cb83d1552c0cd1b4906655b6a33fd4a8473229633a901c6b73bf86914dee9", size = 251174, upload-time = "2026-05-26T20:38:57.141Z" }, + { url = 
"https://files.pythonhosted.org/packages/85/10/e55307b622b3dd9671cb321824502dc10f93e72f2802b9946159a8edadeb/coverage-7.14.1-cp311-cp311-win32.whl", hash = "sha256:10274a1fbeb8ec5d72966e17bb198a3104257aca4ac09d98667c5f8aca8c8548", size = 222354, upload-time = "2026-05-26T20:38:58.727Z" }, + { url = "https://files.pythonhosted.org/packages/71/cf/107421693cfb71e4f1ca5bf70443f64d4161878068d07a3e51c7ad21d17b/coverage-7.14.1-cp311-cp311-win_amd64.whl", hash = "sha256:87ebdf787d4888e3f3f2d523eadc6e18c6d18c6d0eb173801a189641627fb37e", size = 223290, upload-time = "2026-05-26T20:39:00.413Z" }, + { url = "https://files.pythonhosted.org/packages/b8/1d/3e3644585eb29e9dafefb19555078529a4d7cce12bd21929664eea989277/coverage-7.14.1-cp311-cp311-win_arm64.whl", hash = "sha256:dd34767fa19848d35659ffc0a75314f58c7af3f1cd87ec521e8292a1238398a3", size = 221953, upload-time = "2026-05-26T20:39:02.159Z" }, + { url = "https://files.pythonhosted.org/packages/3d/b7/bdbb725ba02c5b42825b200c940f38b7a54fcad24627b7192f78f8110d76/coverage-7.14.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:a06c76364a9360e33d6d23769aefdf7f66f38e2ffb60ceb1baaa4989d83b695c", size = 220022, upload-time = "2026-05-26T20:39:03.702Z" }, + { url = "https://files.pythonhosted.org/packages/72/81/fdc0898a55c6219223291ec1a1fe89966ef212ce82276aa0899df84b5de0/coverage-7.14.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:fad54e871165f6ec2f536063ac74c3104508a12963e64072ba44bd822de52b0c", size = 220379, upload-time = "2026-05-26T20:39:05.381Z" }, + { url = "https://files.pythonhosted.org/packages/de/72/de048c4a25e13bce59ac6a339351c10bdf2515e07459afcdaf04dc3143a2/coverage-7.14.1-cp312-cp312-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:84b535f00655ecafe1d929d1fb00ed5d6fa3051ea643ab2c161a3887b86f294b", size = 251888, upload-time = "2026-05-26T20:39:07.367Z" }, + { url = 
"https://files.pythonhosted.org/packages/28/30/300c343f68beb9d4cbb64ec81e58c5b6b80b56927f72d2b38654ac26e013/coverage-7.14.1-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:6b6b0853b895fe0e98cbfc580d1ec3393d9302b4b1e96a77b3f5c91fdab899e6", size = 254624, upload-time = "2026-05-26T20:39:09.037Z" }, + { url = "https://files.pythonhosted.org/packages/b1/ed/7b25642496e8170b6bac14adce00537c6e5fa2d586159401a4de3e8b49e6/coverage-7.14.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:442cc9c952b2df400cda54bb04ab87330cf2cd08a8692cbbea36773531eb6f37", size = 255739, upload-time = "2026-05-26T20:39:10.889Z" }, + { url = "https://files.pythonhosted.org/packages/7f/a2/abd210b8c4e29c24e4624916db97bb519097a91034aaeb767f937e7da794/coverage-7.14.1-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:8270544c361ed405a27a060dbc9ed2c124b084d96dfdc2d9a2510482aef981ad", size = 257998, upload-time = "2026-05-26T20:39:12.722Z" }, + { url = "https://files.pythonhosted.org/packages/7f/24/7c50beed3792fe62f6ce0545c6686ce83379719e2c0276179333d97eae92/coverage-7.14.1-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:48b283b1dd6372e8de2a7a9a4c4d5dc06f4d4fd209b876f3c88a7a205a0c8f84", size = 252296, upload-time = "2026-05-26T20:39:14.259Z" }, + { url = "https://files.pythonhosted.org/packages/15/05/0f874628ebcbfc77ead559ff210281ef06a97db08481832e7dd39274a135/coverage-7.14.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:5b0c99ba93a07d56f6df340bb79be53202a082b2fdb81bfe6190b741a3470d54", size = 253658, upload-time = "2026-05-26T20:39:15.923Z" }, + { url = "https://files.pythonhosted.org/packages/99/6f/ca6ad067364b337ef997802115e7ecad2abd2248b05471464b0dea02b4d4/coverage-7.14.1-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:e471bc5769ff073b058cfadb0d736b56ce067c8560eabeb0da88462df98c23e7", size = 251803, upload-time 
= "2026-05-26T20:39:17.537Z" }, + { url = "https://files.pythonhosted.org/packages/c0/30/b9b4d377cd9f40baf228068f5a81faf8450c6228503011bd499708483a50/coverage-7.14.1-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:f497a1ea81d4cd7c10ddcaa685135b9aabd291af3d55775a9ddf3cb7a364cdd9", size = 255873, upload-time = "2026-05-26T20:39:19.414Z" }, + { url = "https://files.pythonhosted.org/packages/3c/21/7c721a9e5e6bb88547d30a787aefb97512d3f54c1324c7488d9b3743f7f9/coverage-7.14.1-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:2222be86d0b54f5dd5a38f45f17f315f737245e857bf0bdedc70734f84a13c02", size = 251372, upload-time = "2026-05-26T20:39:21.169Z" }, + { url = "https://files.pythonhosted.org/packages/9d/8c/f8ae5a2200130e1503cd7661a6cd3b2b7bacef98277fbf3571fb13f8b766/coverage-7.14.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:85e85586565842f6932abebd4c18bcb1074223dc0b3576e7d173ca710622813a", size = 253245, upload-time = "2026-05-26T20:39:23.097Z" }, + { url = "https://files.pythonhosted.org/packages/34/62/70a9024672a5f6910517d9628c52c9afbdd3cf8f46426af52bb148a56fff/coverage-7.14.1-cp312-cp312-win32.whl", hash = "sha256:4a28fd227808366b196a75476dced2eb35b351d6766ba9c858dc93319e87f4f1", size = 222567, upload-time = "2026-05-26T20:39:24.868Z" }, + { url = "https://files.pythonhosted.org/packages/f6/81/8b7cd386839b039ebe1855733b9f9449a8dec5d79564018234f185a7fa70/coverage-7.14.1-cp312-cp312-win_amd64.whl", hash = "sha256:54acdb6674a4661768d7bf7db32dfb9f46ab1d764f8aba6df75ce1a6a088724e", size = 223372, upload-time = "2026-05-26T20:39:26.603Z" }, + { url = "https://files.pythonhosted.org/packages/ae/ba/b44d472022f620d289d95fa830143235c0c36461c6f2437ea8d51e5481ed/coverage-7.14.1-cp312-cp312-win_arm64.whl", hash = "sha256:99cd41ff91afd94896fea3bc002706b6ae4ce95727d06e4a0f39c0a8d8bd8b1a", size = 221989, upload-time = "2026-05-26T20:39:28.242Z" }, + { url = 
"https://files.pythonhosted.org/packages/8a/9e/5f6d56327c62b185225d145191c607e07515294a0aa6338e58805cd4a5ac/coverage-7.14.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:be9f2c802dcfce3f71298303aa5dad0dce440a76c52f2f60dacd8656dab78793", size = 220044, upload-time = "2026-05-26T20:39:29.902Z" }, + { url = "https://files.pythonhosted.org/packages/75/92/e82aca356744cbbc0f77a0b623e38918c1872361963413a3bab5d0340393/coverage-7.14.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:6223a72fd0e4c7156353ec0f08a5f93623e1d3034d0e2683b9bb8ea674131b1d", size = 220412, upload-time = "2026-05-26T20:39:31.561Z" }, + { url = "https://files.pythonhosted.org/packages/27/c9/385bde0bf7ed0f4bf3a7ee5367060a86b5d218718cfd6fb943c0f836b34f/coverage-7.14.1-cp313-cp313-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:7279d2110a28cebc738b6459ecda2771735a4c18465fbbd36b3288fe5ed92247", size = 251412, upload-time = "2026-05-26T20:39:33.337Z" }, + { url = "https://files.pythonhosted.org/packages/51/8c/23faf6a2343a0d17f960a4bd56c43bc7eb4cf312f774dd6ceebd82c7d8fc/coverage-7.14.1-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:9eeb3fcbc13ba40dfbdb22d01d196a28e9cef9ed4c29b60061a1e0e823a9929d", size = 254008, upload-time = "2026-05-26T20:39:35.009Z" }, + { url = "https://files.pythonhosted.org/packages/42/06/36f4aa9ca8a815e6036156e80706a67828bb97bd826948244f6996dda957/coverage-7.14.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5f0cfc27c539f07cf5c0a4cfe211d0b6cae039f8f40526dbaa71944e64b50a7b", size = 255241, upload-time = "2026-05-26T20:39:36.71Z" }, + { url = "https://files.pythonhosted.org/packages/ca/79/95266316352f90f6b1c6736bb413302edfde2453fb32422d3911642691b3/coverage-7.14.1-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:221c70f316241a78e77e607c227cefc8808d4e08f28d99c04f35694690e940be", size = 257373, 
upload-time = "2026-05-26T20:39:38.412Z" }, + { url = "https://files.pythonhosted.org/packages/e3/9c/58316d1f66c488b5fca8a0eb3e98348807813efa8a0d0833b9021be27488/coverage-7.14.1-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:da028256b04ec30e5e0114b6f76172938c313991f0a2d3d894271315cf5d5e43", size = 251635, upload-time = "2026-05-26T20:39:40.268Z" }, + { url = "https://files.pythonhosted.org/packages/ef/5a/ca2398a568e16fed7bb713e84ba3603a7164fb65779abe645c565ec890d5/coverage-7.14.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:76a085d7005236a767e3426148b2c407e53ad61695c562f8a81da2d373324901", size = 253373, upload-time = "2026-05-26T20:39:42.145Z" }, + { url = "https://files.pythonhosted.org/packages/6e/2c/0396562c32deaebe7be51d865b3a41e9a87d7561acafe1a28f53b07e019a/coverage-7.14.1-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:b553d04b5e778a8e56d57eb134aff42a92718ecba45e79c4764ecfa40efd92ff", size = 251341, upload-time = "2026-05-26T20:39:43.907Z" }, + { url = "https://files.pythonhosted.org/packages/fd/8f/a94f9221184c9cae1ee115820e3798e48b6b17777a9f19e46fb9a0c8dc74/coverage-7.14.1-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:46f714d2fb8ae2f4f29f23ada7f1e79b759fff5a70f94a1dac23af204c3ec9e4", size = 255497, upload-time = "2026-05-26T20:39:46.166Z" }, + { url = "https://files.pythonhosted.org/packages/71/69/505d70e47db1eaebcd002c39759707621ef184cd6b1ae084d9f41293f323/coverage-7.14.1-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:1896f5e19ff3f0431c7ce2172adc54890fd97f86b59ced8ca1649145d9ffe35d", size = 251159, upload-time = "2026-05-26T20:39:48.03Z" }, + { url = "https://files.pythonhosted.org/packages/e0/aa/58681c383aa33a9d2ed40a02d7a22fbf780d1fa4d575396365777828198c/coverage-7.14.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:62fd185ef9df3c33d1c8178c5af105f762afbad96038de9a4ae100aa6297ca33", size = 252934, upload-time = "2026-05-26T20:39:49.872Z" }, + { url = 
"https://files.pythonhosted.org/packages/eb/fd/11c928cd6bdffc7074bb5965c173d9ebf517fb00205e1da524b98d29ef92/coverage-7.14.1-cp313-cp313-win32.whl", hash = "sha256:ab4af6352741a604c431c6072fce5bee33bf0f20dc7a56618d6bf6bb89e9810c", size = 222584, upload-time = "2026-05-26T20:39:51.68Z" }, + { url = "https://files.pythonhosted.org/packages/6f/92/fb416fc26d340dcba19518c418d6048e913186e17243982c5e435e41fa7a/coverage-7.14.1-cp313-cp313-win_amd64.whl", hash = "sha256:7af486dabe8954d03b087f0021540897afe084f04e16ff5579e08cc46f871416", size = 223394, upload-time = "2026-05-26T20:39:53.472Z" }, + { url = "https://files.pythonhosted.org/packages/73/c6/02d56e3867972f77d5036de924643f26c056e848f00452cafb4dbc3c29b4/coverage-7.14.1-cp313-cp313-win_arm64.whl", hash = "sha256:2224f89ffd0c5605ccce1ed7a584da162bc7c55f601ab1c946bc9de31a486b42", size = 222015, upload-time = "2026-05-26T20:39:55.374Z" }, + { url = "https://files.pythonhosted.org/packages/4d/9e/fcc77914050df73f7662fa1f00902774c79c075a8388ab334074574bf77e/coverage-7.14.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:de286598cc65d2b489411174b1faec2f5a7775fb3201fd925db2a76b4030f37d", size = 220733, upload-time = "2026-05-26T20:39:57.189Z" }, + { url = "https://files.pythonhosted.org/packages/f7/67/2963cbdaf5cbadec44efa3a1e39eaa1f02df4079585f05387607a221e126/coverage-7.14.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:042c46ded7c288aeb07cf14a28b6c1e10b78fcba40171c3fa1e939377eeef0b5", size = 221086, upload-time = "2026-05-26T20:39:59.019Z" }, + { url = "https://files.pythonhosted.org/packages/c8/c5/8701645574e11881f2f47d8930f98bc48b5d43b25eb5b4430dfc4a2f9f48/coverage-7.14.1-cp313-cp313t-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:f4ddbe407477f04c45115d1a4e5bc480f753553b534d338d4c3358b1cdd0ea52", size = 262381, upload-time = "2026-05-26T20:40:00.822Z" }, + { url = 
"https://files.pythonhosted.org/packages/7c/28/7a64d73598263e0c5abd5084211a8474488d31b3c552ff531c719dfcff62/coverage-7.14.1-cp313-cp313t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:d13e6725992e2d2fd7d81d4f5241952d13740121dfd501da09201be39b2c003a", size = 264458, upload-time = "2026-05-26T20:40:02.506Z" }, + { url = "https://files.pythonhosted.org/packages/fa/d8/4969179db9f7eb4df218e69540adf829d1c835f59452513d065d15446802/coverage-7.14.1-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f747dc8edcfe740130f28f32f3995e955494285717e86ee25af51db2219df08a", size = 266884, upload-time = "2026-05-26T20:40:04.421Z" }, + { url = "https://files.pythonhosted.org/packages/a6/78/a45d5794dbc9bafd97afc96a4377c86c7820d78b6cf51b89bc1d4e919275/coverage-7.14.1-cp313-cp313t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:ced2f09ef276fd58611a1ef502164ad266d2b75174e5a40cabbdb4033f9f6cf2", size = 268022, upload-time = "2026-05-26T20:40:06.298Z" }, + { url = "https://files.pythonhosted.org/packages/21/cb/4f5e354e9e3e67af96bd4e57113e6db6b22298c7168b13eec408a549903d/coverage-7.14.1-cp313-cp313t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b84800013769a78ccb9ef4659402e26d06867e337b61ec365f77ad008adea80e", size = 261631, upload-time = "2026-05-26T20:40:08.226Z" }, + { url = "https://files.pythonhosted.org/packages/ec/49/eced49af4cb996d5d8b7e94e736175c513e4facd3398507b89892b4326d8/coverage-7.14.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:ea8cd6ca0ee9f616aaef3afc6882e32c2cbf18b00d96313ffd76af650574034d", size = 264443, upload-time = "2026-05-26T20:40:10.137Z" }, + { url = "https://files.pythonhosted.org/packages/f1/d8/5603a88a7c5913a6b54f6cb1a8c46f7b39cbb30f27cd3f492908da09b2d7/coverage-7.14.1-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:aa5e304a873fabddc11e484e9b6b738bd38bd7bed17b09aa84eecf5332e8b8bb", size = 262069, 
upload-time = "2026-05-26T20:40:11.999Z" }, + { url = "https://files.pythonhosted.org/packages/f0/59/2ae3cb79da554a06c8619d6c88ea19dd1e4aed4b834b6a83bb1fa243bdc5/coverage-7.14.1-cp313-cp313t-musllinux_1_2_ppc64le.whl", hash = "sha256:5a1c5215be81035e629d5bc756650634d0bf31991038db7a0eccb90f025ce16d", size = 265780, upload-time = "2026-05-26T20:40:13.858Z" }, + { url = "https://files.pythonhosted.org/packages/af/5f/b130c1dc999031f2648bd25317fbce505ad8d5562079b4ed81e736a84967/coverage-7.14.1-cp313-cp313t-musllinux_1_2_riscv64.whl", hash = "sha256:79058c47dae6788504b5effb319961bcd72d7240551464b91d474bc0ed186d69", size = 260970, upload-time = "2026-05-26T20:40:16.142Z" }, + { url = "https://files.pythonhosted.org/packages/87/d1/ec13ccddeb48ec963bdfa72a11224bac2584bd045ba13beca82f8113e9c7/coverage-7.14.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:370c5afae3fa0658e11694a32b24c2778f6bc2d17718121f94ee185e69f26b54", size = 263157, upload-time = "2026-05-26T20:40:18.382Z" }, + { url = "https://files.pythonhosted.org/packages/cf/c2/cd91ead503045161092d3845f7bb95ea2f25131ce96d3e314dd835d91b9c/coverage-7.14.1-cp313-cp313t-win32.whl", hash = "sha256:3758dd0a7f1fa57365ef2e781df0f0731d38b6e3772259d13dae4bd8a958d4b1", size = 223259, upload-time = "2026-05-26T20:40:20.381Z" }, + { url = "https://files.pythonhosted.org/packages/71/9f/1e28d97e6bd2c76b07f38b7c02870f1371255ff6717f54eca578fcbbdd0e/coverage-7.14.1-cp313-cp313t-win_amd64.whl", hash = "sha256:6ff665fb023a77386fe11685190cee1f60a7d635994a30d9b0a061533d470fce", size = 224320, upload-time = "2026-05-26T20:40:22.316Z" }, + { url = "https://files.pythonhosted.org/packages/a9/e0/d936e908f0e1efa55e52b91e01b52f1055cef5e1ab2718493390ed8e2fb8/coverage-7.14.1-cp313-cp313t-win_arm64.whl", hash = "sha256:17a5a241e5997621a956a7f402a7433ef4221e5152809b785bec79e2323799f1", size = 222577, upload-time = "2026-05-26T20:40:24.894Z" }, + { url = 
"https://files.pythonhosted.org/packages/d6/34/fc2f101b151af3799a101f0550b0454aa008afdc0add677394ec4aa8ea10/coverage-7.14.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:d5ed429d0b8edaac649e889b4ffcedb6c80b06629a3f93050e3dddfb99235bee", size = 220091, upload-time = "2026-05-26T20:40:27.249Z" }, + { url = "https://files.pythonhosted.org/packages/3d/a7/1ebae2ab5b961b5c79bb09fe7b3ac99edb190d8be4a8c510b2cf66f46468/coverage-7.14.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:8011224a62280e50dab346960c03cf47aca1a1e09e608c0fb33fd6e0cc8e9500", size = 220421, upload-time = "2026-05-26T20:40:30.084Z" }, + { url = "https://files.pythonhosted.org/packages/5e/90/92aca9cf0acc95123c96cd1eb1f08917897a7f5dee01e15738922971ec31/coverage-7.14.1-cp314-cp314-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:12c42ec1e14f553c4f817e989365982e646e27211f10a0f717855b94a79c8906", size = 251466, upload-time = "2026-05-26T20:40:32.542Z" }, + { url = "https://files.pythonhosted.org/packages/26/2b/78048cbe3b999f6cbf9cc0d90abba6a88a3e0863a8c1c6cbc762f3f8802f/coverage-7.14.1-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:06144cd511cf2624873a035c5069cf297144f6e77a73ee3d7a55b605ec5efb42", size = 253973, upload-time = "2026-05-26T20:40:34.473Z" }, + { url = "https://files.pythonhosted.org/packages/8e/21/c2e33b29d1cfde484a19d437afc343c6cd30b08d78cbbf9f5aff14e57b2b/coverage-7.14.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a311d8e1da24be5c1ccf85cbfb06315dbaa1703d5a1eab3f6432c72b837917c8", size = 255318, upload-time = "2026-05-26T20:40:38.154Z" }, + { url = "https://files.pythonhosted.org/packages/8e/ee/aad2f108d63b769121005302f16bf66db8625c88ceaba466942e09a2607e/coverage-7.14.1-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c79cead5b5bc584d9c71451cb984d0e3a84e0c0937379c8efcbf27c8d661b851", size = 257633, 
upload-time = "2026-05-26T20:40:40.164Z" }, + { url = "https://files.pythonhosted.org/packages/c2/f8/11a2c29b4fd76d9849f81d0bb812ec0017a9396df3217214e38934a8c837/coverage-7.14.1-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:dcbf65f1f66a26cdd88c35cf68fb4729c5d1cd2e88added72420541dfb212034", size = 251488, upload-time = "2026-05-26T20:40:42.631Z" }, + { url = "https://files.pythonhosted.org/packages/c9/b8/9a5820de4b8ac2b71d85e3b5fb49108d7469c665f0e2ad0dd7569023e305/coverage-7.14.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:fd86572566fb40189a8260446158235159bc7a82dfbc87a3b39cf4fb57fcec1c", size = 253329, upload-time = "2026-05-26T20:40:45.208Z" }, + { url = "https://files.pythonhosted.org/packages/6b/ff/f33e4823667e27548e8fd8df44217515303f9808d0ff29817db56f87d990/coverage-7.14.1-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:7771b601718fdde84832c3a434ca9bbf4ae9adbc49d84198b4110700c3c77c36", size = 251291, upload-time = "2026-05-26T20:40:47.502Z" }, + { url = "https://files.pythonhosted.org/packages/68/9b/489db0ebb209054766b90a9014a45f6d26eb724c02ec21311c3733b5a644/coverage-7.14.1-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:39b21e212c55af06fa375e3dbf90a8a8e38792f3a910c580066d23563830ddd5", size = 255564, upload-time = "2026-05-26T20:40:49.372Z" }, + { url = "https://files.pythonhosted.org/packages/27/b5/16bc2d4c2409b23c7737edb68c83bc89e345f378050549fe1d75ac7d34d5/coverage-7.14.1-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:f2302660e32562a532b442480121aef8aa61a5bdb20b30bf0adab29f10a5a4b4", size = 251107, upload-time = "2026-05-26T20:40:51.677Z" }, + { url = "https://files.pythonhosted.org/packages/7d/0c/2629997469a00cd069d588a41c9dc887610f2775ae89d250c4791e65272a/coverage-7.14.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:03a6f93c1ec3b7f2e77b5dbcc5573a2c21f12529a5c6bbe0f16f72303cc2fa4d", size = 252764, upload-time = "2026-05-26T20:40:54.267Z" }, + { url = 
"https://files.pythonhosted.org/packages/d2/ee/f78d63c8f079e0d7211c7e2401fa17e311514534ba61bae03e4b287ce4ab/coverage-7.14.1-cp314-cp314-win32.whl", hash = "sha256:8a3ce026d73290f42f08dafecbd82c193a74df280461fbf97300fec51fd133ee", size = 222837, upload-time = "2026-05-26T20:40:56.496Z" }, + { url = "https://files.pythonhosted.org/packages/dc/b9/be539854f93a70dfbeec69117f33ec70dc42ff0b65b5b07ab8d40d04228e/coverage-7.14.1-cp314-cp314-win_amd64.whl", hash = "sha256:114c95ef29302423b87d159075805f4ab973254a2638a5d7d046c94887cc87d7", size = 223650, upload-time = "2026-05-26T20:40:58.351Z" }, + { url = "https://files.pythonhosted.org/packages/fe/9e/24e2842fef40f35ac82ba3a7719c8023d011bf3bf652d0675316a9d088a1/coverage-7.14.1-cp314-cp314-win_arm64.whl", hash = "sha256:a07891c3f4805442b31b71e84ba3cf29ed1aa9a428284e06deeb4b23e5b46343", size = 222218, upload-time = "2026-05-26T20:41:00.321Z" }, + { url = "https://files.pythonhosted.org/packages/0a/1d/ac0a9df5fe31c1e8bdd658074905fc12844a05c1a7e3fdb8417e97c31e23/coverage-7.14.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:1101a5ebb083aecb625ebb6209d4105b58f647b093cb2dc8122d7b33f743cfe1", size = 220822, upload-time = "2026-05-26T20:41:02.281Z" }, + { url = "https://files.pythonhosted.org/packages/32/cf/f964fd9aff20323f9f1a726c97135f8a76bcd87b92dad141a456a43f3c64/coverage-7.14.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:851b9e1e4e8a4608e77c79714b2e77c0970d2ed7202a05e92ae407817481887b", size = 221084, upload-time = "2026-05-26T20:41:04.593Z" }, + { url = "https://files.pythonhosted.org/packages/d8/5e/7e5ef2aba844de2b80d678619fcf0841b42e3f37f16411226f3fe4c1016f/coverage-7.14.1-cp314-cp314t-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:d5b89cdfb2ee051b71e8c3c70bd81a9eff81100f736a269136fe1a68efe00474", size = 262454, upload-time = "2026-05-26T20:41:06.641Z" }, + { url = 
"https://files.pythonhosted.org/packages/64/62/75809bded87015cc4935524218a2a8ed8dd1a8498bfed30a2f4f7a4b4d34/coverage-7.14.1-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:0177614a0370f227888b4e436a7c55686d6a9f90eb1ade2b624ba685a1686e86", size = 264578, upload-time = "2026-05-26T20:41:08.556Z" }, + { url = "https://files.pythonhosted.org/packages/f3/42/d33392dc14633525012d2d504fa1a33b05538bf535f5c1d64675e5754b78/coverage-7.14.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:2d69af5dea2de76fc485a83032a630523f985198b7e25be901ec60181587b01e", size = 266981, upload-time = "2026-05-26T20:41:10.824Z" }, + { url = "https://files.pythonhosted.org/packages/2a/49/0157c4428c2aca7f1e09d5565930586fd5ae36f1655f08b0daa7cf1fcae1/coverage-7.14.1-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:35ab22d91de736e8966b980dc355cbcdd2c6dbbcfe275f9a2991bc8a91b3df65", size = 268112, upload-time = "2026-05-26T20:41:12.966Z" }, + { url = "https://files.pythonhosted.org/packages/96/26/86b9ce71f4092b1ed325ce1421698081df1286b833400b6836912834d6e0/coverage-7.14.1-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:357d4e32935c36588aaba057d734fa32428c360c9fc2e4442afbf1b646beee6e", size = 261558, upload-time = "2026-05-26T20:41:15Z" }, + { url = "https://files.pythonhosted.org/packages/20/4c/c311210c5472cf5401d8422b0d7812cdd520f24417673afabda6c323faca/coverage-7.14.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:51bd64741cc6fa065abd300ede1afe5a5291ece9c31da8b24884deda48bcc3f8", size = 264447, upload-time = "2026-05-26T20:41:17.369Z" }, + { url = "https://files.pythonhosted.org/packages/fb/71/59513f8710ed3e6b0ac0a050a5b7e977bb9c9e880354863b5d00d8809256/coverage-7.14.1-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:9132cd363a68a4c3daa7c8704a654b1e39d3360f6f5b8ddd470608a945236c07", size = 262048, 
upload-time = "2026-05-26T20:41:19.309Z" }, + { url = "https://files.pythonhosted.org/packages/84/8d/bceed32dc494f5bbf50f775cd2e78ca814953942b5ea28d3c1c3ac316f14/coverage-7.14.1-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:07c6290b1697b862c0478eab545eec949a0d0e4d6d03497f446d706da3b4f2de", size = 265781, upload-time = "2026-05-26T20:41:21.559Z" }, + { url = "https://files.pythonhosted.org/packages/e7/c5/9348fe40dbfd4991aaf78df2c6c3098bfb2cc834d1fd362a64b4efef855a/coverage-7.14.1-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:5ea0c297e27133853b4d8a3eb799bff5a2dbd9f2f41537a240d337ac9b4df890", size = 260896, upload-time = "2026-05-26T20:41:23.428Z" }, + { url = "https://files.pythonhosted.org/packages/ca/92/1ea0f03929da7cf87206b1fa24f4c8e9c158be0455481af29ec0a1f3503f/coverage-7.14.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:01b7733daad0237daa01ef80fe2dfceffc911e6a17fa7b55d14aa8214eaaaecd", size = 263214, upload-time = "2026-05-26T20:41:25.419Z" }, + { url = "https://files.pythonhosted.org/packages/f6/a9/b2493c054c0e01a643266742ab45e15744e60743f9260cd930c7142b1124/coverage-7.14.1-cp314-cp314t-win32.whl", hash = "sha256:6adc5a36984624a70bf11d7184e20fa0a49aa7c47ffab43804106a1a695ea22e", size = 223624, upload-time = "2026-05-26T20:41:27.795Z" }, + { url = "https://files.pythonhosted.org/packages/fc/bd/3e1e6a57fccd2d7c83fcdf338e93ba98eb85c6e877dd34731ac585375490/coverage-7.14.1-cp314-cp314t-win_amd64.whl", hash = "sha256:ddf799247318f34dbcd2efa8c95a8d0642674e926bb1774cf9b63dfd2a389d1c", size = 224728, upload-time = "2026-05-26T20:41:30.098Z" }, + { url = "https://files.pythonhosted.org/packages/bb/d7/31066cf1d2f0c6c797fce911bcfa01dd35642dc6da992a950256097c5860/coverage-7.14.1-cp314-cp314t-win_arm64.whl", hash = "sha256:145986fe66647eb489f18d9a997567a3fd358584c4b5a808769113abc07466af", size = 222752, upload-time = "2026-05-26T20:41:32.123Z" }, + { url = 
"https://files.pythonhosted.org/packages/8a/3c/1a983b9a745d7f83d53f057bcc5bf79ba6a2bbc08266b3f0c7d6fe630c9b/coverage-7.14.1-py3-none-any.whl", hash = "sha256:a252f21c27e38347e60111a3266b03827422a7d5525951aceee313aa68bab1d2", size = 211815, upload-time = "2026-05-26T20:41:34.078Z" }, +] + +[package.optional-dependencies] +toml = [ + { name = "tomli", marker = "python_full_version <= '3.11'" }, +] + +[[package]] +name = "distlib" +version = "0.4.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/86/b2/d6fc3f2347f43dada79e5ff118493e8109c98400a0e29a1d5264a3aa479b/distlib-0.4.1.tar.gz", hash = "sha256:c3804d0d2d4b5fcd44036eb860cb6660485fcdf5c2aba53dc324d805837ea65b", size = 610526, upload-time = "2026-06-02T11:17:40.691Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/25/18/3497c4fa83a76dcb154923fd2075522e8dd6995ecee4093c00ae18160046/distlib-0.4.1-py2.py3-none-any.whl", hash = "sha256:9c2c552c68cbadc619f2d0ed3a69e27c351a3f4c9baa9ffb7df9e9cdc3d19a97", size = 469216, upload-time = "2026-06-02T11:17:38.779Z" }, +] + +[[package]] +name = "filelock" +version = "3.29.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/1f/f9/f38573ed5844586db374d085911740a501ccfa373b455fc9413f09f85237/filelock-3.29.1.tar.gz", hash = "sha256:d97e6b1b9757569626c58caa07dc4beb1613f4a2938b1e8cc81afca398906c9e", size = 59335, upload-time = "2026-06-03T15:19:04.053Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/4c/a0/614c5fe402fd88951df45f4dda2fa3b4e17a99ecd92340771929169b3b95/filelock-3.29.1-py3-none-any.whl", hash = "sha256:85199dfd706869641b72b2e8955d5416a4b2b7dc4b0e8e6d97b4cc1299a6983b", size = 40750, upload-time = "2026-06-03T15:19:02.959Z" }, +] + +[[package]] +name = "identify" +version = "2.6.19" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/52/63/51723b5f116cc04b061cb6f5a561790abf249d25931d515cd375e063e0f4/identify-2.6.19.tar.gz", hash = "sha256:6be5020c38fcb07da56c53733538a3081ea5aa70d36a156f83044bfbf9173842", size = 99567, upload-time = "2026-04-17T18:39:50.265Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/94/84/d9273cd09688070a6523c4aee4663a8538721b2b755c4962aafae0011e72/identify-2.6.19-py2.py3-none-any.whl", hash = "sha256:20e6a87f786f768c092a721ad107fc9df0eb89347be9396cadf3f4abbd1fb78a", size = 99397, upload-time = "2026-04-17T18:39:49.221Z" }, +] + +[[package]] +name = "iniconfig" +version = "2.3.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/72/34/14ca021ce8e5dfedc35312d08ba8bf51fdd999c576889fc2c24cb97f4f10/iniconfig-2.3.0.tar.gz", hash = "sha256:c76315c77db068650d49c5b56314774a7804df16fee4402c1f19d6d15d8c4730", size = 20503, upload-time = "2025-10-18T21:55:43.219Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" }, +] + +[[package]] +name = "librt" +version = "0.11.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/40/08/9e7f6b5d2b5bed6ad055cdd5925f192bb403a51280f86b56554d9d0699a2/librt-0.11.0.tar.gz", hash = "sha256:075dc3ef4458a278e0195cbf6ac9d38808d9b906c5a6c7f7f79c3888276a3fb1", size = 200139, upload-time = "2026-05-10T18:17:25.138Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/fe/87/2bf31fe17587b29e3f93ec31421e2b1e1c3e349b8bf6c7c313dbad1d5340/librt-0.11.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:93d95bd45b7d58343d8b90d904450a545144eec19a002511163426f8ab1fae29", size = 141092, upload-time = "2026-05-10T18:15:34.795Z" }, + { 
url = "https://files.pythonhosted.org/packages/cf/08/5c5bf772920b7ebac6e32bc91a643e0ab3870199c0b542356d3baa83970a/librt-0.11.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:4ee278c769a713638cdacd4c0436d72156e75df3ebc0166ab2b9dc43acc386c9", size = 142035, upload-time = "2026-05-10T18:15:36.242Z" }, + { url = "https://files.pythonhosted.org/packages/06/20/662a03d254e5b000d838e8b345d83303ddb768c080fd488e40634c0fa66b/librt-0.11.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f230cb1cbc9faaa616f9a678f530ebcf186e414b6bcbd88b960e4ba1b92428d5", size = 475022, upload-time = "2026-05-10T18:15:37.56Z" }, + { url = "https://files.pythonhosted.org/packages/de/f3/aa81523e45184c6ec23dc7f63263362ec55f80a09d424c012359ecbe7e35/librt-0.11.0-cp311-cp311-manylinux2014_i686.manylinux_2_17_i686.manylinux_2_28_i686.whl", hash = "sha256:5d63c855d86938d9de93e265c9bd8c705b51ec494de5738340ee93767a686e4b", size = 467273, upload-time = "2026-05-10T18:15:39.182Z" }, + { url = "https://files.pythonhosted.org/packages/6b/6f/59c74b560ca8853834d5501d589c8a2519f4184f273a085ffd0f37a1cc47/librt-0.11.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:993f028be9e96a08d31df3479ac80d99be374d17f3b78e4796b3fd3c913d4e89", size = 497083, upload-time = "2026-05-10T18:15:40.634Z" }, + { url = "https://files.pythonhosted.org/packages/fe/7b/5aa4d2c9600a719401160bf7055417df0b2a47439b9d88286ce45e56b65f/librt-0.11.0-cp311-cp311-manylinux_2_34_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:258d73a0aa66a055e65b2e4d1b8cdb23b9d132c5bb915d9547d804fcaed116cc", size = 489139, upload-time = "2026-05-10T18:15:41.934Z" }, + { url = "https://files.pythonhosted.org/packages/d6/31/9143803d7da6856a69153785768c4936864430eec0fd9461c3ea527d9922/librt-0.11.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:0827efe7854718f04aaddf6496e96960a956e676fe1d0f04eb41511fd8ad06d5", size = 508442, upload-time = 
"2026-05-10T18:15:43.206Z" }, + { url = "https://files.pythonhosted.org/packages/2f/5a/bce08184488426bda4ccc2c4964ac048c8f68ae89bd7120082eef4233cfd/librt-0.11.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:7753e57d6e12d019c0d8786f1c09c709f4c3fcc57c3887b24e36e6c06ec938b7", size = 514230, upload-time = "2026-05-10T18:15:44.761Z" }, + { url = "https://files.pythonhosted.org/packages/89/8c/bb5e213d254b7505a0e658da199d8ab719086632ce09eef311ab27976523/librt-0.11.0-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:11bd19822431cc21af9f27374e7ae2e58103c7d98bda823536a6c47f6bb2bb3d", size = 494231, upload-time = "2026-05-10T18:15:46.308Z" }, + { url = "https://files.pythonhosted.org/packages/9d/fb/541cdad5b1ab1300398c74c4c9a497b88e5074c21b1244c8f49731d3a284/librt-0.11.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:22bdf239b219d3993761a148ffa134b19e52e9989c84f845d5d7b71d70a17412", size = 537585, upload-time = "2026-05-10T18:15:47.629Z" }, + { url = "https://files.pythonhosted.org/packages/8f/f2/464bb69295c320cb06bddb4f14a4ec67934ee14b2bffb12b19fb7ab287ba/librt-0.11.0-cp311-cp311-win32.whl", hash = "sha256:46c60b61e308eb535fbd6fa622b1ee1bb2815691c1ad9c98bf7b84952ec3bc8d", size = 100509, upload-time = "2026-05-10T18:15:49.157Z" }, + { url = "https://files.pythonhosted.org/packages/6d/e7/a17ee1788f9e4fbf548c19f4afa07c92089b9e24fef6cb2410863781ef4c/librt-0.11.0-cp311-cp311-win_amd64.whl", hash = "sha256:902e546ff044f579ff1c953ff5fce97b636fe9e3943996b2177710c6ef076f73", size = 118628, upload-time = "2026-05-10T18:15:50.345Z" }, + { url = "https://files.pythonhosted.org/packages/cc/c7/6c766214f9f9903bcfcfbef97d807af8d8f5aa3502d247858ab17582d212/librt-0.11.0-cp311-cp311-win_arm64.whl", hash = "sha256:65ac3bc20f78aa0ee5ae84baa68917f89fef4af63e941084dd019a0d0e749f0c", size = 103122, upload-time = "2026-05-10T18:15:52.068Z" }, + { url = 
"https://files.pythonhosted.org/packages/8b/d0/07c77e067f0838949b43bd89232c29d72efebb9d2801a9750184eb706b71/librt-0.11.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b87504f1690a23b9a2cca841191a04f83895d4fc2dd04df91d82b1a04ca2ad46", size = 144147, upload-time = "2026-05-10T18:15:53.227Z" }, + { url = "https://files.pythonhosted.org/packages/7a/24/8493538fa4f62f982686398a5b8f68008138a75086abdea19ade64bf4255/librt-0.11.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:40071fc5fe0ce8daa6de616702314a01e1250711682b0523d6ab8d4525910cb3", size = 143614, upload-time = "2026-05-10T18:15:54.657Z" }, + { url = "https://files.pythonhosted.org/packages/ff/1e/f8bad050810d9171f34a1648ed910e56814c2ba61639f2bd53c6377ae24b/librt-0.11.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:137e79445c896a0ea7b265f52d23954e05b64222ee1af69e2cb34219067cbb67", size = 485538, upload-time = "2026-05-10T18:15:56.117Z" }, + { url = "https://files.pythonhosted.org/packages/c0/fe/3594ebfbaf03084ba4b120c9ba5c3183fd938a48725e9bbe6ff0a5159ad8/librt-0.11.0-cp312-cp312-manylinux2014_i686.manylinux_2_17_i686.manylinux_2_28_i686.whl", hash = "sha256:cca6644054e78746d8d4ef238681f9c34ff8b584fe6b988ecebb8db3b15e622a", size = 479623, upload-time = "2026-05-10T18:15:57.544Z" }, + { url = "https://files.pythonhosted.org/packages/b0/da/5d1876984b3746c85dbd219dbfcb73c85f54ee263fd32e5b2a632ec14571/librt-0.11.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d5b0eea49f5562861ee8d757a32ef7d559c1d35be2aaaa1ec28941d74c9ffc8a", size = 513082, upload-time = "2026-05-10T18:15:58.805Z" }, + { url = "https://files.pythonhosted.org/packages/19/6e/55bdf5d5ca00c3e18430690bf2c953d8d3ffd3c337418173d33dec985dc9/librt-0.11.0-cp312-cp312-manylinux_2_34_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:0d1029d7e1ae1a7e647ed6fb5df8c4ce2dffefb7a9f5fd1376a4554d96dac09f", size = 508105, upload-time = 
"2026-05-10T18:16:00.2Z" }, + { url = "https://files.pythonhosted.org/packages/07/10/f1f23a7c595ee90ece4d35c851e5d104b1311a887ed1b4ac4c35bbd13da8/librt-0.11.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:bc3ce6b33c5828d9e80592011a5c584cb2ce86edbc4088405f70da47dc1d1b3b", size = 522268, upload-time = "2026-05-10T18:16:01.708Z" }, + { url = "https://files.pythonhosted.org/packages/b6/02/5720f5697a7f54b78b3aefbe20df3a48cedcff1276618c4aa481177942ed/librt-0.11.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:936c5995f3514a42111f20099397d8177c79b4d7e70961e396c6f5a0a3566766", size = 527348, upload-time = "2026-05-10T18:16:03.496Z" }, + { url = "https://files.pythonhosted.org/packages/50/db/b4a47c6f91db4ff76348a0b3dd0cc65e090a078b765a810a62ff9434c3d3/librt-0.11.0-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:9bc0ca6ad9381cbe8e4aa6e5726e4c80c78115a6e9723c599ed1d73e092bc49d", size = 516294, upload-time = "2026-05-10T18:16:05.173Z" }, + { url = "https://files.pythonhosted.org/packages/9e/58/9384b2f4eb1ed1d273d40948a7c5c4b2360213b402ef3be4641c06299f9c/librt-0.11.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:070aa8c26c0a74774317a72df8851facc7f0f012a5b406557ac56992d92e1ec8", size = 553608, upload-time = "2026-05-10T18:16:06.839Z" }, + { url = "https://files.pythonhosted.org/packages/21/7b/5aa8848a7c6a9278c79375146da1812e695754ceec5f005e6043461a7315/librt-0.11.0-cp312-cp312-win32.whl", hash = "sha256:6bf14feb84b05ae945277395451998c89c54d0def4070eb5c08de544930b245a", size = 101879, upload-time = "2026-05-10T18:16:08.103Z" }, + { url = "https://files.pythonhosted.org/packages/37/33/8a745436944947575b584231750a41417de1a38cf6a2e9251d1065651c09/librt-0.11.0-cp312-cp312-win_amd64.whl", hash = "sha256:75672f0bc524ede266287d532d7923dbce94c7514ad07627bac3d0c6d92cc4d9", size = 119831, upload-time = "2026-05-10T18:16:09.174Z" }, + { url = 
"https://files.pythonhosted.org/packages/59/67/a6739ac96e28b7855808bdb0370e250606104a859750d209e5a0716fe7ab/librt-0.11.0-cp312-cp312-win_arm64.whl", hash = "sha256:2f10cf143e4a9bb0f4f5af568a00df94a2d69ef41c2579584454bb0fe5cc642c", size = 103470, upload-time = "2026-05-10T18:16:10.369Z" }, + { url = "https://files.pythonhosted.org/packages/82/61/e59168d4d0bf2bf90f4f0caf7a001bfc60254c3af4586013b04dc3ef517b/librt-0.11.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:78dc31f7fdfe9c9d0eb0e8f42d139db230e826415bbcabd9f0e9faaaee909894", size = 144119, upload-time = "2026-05-10T18:16:11.771Z" }, + { url = "https://files.pythonhosted.org/packages/61/fd/caa1d60b12f7dd79ccea23054e06eeaebe266a5f52c40a6b651069200ce5/librt-0.11.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:fa475675db22290c3158e1d42326d0f5a65f04f44a0e68c3630a25b53560fb9c", size = 143565, upload-time = "2026-05-10T18:16:13.334Z" }, + { url = "https://files.pythonhosted.org/packages/b8/a9/dc744f5c2b4978d48db970be29f22716d3413d28b14ad99740817315cf2c/librt-0.11.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:621db29691044bdeda22e789e482e1b0f3a985d90e3426c9c6d17606416205ea", size = 485395, upload-time = "2026-05-10T18:16:14.729Z" }, + { url = "https://files.pythonhosted.org/packages/8f/21/7f8e97a1e4dae952a5a95948f6f8507a173bc1e669f54340bba6ca1ca31b/librt-0.11.0-cp313-cp313-manylinux2014_i686.manylinux_2_17_i686.manylinux_2_28_i686.whl", hash = "sha256:a9010e2ed5b3a9e158c5fd966b3ab7e834bb3d3aacc8f66c91dd4b57a3799230", size = 479383, upload-time = "2026-05-10T18:16:16.321Z" }, + { url = "https://files.pythonhosted.org/packages/a6/6d/d8ee9c114bebf2c50e29ec2aa940826fccb62a645c3e4c18760987d0e16d/librt-0.11.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7c39513d8b7477a2e1ed8c43fc21c524e8d5a0f8d4e8b7b074dbdbe7820a08e2", size = 513010, upload-time = "2026-05-10T18:16:17.647Z" }, + { url = 
"https://files.pythonhosted.org/packages/f0/43/0b5708af2bd30a46400e72ba6bdaa8f066f15fb9a688527e34220e8d6c06/librt-0.11.0-cp313-cp313-manylinux_2_34_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:7aef3cf1d5af86e770ab04bfd993dfc4ae8b8c17f66fb77dd4a7d50de7bbb1a3", size = 508433, upload-time = "2026-05-10T18:16:19.309Z" }, + { url = "https://files.pythonhosted.org/packages/4a/50/356187247d09013490481033183b3532b58acf8028bcb34b2b56a375c9b2/librt-0.11.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:557183ddc36babe46b27dd60facbd5adb4492181a5be887587d57cda6e092f21", size = 522595, upload-time = "2026-05-10T18:16:20.642Z" }, + { url = "https://files.pythonhosted.org/packages/40/e7/c6ac4240899c7f3248079d5a9900debe0dadb3fdeaf856684c987105ba47/librt-0.11.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:83d3e1f72bd42f6c5c0b7daec530c3f829bd02db42c70b8ddf0c2d90a2459930", size = 527255, upload-time = "2026-05-10T18:16:22.352Z" }, + { url = "https://files.pythonhosted.org/packages/eb/b5/a81322dbeedeeaf9c1ee6f001734d28a09d8383ac9e6779bc24bbd0743c6/librt-0.11.0-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:4ce1f21fbe589bc1afd7872dece84fb0e1144f794a288e58a10d2c54a55c43be", size = 516847, upload-time = "2026-05-10T18:16:23.627Z" }, + { url = "https://files.pythonhosted.org/packages/ae/66/6e6323787d592b55204a42595ff1102da5115601b53a7e9ddebc889a6da5/librt-0.11.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:970b09f7044ea2b64c9da42fd3d335666518cfd1c6e8a182c95da73d0214b41e", size = 553920, upload-time = "2026-05-10T18:16:25.025Z" }, + { url = "https://files.pythonhosted.org/packages/9c/21/623f8ca230857102066d9ca8c6c1734995908c4d0d1bee7bb2ef0021cb33/librt-0.11.0-cp313-cp313-win32.whl", hash = "sha256:78fddc31cd4d3caa897ad5d31f856b1faadc9474021ad6cb182b9018793e254e", size = 101898, upload-time = "2026-05-10T18:16:26.649Z" }, + { url = 
"https://files.pythonhosted.org/packages/b3/1d/b4ebd44dd723f768469007515cb92251e0ae286c94c140f374801140fa74/librt-0.11.0-cp313-cp313-win_amd64.whl", hash = "sha256:8ca8aa88751a775870b764e93bad5135385f563cb8dcee399abf034ea4d3cb47", size = 119812, upload-time = "2026-05-10T18:16:27.859Z" }, + { url = "https://files.pythonhosted.org/packages/3b/e4/b2f4ca7965ca373b491cdb4bc25cdb30c1649ca81a8782056a83850292a9/librt-0.11.0-cp313-cp313-win_arm64.whl", hash = "sha256:96f044bb325fd9cf1a723015638c219e9143f0dfbc0ca54c565df2b7fc748b44", size = 103448, upload-time = "2026-05-10T18:16:29.066Z" }, + { url = "https://files.pythonhosted.org/packages/29/eb/dbce197da4e227779e56b5735f2decc3eb36e55a1cdbf1bd65d6639d76c1/librt-0.11.0-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:4a017a95e5837dc15a8c5661d60e05daa96b90908b1aa6b7acdf443cd25c8ebd", size = 143345, upload-time = "2026-05-10T18:16:30.674Z" }, + { url = "https://files.pythonhosted.org/packages/76/a3/254bebd0c11c8ba684018efb8006ff22e466abce445215cca6c778e7d9de/librt-0.11.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:b1ecbd9819deccc39b7542bf4d2a740d8a620694d39989e58661d3763458f8d4", size = 143131, upload-time = "2026-05-10T18:16:32.037Z" }, + { url = "https://files.pythonhosted.org/packages/f1/3f/f77d6122d21ac7bf6ae8a7dfced1bd2a7ac545d3273ebdcaf8042f6d619f/librt-0.11.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7da327dacd7be8f8ec36547373550744a3cc0e536d54665cd83f8bcd961200e8", size = 477024, upload-time = "2026-05-10T18:16:33.493Z" }, + { url = "https://files.pythonhosted.org/packages/ac/0a/2c996dadebaa7d9bbbd43ef2d4f3e66b6da545f838a41694ef6172cebec8/librt-0.11.0-cp314-cp314-manylinux2014_i686.manylinux_2_17_i686.manylinux_2_28_i686.whl", hash = "sha256:0dc56b1f8d06e60db362cc3fdae206681817f86ce4725d34511473487f12a34b", size = 474221, upload-time = "2026-05-10T18:16:34.864Z" }, + { url = 
"https://files.pythonhosted.org/packages/0a/7e/f5d92af8486b8272c23b3e686b46ff72d89c8169585eb61eef01a2ac7147/librt-0.11.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:05fb8fb2ab90e21c8d12ea240d744ad514da9baf381ebfa70d91d20d21713175", size = 505174, upload-time = "2026-05-10T18:16:36.705Z" }, + { url = "https://files.pythonhosted.org/packages/af/1a/cb0734fe86398eb33193ab753b7326255c74cac5eb09e76b9b16536e7adb/librt-0.11.0-cp314-cp314-manylinux_2_34_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:cae74872be221df4374d10fec61f93ed1513b9546ea84f2c0bf73ab3e9bd0b03", size = 497216, upload-time = "2026-05-10T18:16:38.418Z" }, + { url = "https://files.pythonhosted.org/packages/18/06/094820f91558b66e29943c0ec41c9914f460f48dd51fc503c3101e10842d/librt-0.11.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:32bcc918c0148eb7e3d57385125bac7e5f9e4359d05f07448b09f6f778c2f31c", size = 513921, upload-time = "2026-05-10T18:16:39.848Z" }, + { url = "https://files.pythonhosted.org/packages/0b/c2/00de9018871a282f530cacb457d5ec0428f6ac7e6fedde9aff7468d9fb04/librt-0.11.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:f9743fc99135d5f78d2454435615f6dec0473ca507c26ce9d92b10b562a280d3", size = 520850, upload-time = "2026-05-10T18:16:41.471Z" }, + { url = "https://files.pythonhosted.org/packages/51/9d/64631832348fd1834fb3a61b996434edddaaf25a31d03b0a76273159d2cf/librt-0.11.0-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:5ba067f4aadae8fda802d91d2124c90c42195ff32d9161d3549e6d05cfe26f96", size = 504237, upload-time = "2026-05-10T18:16:43.15Z" }, + { url = "https://files.pythonhosted.org/packages/a5/ec/ae5525eb16edc827a044e7bb8777a455ff95d4bca9379e7e6bddd7383647/librt-0.11.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:de3bf945454d032f9e390b85c4072e0a0570bf825421c8be0e71209fa65e1abe", size = 546261, upload-time = "2026-05-10T18:16:44.408Z" }, + { url = 
"https://files.pythonhosted.org/packages/5a/09/adce371f27ca039411da9659f7430fcc2ba6cd0c7b3e4467a0f091be7fa9/librt-0.11.0-cp314-cp314-win32.whl", hash = "sha256:d2277a05f6dcb9fd13db9566aac4fabd68c3ea1ea46ee5567d4eef8efa495a2f", size = 96965, upload-time = "2026-05-10T18:16:46.039Z" }, + { url = "https://files.pythonhosted.org/packages/d6/ee/8ac720d98548f173c7ce2e632a7ca94673f74cacd5c8162a84af5b35958a/librt-0.11.0-cp314-cp314-win_amd64.whl", hash = "sha256:ab73e8db5e3f564d812c1f5c3a175930a5f9bc96ccb5e3b22a34d7858b401cf7", size = 115151, upload-time = "2026-05-10T18:16:47.133Z" }, + { url = "https://files.pythonhosted.org/packages/94/20/c900cf14efeb09b6bef2b2dff20779f73464b97fd58d1c6bccc379588ae3/librt-0.11.0-cp314-cp314-win_arm64.whl", hash = "sha256:aea3caa317752e3a466fa8af45d91ee0ea8c7fdd96e42b0a8dd9b76a7931eba1", size = 98850, upload-time = "2026-05-10T18:16:48.597Z" }, + { url = "https://files.pythonhosted.org/packages/0c/71/944bfe4b64e12abffcd3c15e1cce07f72f3d55655083786285f4dedeb532/librt-0.11.0-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:d1b36540d7aaf9b9101b3a6f376c8d8e9f7a9aec93ed05918f2c69d493ffef72", size = 151138, upload-time = "2026-05-10T18:16:49.839Z" }, + { url = "https://files.pythonhosted.org/packages/b6/10/99e64a5c86989357fda078c8143c533389585f6473b7439172dd8f3b3b2d/librt-0.11.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:efbb343ab2ce3540f4ecbe6315d677ed70f37cd9a72b1e58066c918ca83acbaa", size = 151976, upload-time = "2026-05-10T18:16:51.062Z" }, + { url = "https://files.pythonhosted.org/packages/21/31/5072ad880946d83e5ea4147d6d018c78eefce85b77819b19bdd0ee229435/librt-0.11.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:aa0dd688aab3f7914d3e6e5e3554978e0383312fb8e771d84be008a35b9ee548", size = 557927, upload-time = "2026-05-10T18:16:52.632Z" }, + { url = 
"https://files.pythonhosted.org/packages/5e/8d/70b5fb7cfbab60edbe7381614ab985da58e144fbf465c86d44c95f43cdca/librt-0.11.0-cp314-cp314t-manylinux2014_i686.manylinux_2_17_i686.manylinux_2_28_i686.whl", hash = "sha256:f5fb36b8c6c63fdcbb1d526d94c0d1331610d43f4118cc1beb4efef4f3faacb2", size = 539698, upload-time = "2026-05-10T18:16:53.934Z" }, + { url = "https://files.pythonhosted.org/packages/fa/a3/ba3495a0b3edbd24a4cae0d1d3c64f39a9fc45d06e812101289b50c1a619/librt-0.11.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4a9a237d13addb93715b6fee74023d5ee3469b53fce527626c0e088aa585805f", size = 577162, upload-time = "2026-05-10T18:16:55.589Z" }, + { url = "https://files.pythonhosted.org/packages/f7/db/36e25fb81f99937ff1b96612a1dc9fd66f039cb9cc3aee12c01fac31aab9/librt-0.11.0-cp314-cp314t-manylinux_2_34_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:5ddd17bd87b2c56ddd60e546a7984a2e64c4e8eab92fb4cf3830a48ad5469d51", size = 566494, upload-time = "2026-05-10T18:16:56.975Z" }, + { url = "https://files.pythonhosted.org/packages/33/0d/3f622b47f0b013eeb9cf4cc07ae9bfe378d832a4eec998b2b209fe84244d/librt-0.11.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:bd43992b4473d42f12ff9e68326079f0696d9d4e6000e8f39a0238d482ba6ee2", size = 596858, upload-time = "2026-05-10T18:16:58.374Z" }, + { url = "https://files.pythonhosted.org/packages/a9/02/71b90bc93039c46a2000651f6ad60122b114c8f54c4ad306e0e96f5b75ad/librt-0.11.0-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:f8e3e8056dd674e279741485e2e512d6e9a751c7455809d0114e6ebf8d781085", size = 590318, upload-time = "2026-05-10T18:16:59.676Z" }, + { url = "https://files.pythonhosted.org/packages/04/04/418cb3f75621e2b761fb1ab0f017f4d70a1a72a6e7c74ee4f7e8d198c2f3/librt-0.11.0-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:c1f708d8ae9c56cf38a903c44297243d2ec83fd82b396b977e0144a3e76217e3", size = 575115, upload-time = "2026-05-10T18:17:01.007Z" }, + { url = 
"https://files.pythonhosted.org/packages/cc/2c/5a2183ac58dd911f26b5d7e7d7d8f1d87fcecdddd99d6c12169a258ff62c/librt-0.11.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:0add982e0e7b9fc14cf4b33789d5f13f66581889b88c2f58099f6ce8f92617bd", size = 617918, upload-time = "2026-05-10T18:17:02.682Z" }, + { url = "https://files.pythonhosted.org/packages/15/1f/dc6771a52592a4451be6effa200cbfc9cec61e4393d3033d81a9d307961d/librt-0.11.0-cp314-cp314t-win32.whl", hash = "sha256:2b481d846ac894c4e8403c5fd0e87c5d11d6499e404b474602508a224ff531c8", size = 103562, upload-time = "2026-05-10T18:17:03.99Z" }, + { url = "https://files.pythonhosted.org/packages/62/4a/7d1415567027286a75ba1093ec4aca11f073e0f559c530cf3e0a757ad55c/librt-0.11.0-cp314-cp314t-win_amd64.whl", hash = "sha256:28edb433edde181112a908c78907af28f964eabc15f4dd16c9d66c834302677c", size = 124327, upload-time = "2026-05-10T18:17:05.465Z" }, + { url = "https://files.pythonhosted.org/packages/ce/62/b40b382fa0c66fee1478073eb8db352a4a6beda4a1adccf1df911d8c289c/librt-0.11.0-cp314-cp314t-win_arm64.whl", hash = "sha256:dee008f20b542e3cd162ba338a7f9ec0f6d23d395f66fe8aeeec3c9d067ea253", size = 102572, upload-time = "2026-05-10T18:17:06.809Z" }, +] + +[[package]] +name = "mypy" +version = "2.1.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "ast-serialize" }, + { name = "librt", marker = "platform_python_implementation != 'PyPy'" }, + { name = "mypy-extensions" }, + { name = "pathspec" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/82/15/cca9d88503549ed6fedeaa1d448cdddd542ee8a490232d732e278036fbf2/mypy-2.1.0.tar.gz", hash = "sha256:81e76ad12c2d804512e9b13240d1588316531bfba07558286078bfbce9613633", size = 3898359, upload-time = "2026-05-11T18:37:36.237Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0a/a1/639f3024794a2a15899cb90707fe02e044c4412794c39c5769fd3df2e2ef/mypy-2.1.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = 
"sha256:a683016b16fe2f572dc04c72be7ee0504ac1605a265d0200f5cea695fb788f41", size = 14691685, upload-time = "2026-05-11T18:33:27.973Z" }, + { url = "https://files.pythonhosted.org/packages/3b/08/9a585dea4325f20d8b80dc78623fa50d1fd2173b710f6237afd6ba6ab39b/mypy-2.1.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:1a293c534adb55271fef24a26da04b855540a8c13cc07bc5917b9fd2c394f2ca", size = 13555165, upload-time = "2026-05-11T18:32:16.107Z" }, + { url = "https://files.pythonhosted.org/packages/81/dc/7c42cc9c6cb01e8eb09961f1f738741d3e9c7e9d5c5b30ec69222625cd5f/mypy-2.1.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7406f4d048e71e576f5356d317e5b0a9e666dfd966bd99f9d14ca06e1a341538", size = 13994376, upload-time = "2026-05-11T18:32:39.256Z" }, + { url = "https://files.pythonhosted.org/packages/d4/fa/285946c33bce716e082c11dfeee9ee196eaf1f5042efb3581a31f9f205e4/mypy-2.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e0210d626fc8b31ccc90233754c7bc90e1f43205e85d96387f7db1285b55c398", size = 14864618, upload-time = "2026-05-11T18:34:49.765Z" }, + { url = "https://files.pythonhosted.org/packages/2b/83/82397f48af6c27e295d57979ded8490c9829040152cf7571b2f026aeb9a0/mypy-2.1.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:3712c20deed54e814eaaa825603bada8ea1c390670a397c95b98405347acc563", size = 15102063, upload-time = "2026-05-11T18:34:05.855Z" }, + { url = "https://files.pythonhosted.org/packages/40/68/b02dec39057b88eb03dc0aa854732e26e8361f34f9d0e20c7614967d1eba/mypy-2.1.0-cp311-cp311-win_amd64.whl", hash = "sha256:fcaa0e479066e31f7cceb6a3bea39cb22b2ff51a6b2f24f193d19179ba17c389", size = 11060564, upload-time = "2026-05-11T18:35:36.494Z" }, + { url = "https://files.pythonhosted.org/packages/cf/a8/ea3dcbef31f99b634f2ee23bb0321cbc8c1b388b76a861eb849f13c347dc/mypy-2.1.0-cp311-cp311-win_arm64.whl", hash = 
"sha256:0b1a5260c95aa443083f9ed3592662941951bca3d4ca224a5dc517c38b7cf666", size = 9966983, upload-time = "2026-05-11T18:37:14.139Z" }, + { url = "https://files.pythonhosted.org/packages/95/b1/55861beb5c339b44f9a2ba92df9e2cb1eeb4ae1eee674cdf7772c797778b/mypy-2.1.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:244358bf1c0da7722230bce60683d52e8e9fd030554926f15b747a84efb5b3af", size = 14874381, upload-time = "2026-05-11T18:37:31.784Z" }, + { url = "https://files.pythonhosted.org/packages/0b/b3/b7f770114b7d0ac92d0f76e8d93c2780844a70488a90e91821927850da86/mypy-2.1.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4ec7c57657493c7a75534df2751c8ae2cda383c16ecc55d2106c54476b1b16f6", size = 13665501, upload-time = "2026-05-11T18:34:23.063Z" }, + { url = "https://files.pythonhosted.org/packages/b6/f3/8ae2037967e2126689a0c11d99e2b707134a565191e92c60ca2572aec60a/mypy-2.1.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d8161b6ff4392410023224f0969d17db93e1e154bc3e4ba62598e720723ae211", size = 14045750, upload-time = "2026-05-11T18:31:48.151Z" }, + { url = "https://files.pythonhosted.org/packages/a0/32/615eb5911859e43d054941b0d0a7d06cfa2870eba86529cf385b052b111c/mypy-2.1.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bf03e12003084a67395184d3eb8cbd6a489dc3655b5664b28c210a9e2403ab0b", size = 15061630, upload-time = "2026-05-11T18:37:06.898Z" }, + { url = "https://files.pythonhosted.org/packages/d4/03/4eafbfff8bfab1b87082741eae6e6a624028c984e6708b73bce2a8570c9d/mypy-2.1.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:20509760fd791c51579d573153407d226385ec1f8bcce55d730b354f3336bc22", size = 15288831, upload-time = "2026-05-11T18:31:18.07Z" }, + { url = "https://files.pythonhosted.org/packages/99/ee/919661478e5891a3c96e549c036e467e64563ab85995b10c53c8358e16a3/mypy-2.1.0-cp312-cp312-win_amd64.whl", hash = 
"sha256:6753d0c1fdd6b1a23b9e4f283ce80b2153b724adcb2653b20b85a8a28ac6436b", size = 11135228, upload-time = "2026-05-11T18:34:31.23Z" }, + { url = "https://files.pythonhosted.org/packages/24/0a/6a12b9782ca0831a553192f351679f4548abc9d19a7cc93bb7feb02084c7/mypy-2.1.0-cp312-cp312-win_arm64.whl", hash = "sha256:98ebb6589bb3b6d0c6f0c459d53ca55b8091fbc13d277c4041c885392e8195e8", size = 10040684, upload-time = "2026-05-11T18:36:48.199Z" }, + { url = "https://files.pythonhosted.org/packages/6e/dd/c7191469c777f07689c032a8f7326e393ea34c92d6d76eb7ce5ba57ea66d/mypy-2.1.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:35aac3bb114e03888f535d5eb51b8bafbb3266586b599da1940f9b1be3ec5bd5", size = 14852174, upload-time = "2026-05-11T18:31:38.929Z" }, + { url = "https://files.pythonhosted.org/packages/55/8c/aed55408879043d72bb9135f4d0d19a02b886dd569631e113e3d2706cb8d/mypy-2.1.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:8de55a8c861f2a49331f807be98d90caeceeef520bde13d43a160207f8af613e", size = 13651542, upload-time = "2026-05-11T18:36:04.636Z" }, + { url = "https://files.pythonhosted.org/packages/3a/8e/f371a824b1f1fa8ea6e3dbb8703d232977d572be2329554a3bc4d960302f/mypy-2.1.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5fdf2941a07434af755837d9880f7d7d25f1dacb1af9dcd4b9b66f2220a3024e", size = 14033929, upload-time = "2026-05-11T18:35:55.742Z" }, + { url = "https://files.pythonhosted.org/packages/94/21/f54be870d6dd53a82c674407e0f8eed7174b05ec78d42e5abd7b42e84fd5/mypy-2.1.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e195b817c13f02352a9c124301f9f30f078405444679b6753c1b96b6eed37285", size = 15039200, upload-time = "2026-05-11T18:33:10.281Z" }, + { url = "https://files.pythonhosted.org/packages/17/99/bf21748626a40ce59fd29a39386ab46afec88b7bd2f0fa6c3a97c995523f/mypy-2.1.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = 
"sha256:5431d42af987ebd92ba2f71d45c85ed41d8e6ca9f5fd209a69f68f707d2469e5", size = 15272690, upload-time = "2026-05-11T18:32:07.205Z" }, + { url = "https://files.pythonhosted.org/packages/d6/d7/9e90d2cf47100bea550ed2bc7b0d4de3a62181d84d5e37da0003e8462637/mypy-2.1.0-cp313-cp313-win_amd64.whl", hash = "sha256:767fe8c66dc3e01e19e1737d4c38ebefead16125e1b8e58ad421903b376f5c65", size = 11147435, upload-time = "2026-05-11T18:33:56.477Z" }, + { url = "https://files.pythonhosted.org/packages/ec/46/e5c449e858798e35ffc90946282a27c62a77be743fe17480e4977374eb91/mypy-2.1.0-cp313-cp313-win_arm64.whl", hash = "sha256:ecfe70d43775ab99562ab128ce49854a362044c9f894961f68f898c23cb7429d", size = 10035052, upload-time = "2026-05-11T18:32:30.049Z" }, + { url = "https://files.pythonhosted.org/packages/b0/ca/b279a672e874aedd5498ae25f722dacc8aa86bbffb939b3f97cbb1cf6686/mypy-2.1.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:7354c5a7f69d9345c3d6e69921d57088eea3ddeeb6b20d34c1b3855b02c36ec2", size = 14848422, upload-time = "2026-05-11T18:35:45.984Z" }, + { url = "https://files.pythonhosted.org/packages/27/e6/3efe56c631d959b9b4454e208b0ac4b7f4f58b404c89f8bec7b49efdfc21/mypy-2.1.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:49890d4f76ac9e06ec117f9e09f3174da70a620a0c300953d8595c926e80947f", size = 13677374, upload-time = "2026-05-11T18:36:57.188Z" }, + { url = "https://files.pythonhosted.org/packages/84/7f/8107ea87a44fd1f1b59882442f033c9c3488c127201b1d1d15f1cbd6022e/mypy-2.1.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:761be68e023ef5d94678772396a8af1220030f80837a3afd8d0aef3b419666f4", size = 14055743, upload-time = "2026-05-11T18:35:18.361Z" }, + { url = "https://files.pythonhosted.org/packages/51/4d/b6d34db183133b83761b9199a82d31557cdbb70a380d8c3b3438e11882a3/mypy-2.1.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = 
"sha256:c90345fc182dc363b891350457ec69c35140858538f38b4540845afcc32b1aef", size = 15020937, upload-time = "2026-05-11T18:34:59.618Z" }, + { url = "https://files.pythonhosted.org/packages/ff/d7/f08360c691d758acb02f45022c34d98b92892f4ea756644e1000d4b9f3d8/mypy-2.1.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:b84802e7b5a6daf1f5e15bc9fcd7ddae77be13981ffab037f1c67bb84d67d135", size = 15253371, upload-time = "2026-05-11T18:36:41.081Z" }, + { url = "https://files.pythonhosted.org/packages/67/1b/09460a13719530a19bce27bd3bc8449e83569dd2ba7faf51c9c3c30c0b61/mypy-2.1.0-cp314-cp314-win_amd64.whl", hash = "sha256:022c771234936ceac541ebaf836fe9e2abeb3f5e09aff21588fe543ff006fe21", size = 11326429, upload-time = "2026-05-11T18:34:13.526Z" }, + { url = "https://files.pythonhosted.org/packages/40/62/75dbf0f82f7b6680340efc614af29dd0b3c17b8a4f1cd09b8bd2fd6bc814/mypy-2.1.0-cp314-cp314-win_arm64.whl", hash = "sha256:498207db725cec88829a6a5c2fc771205fd043719ef98bc49aba8fb9fc4e6d57", size = 10218799, upload-time = "2026-05-11T18:32:23.491Z" }, + { url = "https://files.pythonhosted.org/packages/b2/66/caca04ed7d972fb6eb6dd1ccd6df1de5c38fae8c5b3dc1c4e8e0d85ee6b9/mypy-2.1.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:7d5e5cad0efeba72b93cd17490cc0d69c5ac9ca132994fe3fb0314808aeeb83e", size = 15923458, upload-time = "2026-05-11T18:35:28.64Z" }, + { url = "https://files.pythonhosted.org/packages/ed/52/2d90cbe49d014b13ed7ff337930c30bad35893fe38a1e4641e756bb62191/mypy-2.1.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:ff715050c127d724fd260a2e666e7747fdd83511c0c47d449d98238970aef780", size = 14757697, upload-time = "2026-05-11T18:36:14.208Z" }, + { url = "https://files.pythonhosted.org/packages/ac/37/d98f4a14e081b238992d0ed96b6d39c7cc0148c9699eb71eaa68629665ea/mypy-2.1.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:82208da9e09414d520e912d3e462d454854bed0810b71540bb016dcbca7308fd", size = 15405638, upload-time = 
"2026-05-11T18:33:48.249Z" }, + { url = "https://files.pythonhosted.org/packages/a3/c2/15c46613b24a84fad2aea1248bf9619b99c2767ae9071fe224c179a0b7d4/mypy-2.1.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e79ebc1b904b84f0310dff7469655a9c36c7a68bddb37bdd42b67a332df61d08", size = 16215852, upload-time = "2026-05-11T18:32:50.296Z" }, + { url = "https://files.pythonhosted.org/packages/5c/90/9c16a57f482c76d25f6379762b56bbf65c711d8158cf271fb2802cfb0640/mypy-2.1.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:e583edc957cfb0deb142079162ae826f58449b116c1d442f2d91c69d9fced081", size = 16452695, upload-time = "2026-05-11T18:33:38.182Z" }, + { url = "https://files.pythonhosted.org/packages/0f/4c/215a4eeb63cacc5f17f516691ea7285d11e249802b942476bff15922a314/mypy-2.1.0-cp314-cp314t-win_amd64.whl", hash = "sha256:b33b6cd332695bba180d55e717a79d3038e479a2c49cc5eb3d53603409b9a5d7", size = 12866622, upload-time = "2026-05-11T18:34:39.945Z" }, + { url = "https://files.pythonhosted.org/packages/4b/50/1043e1db5f455ffe4c9ab22747cd8ca2bc492b1e4f4e21b130a44ee2b217/mypy-2.1.0-cp314-cp314t-win_arm64.whl", hash = "sha256:4f910fe825376a7b66ef7ca8c98e5a149e8cd64c19ae71d84047a74ee060d4e6", size = 10610798, upload-time = "2026-05-11T18:36:31.444Z" }, + { url = "https://files.pythonhosted.org/packages/0d/2a/13ca1f292f6db1b98ff495ef3467736b331621c5917cad984b7043e7348d/mypy-2.1.0-py3-none-any.whl", hash = "sha256:a663814603a5c563fb87a4f96fb473eeb30d1f5a4885afcf44f9db000a366289", size = 2693302, upload-time = "2026-05-11T18:31:29.246Z" }, +] + +[[package]] +name = "mypy-extensions" +version = "1.1.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/a2/6e/371856a3fb9d31ca8dac321cda606860fa4548858c0cc45d9d1d4ca2628b/mypy_extensions-1.1.0.tar.gz", hash = "sha256:52e68efc3284861e772bbcd66823fde5ae21fd2fdb51c62a211403730b916558", size = 6343, upload-time = "2025-04-22T14:54:24.164Z" } 
+wheels = [ + { url = "https://files.pythonhosted.org/packages/79/7b/2c79738432f5c924bef5071f933bcc9efd0473bac3b4aa584a6f7c1c8df8/mypy_extensions-1.1.0-py3-none-any.whl", hash = "sha256:1be4cccdb0f2482337c4743e60421de3a356cd97508abadd57d47403e94f5505", size = 4963, upload-time = "2025-04-22T14:54:22.983Z" }, +] + +[[package]] +name = "nodeenv" +version = "1.10.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/24/bf/d1bda4f6168e0b2e9e5958945e01910052158313224ada5ce1fb2e1113b8/nodeenv-1.10.0.tar.gz", hash = "sha256:996c191ad80897d076bdfba80a41994c2b47c68e224c542b48feba42ba00f8bb", size = 55611, upload-time = "2025-12-20T14:08:54.006Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/b2/d0896bdcdc8d28a7fc5717c305f1a861c26e18c05047949fb371034d98bd/nodeenv-1.10.0-py2.py3-none-any.whl", hash = "sha256:5bb13e3eed2923615535339b3c620e76779af4cb4c6a90deccc9e36b274d3827", size = 23438, upload-time = "2025-12-20T14:08:52.782Z" }, +] + +[[package]] +name = "packaging" +version = "26.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d7/f1/e7a6dd94a8d4a5626c03e4e99c87f241ba9e350cd9e6d75123f992427270/packaging-26.2.tar.gz", hash = "sha256:ff452ff5a3e828ce110190feff1178bb1f2ea2281fa2075aadb987c2fb221661", size = 228134, upload-time = "2026-04-24T20:15:23.917Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/df/b2/87e62e8c3e2f4b32e5fe99e0b86d576da1312593b39f47d8ceef365e95ed/packaging-26.2-py3-none-any.whl", hash = "sha256:5fc45236b9446107ff2415ce77c807cee2862cb6fac22b8a73826d0693b0980e", size = 100195, upload-time = "2026-04-24T20:15:22.081Z" }, +] + +[[package]] +name = "pathspec" +version = "1.1.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/5a/82/42f767fc1c1143d6fd36efb827202a2d997a375e160a71eb2888a925aac1/pathspec-1.1.1.tar.gz", hash = 
"sha256:17db5ecd524104a120e173814c90367a96a98d07c45b2e10c2f3919fff91bf5a", size = 135180, upload-time = "2026-04-27T01:46:08.907Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f1/d9/7fb5aa316bc299258e68c73ba3bddbc499654a07f151cba08f6153988714/pathspec-1.1.1-py3-none-any.whl", hash = "sha256:a00ce642f577bf7f473932318056212bc4f8bfdf53128c78bbd5af0b9b20b189", size = 57328, upload-time = "2026-04-27T01:46:07.06Z" }, +] + +[[package]] +name = "platformdirs" +version = "4.10.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d7/47/e4501f49c178ae1d9f4a75073fda4204f52647993f075a9db4d14930e0c5/platformdirs-4.10.0.tar.gz", hash = "sha256:31e761a6a0ca04faf7353ea759bdba55652be214725111e5aac52dfa29d4bef7", size = 31224, upload-time = "2026-05-28T03:32:53.587Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/81/e6/cd9575ac904136b3cbf7aa7ee819ef86eedb7274e46f230e94ea4342e729/platformdirs-4.10.0-py3-none-any.whl", hash = "sha256:fb516cdb12eb0d857d0cd85a7c57cea4d060bee4578d6cf5a14dfdf8cbf8784a", size = 22743, upload-time = "2026-05-28T03:32:52.175Z" }, +] + +[[package]] +name = "pluggy" +version = "1.6.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" }, +] + +[[package]] +name = "pre-commit" +version = "4.6.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "cfgv" }, + { name 
= "identify" }, + { name = "nodeenv" }, + { name = "pyyaml" }, + { name = "virtualenv" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/8e/22/2de9408ac81acbb8a7d05d4cc064a152ccf33b3d480ebe0cd292153db239/pre_commit-4.6.0.tar.gz", hash = "sha256:718d2208cef53fdc38206e40524a6d4d9576d103eb16f0fec11c875e7716e9d9", size = 198525, upload-time = "2026-04-21T20:31:41.613Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/80/6e/4b28b62ecb6aae56769c34a8ff1d661473ec1e9519e2d5f8b2c150086b26/pre_commit-4.6.0-py2.py3-none-any.whl", hash = "sha256:e2cf246f7299edcabcf15f9b0571fdce06058527f0a06535068a86d38089f29b", size = 226472, upload-time = "2026-04-21T20:31:40.092Z" }, +] + +[[package]] +name = "pygments" +version = "2.20.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/c3/b2/bc9c9196916376152d655522fdcebac55e66de6603a76a02bca1b6414f6c/pygments-2.20.0.tar.gz", hash = "sha256:6757cd03768053ff99f3039c1a36d6c0aa0b263438fcab17520b30a303a82b5f", size = 4955991, upload-time = "2026-03-29T13:29:33.898Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" }, +] + +[[package]] +name = "pyright" +version = "1.1.409" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nodeenv" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/51/4e/3aa27f74211522dba7e9cbc3e74de779c6d4b654c54e50a4840623be8014/pyright-1.1.409.tar.gz", hash = "sha256:986ee05beca9e077c165758ad123667c679e050059a2546aa02473930394bc93", size = 4430434, upload-time = "2026-04-23T11:02:03.799Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/16/6b/330d8ebae582b30c2959a1ef4c3bc344ebde48c2ff0c3f113c4710735e11/pyright-1.1.409-py3-none-any.whl", hash = "sha256:aa3ea228cab90c845c7a60d28db7a844c04315356392aa09fafcee98c8c22fb3", size = 6438161, upload-time = "2026-04-23T11:02:01.309Z" }, +] + +[[package]] +name = "pytest" +version = "9.0.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "iniconfig" }, + { name = "packaging" }, + { name = "pluggy" }, + { name = "pygments" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/7d/0d/549bd94f1a0a402dc8cf64563a117c0f3765662e2e668477624baeec44d5/pytest-9.0.3.tar.gz", hash = "sha256:b86ada508af81d19edeb213c681b1d48246c1a91d304c6c81a427674c17eb91c", size = 1572165, upload-time = "2026-04-07T17:16:18.027Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d4/24/a372aaf5c9b7208e7112038812994107bc65a84cd00e0354a88c2c77a617/pytest-9.0.3-py3-none-any.whl", hash = "sha256:2c5efc453d45394fdd706ade797c0a81091eccd1d6e4bccfcd476e2b8e0ab5d9", size = 375249, upload-time = "2026-04-07T17:16:16.13Z" }, +] + +[[package]] +name = "pytest-cov" +version = "7.1.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "coverage", extra = ["toml"] }, + { name = "pluggy" }, + { name = "pytest" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b1/51/a849f96e117386044471c8ec2bd6cfebacda285da9525c9106aeb28da671/pytest_cov-7.1.0.tar.gz", hash = "sha256:30674f2b5f6351aa09702a9c8c364f6a01c27aae0c1366ae8016160d1efc56b2", size = 55592, upload-time = "2026-03-21T20:11:16.284Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/9d/7a/d968e294073affff457b041c2be9868a40c1c71f4a35fcc1e45e5493067b/pytest_cov-7.1.0-py3-none-any.whl", hash = "sha256:a0461110b7865f9a271aa1b51e516c9a95de9d696734a2f71e3e78f46e1d4678", size = 22876, upload-time = "2026-03-21T20:11:14.438Z" }, +] + +[[package]] 
+name = "python-discovery" +version = "1.4.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "filelock" }, + { name = "platformdirs" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/a6/12/38c1a0b1e64806780c9563e3fc9f6e472251839662587cfbe9bfaf2ae10a/python_discovery-1.4.0.tar.gz", hash = "sha256:eb8bc7daad3c226c147e45bb4e970a1feb1bf4048ee178e6db59e197b8010ce3", size = 68455, upload-time = "2026-05-28T01:15:37.639Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c8/8d/3d316429f65029532bb1e28ff77b797d86b5ac3915bb44ca4e19aa283d43/python_discovery-1.4.0-py3-none-any.whl", hash = "sha256:26ed78d703e234879a66244c7d4114563fb13ec5cd30a2d1357e5fb4850782da", size = 33217, upload-time = "2026-05-28T01:15:36.573Z" }, +] + +[[package]] +name = "pyyaml" +version = "6.0.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/05/8e/961c0007c59b8dd7729d542c61a4d537767a59645b82a0b521206e1e25c2/pyyaml-6.0.3.tar.gz", hash = "sha256:d76623373421df22fb4cf8817020cbb7ef15c725b9d5e45f17e189bfc384190f", size = 130960, upload-time = "2025-09-25T21:33:16.546Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6d/16/a95b6757765b7b031c9374925bb718d55e0a9ba8a1b6a12d25962ea44347/pyyaml-6.0.3-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:44edc647873928551a01e7a563d7452ccdebee747728c1080d881d68af7b997e", size = 185826, upload-time = "2025-09-25T21:31:58.655Z" }, + { url = "https://files.pythonhosted.org/packages/16/19/13de8e4377ed53079ee996e1ab0a9c33ec2faf808a4647b7b4c0d46dd239/pyyaml-6.0.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:652cb6edd41e718550aad172851962662ff2681490a8a711af6a4d288dd96824", size = 175577, upload-time = "2025-09-25T21:32:00.088Z" }, + { url = 
"https://files.pythonhosted.org/packages/0c/62/d2eb46264d4b157dae1275b573017abec435397aa59cbcdab6fc978a8af4/pyyaml-6.0.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:10892704fc220243f5305762e276552a0395f7beb4dbf9b14ec8fd43b57f126c", size = 775556, upload-time = "2025-09-25T21:32:01.31Z" }, + { url = "https://files.pythonhosted.org/packages/10/cb/16c3f2cf3266edd25aaa00d6c4350381c8b012ed6f5276675b9eba8d9ff4/pyyaml-6.0.3-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:850774a7879607d3a6f50d36d04f00ee69e7fc816450e5f7e58d7f17f1ae5c00", size = 882114, upload-time = "2025-09-25T21:32:03.376Z" }, + { url = "https://files.pythonhosted.org/packages/71/60/917329f640924b18ff085ab889a11c763e0b573da888e8404ff486657602/pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b8bb0864c5a28024fac8a632c443c87c5aa6f215c0b126c449ae1a150412f31d", size = 806638, upload-time = "2025-09-25T21:32:04.553Z" }, + { url = "https://files.pythonhosted.org/packages/dd/6f/529b0f316a9fd167281a6c3826b5583e6192dba792dd55e3203d3f8e655a/pyyaml-6.0.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1d37d57ad971609cf3c53ba6a7e365e40660e3be0e5175fa9f2365a379d6095a", size = 767463, upload-time = "2025-09-25T21:32:06.152Z" }, + { url = "https://files.pythonhosted.org/packages/f2/6a/b627b4e0c1dd03718543519ffb2f1deea4a1e6d42fbab8021936a4d22589/pyyaml-6.0.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:37503bfbfc9d2c40b344d06b2199cf0e96e97957ab1c1b546fd4f87e53e5d3e4", size = 794986, upload-time = "2025-09-25T21:32:07.367Z" }, + { url = "https://files.pythonhosted.org/packages/45/91/47a6e1c42d9ee337c4839208f30d9f09caa9f720ec7582917b264defc875/pyyaml-6.0.3-cp311-cp311-win32.whl", hash = "sha256:8098f252adfa6c80ab48096053f512f2321f0b998f98150cea9bd23d83e1467b", size = 142543, upload-time = "2025-09-25T21:32:08.95Z" }, + { url = 
"https://files.pythonhosted.org/packages/da/e3/ea007450a105ae919a72393cb06f122f288ef60bba2dc64b26e2646fa315/pyyaml-6.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:9f3bfb4965eb874431221a3ff3fdcddc7e74e3b07799e0e84ca4a0f867d449bf", size = 158763, upload-time = "2025-09-25T21:32:09.96Z" }, + { url = "https://files.pythonhosted.org/packages/d1/33/422b98d2195232ca1826284a76852ad5a86fe23e31b009c9886b2d0fb8b2/pyyaml-6.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7f047e29dcae44602496db43be01ad42fc6f1cc0d8cd6c83d342306c32270196", size = 182063, upload-time = "2025-09-25T21:32:11.445Z" }, + { url = "https://files.pythonhosted.org/packages/89/a0/6cf41a19a1f2f3feab0e9c0b74134aa2ce6849093d5517a0c550fe37a648/pyyaml-6.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:fc09d0aa354569bc501d4e787133afc08552722d3ab34836a80547331bb5d4a0", size = 173973, upload-time = "2025-09-25T21:32:12.492Z" }, + { url = "https://files.pythonhosted.org/packages/ed/23/7a778b6bd0b9a8039df8b1b1d80e2e2ad78aa04171592c8a5c43a56a6af4/pyyaml-6.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9149cad251584d5fb4981be1ecde53a1ca46c891a79788c0df828d2f166bda28", size = 775116, upload-time = "2025-09-25T21:32:13.652Z" }, + { url = "https://files.pythonhosted.org/packages/65/30/d7353c338e12baef4ecc1b09e877c1970bd3382789c159b4f89d6a70dc09/pyyaml-6.0.3-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5fdec68f91a0c6739b380c83b951e2c72ac0197ace422360e6d5a959d8d97b2c", size = 844011, upload-time = "2025-09-25T21:32:15.21Z" }, + { url = "https://files.pythonhosted.org/packages/8b/9d/b3589d3877982d4f2329302ef98a8026e7f4443c765c46cfecc8858c6b4b/pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ba1cc08a7ccde2d2ec775841541641e4548226580ab850948cbfda66a1befcdc", size = 807870, upload-time = "2025-09-25T21:32:16.431Z" }, + { url = 
"https://files.pythonhosted.org/packages/05/c0/b3be26a015601b822b97d9149ff8cb5ead58c66f981e04fedf4e762f4bd4/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8dc52c23056b9ddd46818a57b78404882310fb473d63f17b07d5c40421e47f8e", size = 761089, upload-time = "2025-09-25T21:32:17.56Z" }, + { url = "https://files.pythonhosted.org/packages/be/8e/98435a21d1d4b46590d5459a22d88128103f8da4c2d4cb8f14f2a96504e1/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:41715c910c881bc081f1e8872880d3c650acf13dfa8214bad49ed4cede7c34ea", size = 790181, upload-time = "2025-09-25T21:32:18.834Z" }, + { url = "https://files.pythonhosted.org/packages/74/93/7baea19427dcfbe1e5a372d81473250b379f04b1bd3c4c5ff825e2327202/pyyaml-6.0.3-cp312-cp312-win32.whl", hash = "sha256:96b533f0e99f6579b3d4d4995707cf36df9100d67e0c8303a0c55b27b5f99bc5", size = 137658, upload-time = "2025-09-25T21:32:20.209Z" }, + { url = "https://files.pythonhosted.org/packages/86/bf/899e81e4cce32febab4fb42bb97dcdf66bc135272882d1987881a4b519e9/pyyaml-6.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:5fcd34e47f6e0b794d17de1b4ff496c00986e1c83f7ab2fb8fcfe9616ff7477b", size = 154003, upload-time = "2025-09-25T21:32:21.167Z" }, + { url = "https://files.pythonhosted.org/packages/1a/08/67bd04656199bbb51dbed1439b7f27601dfb576fb864099c7ef0c3e55531/pyyaml-6.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:64386e5e707d03a7e172c0701abfb7e10f0fb753ee1d773128192742712a98fd", size = 140344, upload-time = "2025-09-25T21:32:22.617Z" }, + { url = "https://files.pythonhosted.org/packages/d1/11/0fd08f8192109f7169db964b5707a2f1e8b745d4e239b784a5a1dd80d1db/pyyaml-6.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:8da9669d359f02c0b91ccc01cac4a67f16afec0dac22c2ad09f46bee0697eba8", size = 181669, upload-time = "2025-09-25T21:32:23.673Z" }, + { url = "https://files.pythonhosted.org/packages/b1/16/95309993f1d3748cd644e02e38b75d50cbc0d9561d21f390a76242ce073f/pyyaml-6.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = 
"sha256:2283a07e2c21a2aa78d9c4442724ec1eb15f5e42a723b99cb3d822d48f5f7ad1", size = 173252, upload-time = "2025-09-25T21:32:25.149Z" }, + { url = "https://files.pythonhosted.org/packages/50/31/b20f376d3f810b9b2371e72ef5adb33879b25edb7a6d072cb7ca0c486398/pyyaml-6.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ee2922902c45ae8ccada2c5b501ab86c36525b883eff4255313a253a3160861c", size = 767081, upload-time = "2025-09-25T21:32:26.575Z" }, + { url = "https://files.pythonhosted.org/packages/49/1e/a55ca81e949270d5d4432fbbd19dfea5321eda7c41a849d443dc92fd1ff7/pyyaml-6.0.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a33284e20b78bd4a18c8c2282d549d10bc8408a2a7ff57653c0cf0b9be0afce5", size = 841159, upload-time = "2025-09-25T21:32:27.727Z" }, + { url = "https://files.pythonhosted.org/packages/74/27/e5b8f34d02d9995b80abcef563ea1f8b56d20134d8f4e5e81733b1feceb2/pyyaml-6.0.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0f29edc409a6392443abf94b9cf89ce99889a1dd5376d94316ae5145dfedd5d6", size = 801626, upload-time = "2025-09-25T21:32:28.878Z" }, + { url = "https://files.pythonhosted.org/packages/f9/11/ba845c23988798f40e52ba45f34849aa8a1f2d4af4b798588010792ebad6/pyyaml-6.0.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f7057c9a337546edc7973c0d3ba84ddcdf0daa14533c2065749c9075001090e6", size = 753613, upload-time = "2025-09-25T21:32:30.178Z" }, + { url = "https://files.pythonhosted.org/packages/3d/e0/7966e1a7bfc0a45bf0a7fb6b98ea03fc9b8d84fa7f2229e9659680b69ee3/pyyaml-6.0.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:eda16858a3cab07b80edaf74336ece1f986ba330fdb8ee0d6c0d68fe82bc96be", size = 794115, upload-time = "2025-09-25T21:32:31.353Z" }, + { url = "https://files.pythonhosted.org/packages/de/94/980b50a6531b3019e45ddeada0626d45fa85cbe22300844a7983285bed3b/pyyaml-6.0.3-cp313-cp313-win32.whl", hash = 
"sha256:d0eae10f8159e8fdad514efdc92d74fd8d682c933a6dd088030f3834bc8e6b26", size = 137427, upload-time = "2025-09-25T21:32:32.58Z" }, + { url = "https://files.pythonhosted.org/packages/97/c9/39d5b874e8b28845e4ec2202b5da735d0199dbe5b8fb85f91398814a9a46/pyyaml-6.0.3-cp313-cp313-win_amd64.whl", hash = "sha256:79005a0d97d5ddabfeeea4cf676af11e647e41d81c9a7722a193022accdb6b7c", size = 154090, upload-time = "2025-09-25T21:32:33.659Z" }, + { url = "https://files.pythonhosted.org/packages/73/e8/2bdf3ca2090f68bb3d75b44da7bbc71843b19c9f2b9cb9b0f4ab7a5a4329/pyyaml-6.0.3-cp313-cp313-win_arm64.whl", hash = "sha256:5498cd1645aa724a7c71c8f378eb29ebe23da2fc0d7a08071d89469bf1d2defb", size = 140246, upload-time = "2025-09-25T21:32:34.663Z" }, + { url = "https://files.pythonhosted.org/packages/9d/8c/f4bd7f6465179953d3ac9bc44ac1a8a3e6122cf8ada906b4f96c60172d43/pyyaml-6.0.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:8d1fab6bb153a416f9aeb4b8763bc0f22a5586065f86f7664fc23339fc1c1fac", size = 181814, upload-time = "2025-09-25T21:32:35.712Z" }, + { url = "https://files.pythonhosted.org/packages/bd/9c/4d95bb87eb2063d20db7b60faa3840c1b18025517ae857371c4dd55a6b3a/pyyaml-6.0.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:34d5fcd24b8445fadc33f9cf348c1047101756fd760b4dacb5c3e99755703310", size = 173809, upload-time = "2025-09-25T21:32:36.789Z" }, + { url = "https://files.pythonhosted.org/packages/92/b5/47e807c2623074914e29dabd16cbbdd4bf5e9b2db9f8090fa64411fc5382/pyyaml-6.0.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:501a031947e3a9025ed4405a168e6ef5ae3126c59f90ce0cd6f2bfc477be31b7", size = 766454, upload-time = "2025-09-25T21:32:37.966Z" }, + { url = "https://files.pythonhosted.org/packages/02/9e/e5e9b168be58564121efb3de6859c452fccde0ab093d8438905899a3a483/pyyaml-6.0.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = 
"sha256:b3bc83488de33889877a0f2543ade9f70c67d66d9ebb4ac959502e12de895788", size = 836355, upload-time = "2025-09-25T21:32:39.178Z" }, + { url = "https://files.pythonhosted.org/packages/88/f9/16491d7ed2a919954993e48aa941b200f38040928474c9e85ea9e64222c3/pyyaml-6.0.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c458b6d084f9b935061bc36216e8a69a7e293a2f1e68bf956dcd9e6cbcd143f5", size = 794175, upload-time = "2025-09-25T21:32:40.865Z" }, + { url = "https://files.pythonhosted.org/packages/dd/3f/5989debef34dc6397317802b527dbbafb2b4760878a53d4166579111411e/pyyaml-6.0.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:7c6610def4f163542a622a73fb39f534f8c101d690126992300bf3207eab9764", size = 755228, upload-time = "2025-09-25T21:32:42.084Z" }, + { url = "https://files.pythonhosted.org/packages/d7/ce/af88a49043cd2e265be63d083fc75b27b6ed062f5f9fd6cdc223ad62f03e/pyyaml-6.0.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:5190d403f121660ce8d1d2c1bb2ef1bd05b5f68533fc5c2ea899bd15f4399b35", size = 789194, upload-time = "2025-09-25T21:32:43.362Z" }, + { url = "https://files.pythonhosted.org/packages/23/20/bb6982b26a40bb43951265ba29d4c246ef0ff59c9fdcdf0ed04e0687de4d/pyyaml-6.0.3-cp314-cp314-win_amd64.whl", hash = "sha256:4a2e8cebe2ff6ab7d1050ecd59c25d4c8bd7e6f400f5f82b96557ac0abafd0ac", size = 156429, upload-time = "2025-09-25T21:32:57.844Z" }, + { url = "https://files.pythonhosted.org/packages/f4/f4/a4541072bb9422c8a883ab55255f918fa378ecf083f5b85e87fc2b4eda1b/pyyaml-6.0.3-cp314-cp314-win_arm64.whl", hash = "sha256:93dda82c9c22deb0a405ea4dc5f2d0cda384168e466364dec6255b293923b2f3", size = 143912, upload-time = "2025-09-25T21:32:59.247Z" }, + { url = "https://files.pythonhosted.org/packages/7c/f9/07dd09ae774e4616edf6cda684ee78f97777bdd15847253637a6f052a62f/pyyaml-6.0.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:02893d100e99e03eda1c8fd5c441d8c60103fd175728e23e431db1b589cf5ab3", size = 189108, upload-time = 
"2025-09-25T21:32:44.377Z" }, + { url = "https://files.pythonhosted.org/packages/4e/78/8d08c9fb7ce09ad8c38ad533c1191cf27f7ae1effe5bb9400a46d9437fcf/pyyaml-6.0.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c1ff362665ae507275af2853520967820d9124984e0f7466736aea23d8611fba", size = 183641, upload-time = "2025-09-25T21:32:45.407Z" }, + { url = "https://files.pythonhosted.org/packages/7b/5b/3babb19104a46945cf816d047db2788bcaf8c94527a805610b0289a01c6b/pyyaml-6.0.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6adc77889b628398debc7b65c073bcb99c4a0237b248cacaf3fe8a557563ef6c", size = 831901, upload-time = "2025-09-25T21:32:48.83Z" }, + { url = "https://files.pythonhosted.org/packages/8b/cc/dff0684d8dc44da4d22a13f35f073d558c268780ce3c6ba1b87055bb0b87/pyyaml-6.0.3-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:a80cb027f6b349846a3bf6d73b5e95e782175e52f22108cfa17876aaeff93702", size = 861132, upload-time = "2025-09-25T21:32:50.149Z" }, + { url = "https://files.pythonhosted.org/packages/b1/5e/f77dc6b9036943e285ba76b49e118d9ea929885becb0a29ba8a7c75e29fe/pyyaml-6.0.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:00c4bdeba853cc34e7dd471f16b4114f4162dc03e6b7afcc2128711f0eca823c", size = 839261, upload-time = "2025-09-25T21:32:51.808Z" }, + { url = "https://files.pythonhosted.org/packages/ce/88/a9db1376aa2a228197c58b37302f284b5617f56a5d959fd1763fb1675ce6/pyyaml-6.0.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:66e1674c3ef6f541c35191caae2d429b967b99e02040f5ba928632d9a7f0f065", size = 805272, upload-time = "2025-09-25T21:32:52.941Z" }, + { url = "https://files.pythonhosted.org/packages/da/92/1446574745d74df0c92e6aa4a7b0b3130706a4142b2d1a5869f2eaa423c6/pyyaml-6.0.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:16249ee61e95f858e83976573de0f5b2893b3677ba71c9dd36b9cf8be9ac6d65", size = 829923, upload-time 
= "2025-09-25T21:32:54.537Z" }, + { url = "https://files.pythonhosted.org/packages/f0/7a/1c7270340330e575b92f397352af856a8c06f230aa3e76f86b39d01b416a/pyyaml-6.0.3-cp314-cp314t-win_amd64.whl", hash = "sha256:4ad1906908f2f5ae4e5a8ddfce73c320c2a1429ec52eafd27138b7f1cbe341c9", size = 174062, upload-time = "2025-09-25T21:32:55.767Z" }, + { url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" }, +] + +[[package]] +name = "ruff" +version = "0.15.15" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/84/6f/a76f7d96e5c962f5b69cee865e49c15c1116897c01990faa8a57edb62e7f/ruff-0.15.15.tar.gz", hash = "sha256:b8dff018130b46d8e5bf0f926ef6b60cf871d6d5ae45fc9334e09632daa741d6", size = 4706985, upload-time = "2026-05-28T14:16:57.784Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/fa/9d/3a45c05b8ab04b4705989de70a79008e27c8003296a0feaee9edc18dd7e9/ruff-0.15.15-py3-none-linux_armv6l.whl", hash = "sha256:cf93e5388f412e1b108b1f8b34a6e036b70fe8aff89393befad96fe48670311b", size = 10710652, upload-time = "2026-05-28T14:16:06.701Z" }, + { url = "https://files.pythonhosted.org/packages/05/66/da974431624bf3b49f6ee1f9543c02d929ff1cba78b0d5a79c38cf21f744/ruff-0.15.15-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:ac5a646d1f6a7dadd5d50842dae2c1f9862ac887ef5d1b1375e02def791fde6e", size = 11096615, upload-time = "2026-05-28T14:16:23.313Z" }, + { url = "https://files.pythonhosted.org/packages/8c/09/7443452e5d290230a712103f2fdceeef7184f3ec99a2bd01c8be78aaceb5/ruff-0.15.15-py3-none-macosx_11_0_arm64.whl", hash = "sha256:77d955a431430c66f72dd94e379ad38a16daea3d25094872ac4edf9e797be530", size = 10436683, upload-time = "2026-05-28T14:16:40.974Z" }, + { url = 
"https://files.pythonhosted.org/packages/53/01/d330c26a57fa4f3943a14424904027428315b700fe4d14a84bb123a649e5/ruff-0.15.15-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7614ee79c69788cf6cedd568069ade9cecc22a1ad20494efe8d0c9ebb4b622d4", size = 10769064, upload-time = "2026-05-28T14:16:28.905Z" }, + { url = "https://files.pythonhosted.org/packages/1d/85/cc8770f8bdff541b1da8392d1634141fe4a0e3f4ee596605959b7906c27f/ruff-0.15.15-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3cdb1679e06a1f6b47bc384714ae96f6e2fb65ca441eb78c43d2ca554176ce1f", size = 10511987, upload-time = "2026-05-28T14:16:43.732Z" }, + { url = "https://files.pythonhosted.org/packages/7c/29/8c190c1472b63013583ba391f3342036e02010544c1270455ed8e519bdf3/ruff-0.15.15-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:2728b93d7b23a603ea2c0ac6eb73d760bd38ec9de35f35fb41e18f7a3fee7622", size = 11275100, upload-time = "2026-05-28T14:16:55.244Z" }, + { url = "https://files.pythonhosted.org/packages/9f/6b/7e145ce2cc8e63d6834eca03d83a0e18d121def5c69f91b4cf4011ed4879/ruff-0.15.15-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:be582fcc0db438902c7792b08d6ddf6c9b9e21addaa10092c2c741cfb09e5a45", size = 12176903, upload-time = "2026-05-28T14:16:14.368Z" }, + { url = "https://files.pythonhosted.org/packages/80/a3/d5974637f68e451f7fadf015cf3101d1cd7d8ba5027cffe0b9e3826ebe6b/ruff-0.15.15-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:7aa77465b8ecaf1a27bea098d696f7fed5e1eccbd10b321b682d6de586ae5627", size = 11404550, upload-time = "2026-05-28T14:16:20.138Z" }, + { url = "https://files.pythonhosted.org/packages/fe/1c/e6e5e568f22be4fb05d6244234aba384c06b451252453b821e1a529263cf/ruff-0.15.15-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:48decfa11d740de4889de623be1463308346312f2409a56e24aa280c86162dc4", size = 11382027, upload-time = "2026-05-28T14:16:46.615Z" }, + { url = 
"https://files.pythonhosted.org/packages/1d/01/170921b49fcd2e8858825593f91cf7146c3e40a5c3e6df763e4bb0484dde/ruff-0.15.15-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:a5015088452ca0081387063649ec67f06d3d1d6b8b936a1f836b5e9657ecd48c", size = 11366041, upload-time = "2026-05-28T14:16:26.247Z" }, + { url = "https://files.pythonhosted.org/packages/87/54/a7bad711d7de93254e15e06a4c375b89a03d18de45d3e5dcc86a4472fb1a/ruff-0.15.15-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:f5294aab6356c81600fcdea3a62bb1b924dfd5e91767c12318d3f68f86af57cd", size = 10741795, upload-time = "2026-05-28T14:16:17.11Z" }, + { url = "https://files.pythonhosted.org/packages/c9/31/38c075963668f8b41c6914ee0f6f318727fbe30ab9145cb29e6df464c5fa/ruff-0.15.15-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:db5bd4d802415cca656dc1616070b725952d6ae95eb5d4831e49fbd94a38f75f", size = 10511117, upload-time = "2026-05-28T14:16:31.767Z" }, + { url = "https://files.pythonhosted.org/packages/9d/96/6ff689e1f7e375d1d97075eca022f74c2bab59554a432fe4d2e6f091986a/ruff-0.15.15-py3-none-musllinux_1_2_i686.whl", hash = "sha256:587a6278ed42059191c1a466e490bd7930fb50bd2e255398bc29616c895a61cb", size = 10994867, upload-time = "2026-05-28T14:16:35.149Z" }, + { url = "https://files.pythonhosted.org/packages/c3/c2/5dce0ab9f92a8d534fa62b9bf9caca3eddb8c1a81b616f5e195ada4f0d6e/ruff-0.15.15-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:df0c1c084f5f4be9812f61518a45c440d3c30d69ce4bf6c5270e66d38338f02a", size = 11482101, upload-time = "2026-05-28T14:16:49.598Z" }, + { url = "https://files.pythonhosted.org/packages/b1/c0/1003b60edd697c649faf61f1a34094b1abb38fb3d1181e3f895781250a08/ruff-0.15.15-py3-none-win32.whl", hash = "sha256:29428ea79694afbe756d45fd59b36f22b6b020dc0443cf7de0173046236964b9", size = 10716774, upload-time = "2026-05-28T14:16:52.337Z" }, + { url = "https://files.pythonhosted.org/packages/02/a8/1269eddd6945a06c23f055ef7848886e37cf9d6a8bebb386a3115f01470c/ruff-0.15.15-py3-none-win_amd64.whl", hash = 
"sha256:8df0323902e15e24bc4bf246da830573d3cf3352bd0b9a164eab335d111ff4a4", size = 11868463, upload-time = "2026-05-28T14:16:11.333Z" }, + { url = "https://files.pythonhosted.org/packages/4e/b2/920464c907b191e37469d477a1aa8bc048b8f36c4c1610dfa4ab87b39e18/ruff-0.15.15-py3-none-win_arm64.whl", hash = "sha256:3c8ceca6792f38196b8f589bc92eccd03eef286602da92e5dc05cc42ef6441b7", size = 11138498, upload-time = "2026-05-28T14:16:38.425Z" }, +] + +[[package]] +name = "tomli" +version = "2.4.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/22/de/48c59722572767841493b26183a0d1cc411d54fd759c5607c4590b6563a6/tomli-2.4.1.tar.gz", hash = "sha256:7c7e1a961a0b2f2472c1ac5b69affa0ae1132c39adcb67aba98568702b9cc23f", size = 17543, upload-time = "2026-03-25T20:22:03.828Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f4/11/db3d5885d8528263d8adc260bb2d28ebf1270b96e98f0e0268d32b8d9900/tomli-2.4.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:f8f0fc26ec2cc2b965b7a3b87cd19c5c6b8c5e5f436b984e85f486d652285c30", size = 154704, upload-time = "2026-03-25T20:21:10.473Z" }, + { url = "https://files.pythonhosted.org/packages/6d/f7/675db52c7e46064a9aa928885a9b20f4124ecb9bc2e1ce74c9106648d202/tomli-2.4.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:4ab97e64ccda8756376892c53a72bd1f964e519c77236368527f758fbc36a53a", size = 149454, upload-time = "2026-03-25T20:21:12.036Z" }, + { url = "https://files.pythonhosted.org/packages/61/71/81c50943cf953efa35bce7646caab3cf457a7d8c030b27cfb40d7235f9ee/tomli-2.4.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:96481a5786729fd470164b47cdb3e0e58062a496f455ee41b4403be77cb5a076", size = 237561, upload-time = "2026-03-25T20:21:13.098Z" }, + { url = 
"https://files.pythonhosted.org/packages/48/c1/f41d9cb618acccca7df82aaf682f9b49013c9397212cb9f53219e3abac37/tomli-2.4.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5a881ab208c0baf688221f8cecc5401bd291d67e38a1ac884d6736cbcd8247e9", size = 243824, upload-time = "2026-03-25T20:21:14.569Z" }, + { url = "https://files.pythonhosted.org/packages/22/e4/5a816ecdd1f8ca51fb756ef684b90f2780afc52fc67f987e3c61d800a46d/tomli-2.4.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:47149d5bd38761ac8be13a84864bf0b7b70bc051806bc3669ab1cbc56216b23c", size = 242227, upload-time = "2026-03-25T20:21:15.712Z" }, + { url = "https://files.pythonhosted.org/packages/6b/49/2b2a0ef529aa6eec245d25f0c703e020a73955ad7edf73e7f54ddc608aa5/tomli-2.4.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:ec9bfaf3ad2df51ace80688143a6a4ebc09a248f6ff781a9945e51937008fcbc", size = 247859, upload-time = "2026-03-25T20:21:17.001Z" }, + { url = "https://files.pythonhosted.org/packages/83/bd/6c1a630eaca337e1e78c5903104f831bda934c426f9231429396ce3c3467/tomli-2.4.1-cp311-cp311-win32.whl", hash = "sha256:ff2983983d34813c1aeb0fa89091e76c3a22889ee83ab27c5eeb45100560c049", size = 97204, upload-time = "2026-03-25T20:21:18.079Z" }, + { url = "https://files.pythonhosted.org/packages/42/59/71461df1a885647e10b6bb7802d0b8e66480c61f3f43079e0dcd315b3954/tomli-2.4.1-cp311-cp311-win_amd64.whl", hash = "sha256:5ee18d9ebdb417e384b58fe414e8d6af9f4e7a0ae761519fb50f721de398dd4e", size = 108084, upload-time = "2026-03-25T20:21:18.978Z" }, + { url = "https://files.pythonhosted.org/packages/b8/83/dceca96142499c069475b790e7913b1044c1a4337e700751f48ed723f883/tomli-2.4.1-cp311-cp311-win_arm64.whl", hash = "sha256:c2541745709bad0264b7d4705ad453b76ccd191e64aa6f0fc66b69a293a45ece", size = 95285, upload-time = "2026-03-25T20:21:20.309Z" }, + { url = 
"https://files.pythonhosted.org/packages/c1/ba/42f134a3fe2b370f555f44b1d72feebb94debcab01676bf918d0cb70e9aa/tomli-2.4.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:c742f741d58a28940ce01d58f0ab2ea3ced8b12402f162f4d534dfe18ba1cd6a", size = 155924, upload-time = "2026-03-25T20:21:21.626Z" }, + { url = "https://files.pythonhosted.org/packages/dc/c7/62d7a17c26487ade21c5422b646110f2162f1fcc95980ef7f63e73c68f14/tomli-2.4.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:7f86fd587c4ed9dd76f318225e7d9b29cfc5a9d43de44e5754db8d1128487085", size = 150018, upload-time = "2026-03-25T20:21:23.002Z" }, + { url = "https://files.pythonhosted.org/packages/5c/05/79d13d7c15f13bdef410bdd49a6485b1c37d28968314eabee452c22a7fda/tomli-2.4.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ff18e6a727ee0ab0388507b89d1bc6a22b138d1e2fa56d1ad494586d61d2eae9", size = 244948, upload-time = "2026-03-25T20:21:24.04Z" }, + { url = "https://files.pythonhosted.org/packages/10/90/d62ce007a1c80d0b2c93e02cab211224756240884751b94ca72df8a875ca/tomli-2.4.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:136443dbd7e1dee43c68ac2694fde36b2849865fa258d39bf822c10e8068eac5", size = 253341, upload-time = "2026-03-25T20:21:25.177Z" }, + { url = "https://files.pythonhosted.org/packages/1a/7e/caf6496d60152ad4ed09282c1885cca4eea150bfd007da84aea07bcc0a3e/tomli-2.4.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:5e262d41726bc187e69af7825504c933b6794dc3fbd5945e41a79bb14c31f585", size = 248159, upload-time = "2026-03-25T20:21:26.364Z" }, + { url = "https://files.pythonhosted.org/packages/99/e7/c6f69c3120de34bbd882c6fba7975f3d7a746e9218e56ab46a1bc4b42552/tomli-2.4.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:5cb41aa38891e073ee49d55fbc7839cfdb2bc0e600add13874d048c94aadddd1", size = 253290, upload-time = "2026-03-25T20:21:27.46Z" }, + { url = 
"https://files.pythonhosted.org/packages/d6/2f/4a3c322f22c5c66c4b836ec58211641a4067364f5dcdd7b974b4c5da300c/tomli-2.4.1-cp312-cp312-win32.whl", hash = "sha256:da25dc3563bff5965356133435b757a795a17b17d01dbc0f42fb32447ddfd917", size = 98141, upload-time = "2026-03-25T20:21:28.492Z" }, + { url = "https://files.pythonhosted.org/packages/24/22/4daacd05391b92c55759d55eaee21e1dfaea86ce5c571f10083360adf534/tomli-2.4.1-cp312-cp312-win_amd64.whl", hash = "sha256:52c8ef851d9a240f11a88c003eacb03c31fc1c9c4ec64a99a0f922b93874fda9", size = 108847, upload-time = "2026-03-25T20:21:29.386Z" }, + { url = "https://files.pythonhosted.org/packages/68/fd/70e768887666ddd9e9f5d85129e84910f2db2796f9096aa02b721a53098d/tomli-2.4.1-cp312-cp312-win_arm64.whl", hash = "sha256:f758f1b9299d059cc3f6546ae2af89670cb1c4d48ea29c3cacc4fe7de3058257", size = 95088, upload-time = "2026-03-25T20:21:30.677Z" }, + { url = "https://files.pythonhosted.org/packages/07/06/b823a7e818c756d9a7123ba2cda7d07bc2dd32835648d1a7b7b7a05d848d/tomli-2.4.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:36d2bd2ad5fb9eaddba5226aa02c8ec3fa4f192631e347b3ed28186d43be6b54", size = 155866, upload-time = "2026-03-25T20:21:31.65Z" }, + { url = "https://files.pythonhosted.org/packages/14/6f/12645cf7f08e1a20c7eb8c297c6f11d31c1b50f316a7e7e1e1de6e2e7b7e/tomli-2.4.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:eb0dc4e38e6a1fd579e5d50369aa2e10acfc9cace504579b2faabb478e76941a", size = 149887, upload-time = "2026-03-25T20:21:33.028Z" }, + { url = "https://files.pythonhosted.org/packages/5c/e0/90637574e5e7212c09099c67ad349b04ec4d6020324539297b634a0192b0/tomli-2.4.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c7f2c7f2b9ca6bdeef8f0fa897f8e05085923eb091721675170254cbc5b02897", size = 243704, upload-time = "2026-03-25T20:21:34.51Z" }, + { url = 
"https://files.pythonhosted.org/packages/10/8f/d3ddb16c5a4befdf31a23307f72828686ab2096f068eaf56631e136c1fdd/tomli-2.4.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f3c6818a1a86dd6dca7ddcaaf76947d5ba31aecc28cb1b67009a5877c9a64f3f", size = 251628, upload-time = "2026-03-25T20:21:36.012Z" }, + { url = "https://files.pythonhosted.org/packages/e3/f1/dbeeb9116715abee2485bf0a12d07a8f31af94d71608c171c45f64c0469d/tomli-2.4.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:d312ef37c91508b0ab2cee7da26ec0b3ed2f03ce12bd87a588d771ae15dcf82d", size = 247180, upload-time = "2026-03-25T20:21:37.136Z" }, + { url = "https://files.pythonhosted.org/packages/d3/74/16336ffd19ed4da28a70959f92f506233bd7cfc2332b20bdb01591e8b1d1/tomli-2.4.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:51529d40e3ca50046d7606fa99ce3956a617f9b36380da3b7f0dd3dd28e68cb5", size = 251674, upload-time = "2026-03-25T20:21:38.298Z" }, + { url = "https://files.pythonhosted.org/packages/16/f9/229fa3434c590ddf6c0aa9af64d3af4b752540686cace29e6281e3458469/tomli-2.4.1-cp313-cp313-win32.whl", hash = "sha256:2190f2e9dd7508d2a90ded5ed369255980a1bcdd58e52f7fe24b8162bf9fedbd", size = 97976, upload-time = "2026-03-25T20:21:39.316Z" }, + { url = "https://files.pythonhosted.org/packages/6a/1e/71dfd96bcc1c775420cb8befe7a9d35f2e5b1309798f009dca17b7708c1e/tomli-2.4.1-cp313-cp313-win_amd64.whl", hash = "sha256:8d65a2fbf9d2f8352685bc1364177ee3923d6baf5e7f43ea4959d7d8bc326a36", size = 108755, upload-time = "2026-03-25T20:21:40.248Z" }, + { url = "https://files.pythonhosted.org/packages/83/7a/d34f422a021d62420b78f5c538e5b102f62bea616d1d75a13f0a88acb04a/tomli-2.4.1-cp313-cp313-win_arm64.whl", hash = "sha256:4b605484e43cdc43f0954ddae319fb75f04cc10dd80d830540060ee7cd0243cd", size = 95265, upload-time = "2026-03-25T20:21:41.219Z" }, + { url = 
"https://files.pythonhosted.org/packages/3c/fb/9a5c8d27dbab540869f7c1f8eb0abb3244189ce780ba9cd73f3770662072/tomli-2.4.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:fd0409a3653af6c147209d267a0e4243f0ae46b011aa978b1080359fddc9b6cf", size = 155726, upload-time = "2026-03-25T20:21:42.23Z" }, + { url = "https://files.pythonhosted.org/packages/62/05/d2f816630cc771ad836af54f5001f47a6f611d2d39535364f148b6a92d6b/tomli-2.4.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:a120733b01c45e9a0c34aeef92bf0cf1d56cfe81ed9d47d562f9ed591a9828ac", size = 149859, upload-time = "2026-03-25T20:21:43.386Z" }, + { url = "https://files.pythonhosted.org/packages/ce/48/66341bdb858ad9bd0ceab5a86f90eddab127cf8b046418009f2125630ecb/tomli-2.4.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:559db847dc486944896521f68d8190be1c9e719fced785720d2216fe7022b662", size = 244713, upload-time = "2026-03-25T20:21:44.474Z" }, + { url = "https://files.pythonhosted.org/packages/df/6d/c5fad00d82b3c7a3ab6189bd4b10e60466f22cfe8a08a9394185c8a8111c/tomli-2.4.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:01f520d4f53ef97964a240a035ec2a869fe1a37dde002b57ebc4417a27ccd853", size = 252084, upload-time = "2026-03-25T20:21:45.62Z" }, + { url = "https://files.pythonhosted.org/packages/00/71/3a69e86f3eafe8c7a59d008d245888051005bd657760e96d5fbfb0b740c2/tomli-2.4.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:7f94b27a62cfad8496c8d2513e1a222dd446f095fca8987fceef261225538a15", size = 247973, upload-time = "2026-03-25T20:21:46.937Z" }, + { url = "https://files.pythonhosted.org/packages/67/50/361e986652847fec4bd5e4a0208752fbe64689c603c7ae5ea7cb16b1c0ca/tomli-2.4.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:ede3e6487c5ef5d28634ba3f31f989030ad6af71edfb0055cbbd14189ff240ba", size = 256223, upload-time = "2026-03-25T20:21:48.467Z" }, + { url = 
"https://files.pythonhosted.org/packages/8c/9a/b4173689a9203472e5467217e0154b00e260621caa227b6fa01feab16998/tomli-2.4.1-cp314-cp314-win32.whl", hash = "sha256:3d48a93ee1c9b79c04bb38772ee1b64dcf18ff43085896ea460ca8dec96f35f6", size = 98973, upload-time = "2026-03-25T20:21:49.526Z" }, + { url = "https://files.pythonhosted.org/packages/14/58/640ac93bf230cd27d002462c9af0d837779f8773bc03dee06b5835208214/tomli-2.4.1-cp314-cp314-win_amd64.whl", hash = "sha256:88dceee75c2c63af144e456745e10101eb67361050196b0b6af5d717254dddf7", size = 109082, upload-time = "2026-03-25T20:21:50.506Z" }, + { url = "https://files.pythonhosted.org/packages/d5/2f/702d5e05b227401c1068f0d386d79a589bb12bf64c3d2c72ce0631e3bc49/tomli-2.4.1-cp314-cp314-win_arm64.whl", hash = "sha256:b8c198f8c1805dc42708689ed6864951fd2494f924149d3e4bce7710f8eb5232", size = 96490, upload-time = "2026-03-25T20:21:51.474Z" }, + { url = "https://files.pythonhosted.org/packages/45/4b/b877b05c8ba62927d9865dd980e34a755de541eb65fffba52b4cc495d4d2/tomli-2.4.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:d4d8fe59808a54658fcc0160ecfb1b30f9089906c50b23bcb4c69eddc19ec2b4", size = 164263, upload-time = "2026-03-25T20:21:52.543Z" }, + { url = "https://files.pythonhosted.org/packages/24/79/6ab420d37a270b89f7195dec5448f79400d9e9c1826df982f3f8e97b24fd/tomli-2.4.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:7008df2e7655c495dd12d2a4ad038ff878d4ca4b81fccaf82b714e07eae4402c", size = 160736, upload-time = "2026-03-25T20:21:53.674Z" }, + { url = "https://files.pythonhosted.org/packages/02/e0/3630057d8eb170310785723ed5adcdfb7d50cb7e6455f85ba8a3deed642b/tomli-2.4.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1d8591993e228b0c930c4bb0db464bdad97b3289fb981255d6c9a41aedc84b2d", size = 270717, upload-time = "2026-03-25T20:21:55.129Z" }, + { url = 
"https://files.pythonhosted.org/packages/7a/b4/1613716072e544d1a7891f548d8f9ec6ce2faf42ca65acae01d76ea06bb0/tomli-2.4.1-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:734e20b57ba95624ecf1841e72b53f6e186355e216e5412de414e3c51e5e3c41", size = 278461, upload-time = "2026-03-25T20:21:56.228Z" }, + { url = "https://files.pythonhosted.org/packages/05/38/30f541baf6a3f6df77b3df16b01ba319221389e2da59427e221ef417ac0c/tomli-2.4.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:8a650c2dbafa08d42e51ba0b62740dae4ecb9338eefa093aa5c78ceb546fcd5c", size = 274855, upload-time = "2026-03-25T20:21:57.653Z" }, + { url = "https://files.pythonhosted.org/packages/77/a3/ec9dd4fd2c38e98de34223b995a3b34813e6bdadf86c75314c928350ed14/tomli-2.4.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:504aa796fe0569bb43171066009ead363de03675276d2d121ac1a4572397870f", size = 283144, upload-time = "2026-03-25T20:21:59.089Z" }, + { url = "https://files.pythonhosted.org/packages/ef/be/605a6261cac79fba2ec0c9827e986e00323a1945700969b8ee0b30d85453/tomli-2.4.1-cp314-cp314t-win32.whl", hash = "sha256:b1d22e6e9387bf4739fbe23bfa80e93f6b0373a7f1b96c6227c32bef95a4d7a8", size = 108683, upload-time = "2026-03-25T20:22:00.214Z" }, + { url = "https://files.pythonhosted.org/packages/12/64/da524626d3b9cc40c168a13da8335fe1c51be12c0a63685cc6db7308daae/tomli-2.4.1-cp314-cp314t-win_amd64.whl", hash = "sha256:2c1c351919aca02858f740c6d33adea0c5deea37f9ecca1cc1ef9e884a619d26", size = 121196, upload-time = "2026-03-25T20:22:01.169Z" }, + { url = "https://files.pythonhosted.org/packages/5a/cd/e80b62269fc78fc36c9af5a6b89c835baa8af28ff5ad28c7028d60860320/tomli-2.4.1-cp314-cp314t-win_arm64.whl", hash = "sha256:eab21f45c7f66c13f2a9e0e1535309cee140182a9cdae1e041d02e47291e8396", size = 100393, upload-time = "2026-03-25T20:22:02.137Z" }, + { url = 
"https://files.pythonhosted.org/packages/7b/61/cceae43728b7de99d9b847560c262873a1f6c98202171fd5ed62640b494b/tomli-2.4.1-py3-none-any.whl", hash = "sha256:0d85819802132122da43cb86656f8d1f8c6587d54ae7dcaf30e90533028b49fe", size = 14583, upload-time = "2026-03-25T20:22:03.012Z" }, +] + +[[package]] +name = "typing-extensions" +version = "4.15.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" }, +] + +[[package]] +name = "virtualenv" +version = "21.4.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "distlib" }, + { name = "filelock" }, + { name = "platformdirs" }, + { name = "python-discovery" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/e1/0d/4e93c8e6d1001a75763f87d8f5ecda8ebc7f4aa2153dddfaf4ae8892821a/virtualenv-21.4.2.tar.gz", hash = "sha256:38e6ee0a555615c0ea9da2ac7e9998fe8dc3b911dd33ad8eaad2020957653b0c", size = 7613326, upload-time = "2026-05-31T17:01:22.827Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/bf/c4/557dc082be035381b85fdb2b74e21d3d21b57750b74f2b47a32f3a639ff9/virtualenv-21.4.2-py3-none-any.whl", hash = "sha256:854210ca524a1a4d0d744734f4acbc721c3ffe163b85bbf5d56d14d5ae2f0fae", size = 7594079, upload-time = "2026-05-31T17:01:20.735Z" }, +] diff --git a/scripts/check-wardline-version-bounds.py b/scripts/check-wardline-version-bounds.py index 9d9294d3..387bd78e 100755 --- 
a/scripts/check-wardline-version-bounds.py +++ b/scripts/check-wardline-version-bounds.py @@ -1,8 +1,9 @@ #!/usr/bin/env python3 """Validate the Wardline integration version bounds in the Python plugin manifest. -The Python plugin declares the Wardline version range it integrates against in -``plugins/python/plugin.toml`` under ``[integrations.wardline]``: +When the Python plugin advertises Wardline semantic extraction, it declares the +Wardline version range it integrates against in ``plugins/python/plugin.toml`` +under ``[integrations.wardline]``: * ``min_version`` — inclusive lower bound (the oldest Wardline the plugin's ``wardline.core.registry`` import surface is verified against). @@ -10,8 +11,11 @@ so a future major release triggers an explicit re-pin rather than silent drift (see the comment in plugin.toml and loom.md §5 asterisk 2). -This guard enforces the *local* half of the contract: both bounds are present, -parse as semver, and form a sane half-open range ``[min, max)``. The +This guard enforces the *local* half of the contract. If +``capabilities.runtime.wardline_aware`` is ``true``, both bounds must be +present, parse as semver, and form a sane half-open range ``[min, max)``. If the +capability is ``false``, the bounds block must be absent so a dormant +package/version probe cannot look like usable semantic integration. The *server-side* cross-check (confirming the resolved Wardline actually advertises a version inside the range at integration time) is future work — see ``server_side_cross_check_hook`` for the documented seam. 
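Note on the docstring hunk above: the capability-gated rule it describes can be sketched as a small pure function. This is an illustrative sketch only, not the script's implementation. `parse_core` and `bounds_ok` are hypothetical names, and the strict `MAJOR.MINOR.PATCH` pattern is an assumption, since the script's own semver regex is not shown in this hunk.

```python
import re
from typing import Optional, Tuple

# Assumption: plain MAJOR.MINOR.PATCH only; the real script's pattern may accept more.
SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_core(value: str) -> Tuple[int, int, int]:
    """Parse a version string into a comparable (major, minor, patch) tuple."""
    match = SEMVER.fullmatch(value)
    if match is None:
        raise ValueError(f"{value!r} is not valid semver")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def bounds_ok(wardline_aware: bool, bounds: Optional[Tuple[str, str]]) -> bool:
    """Hypothetical helper mirroring the documented local contract."""
    if not wardline_aware:
        # Capability off: the bounds block must be absent.
        return bounds is None
    if bounds is None:
        # Capability on: both bounds are required.
        return False
    low, high = (parse_core(bound) for bound in bounds)
    # Half-open range [min, max): min must be strictly below max.
    return low < high
```

Under these assumptions, `bounds_ok(True, ("1.0.0", "2.0.0"))` holds, equal or inverted bounds are rejected, and a non-semver bound such as `"1.0"` raises `ValueError` rather than passing silently.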
@@ -52,13 +56,51 @@ def parse_semver(label: str, value: object) -> tuple[int, int, int]: return (int(match["major"]), int(match["minor"]), int(match["patch"])) -def wardline_bounds(manifest_path: Path) -> tuple[str, str]: - """Return the raw (min_version, max_version) strings from the manifest.""" - manifest = tomllib.loads(manifest_path.read_text(encoding="utf-8")) +def load_manifest(manifest_path: Path) -> dict[str, object]: + """Load the TOML manifest.""" + return tomllib.loads(manifest_path.read_text(encoding="utf-8")) + + +def wardline_aware(manifest_path: Path, manifest: dict[str, object]) -> bool: + """Return the explicit Wardline capability flag.""" try: - section = manifest["integrations"]["wardline"] + value = manifest["capabilities"]["runtime"]["wardline_aware"] # type: ignore[index] + except (KeyError, TypeError) as exc: + raise CheckError( + f"{manifest_path} is missing capabilities.runtime.wardline_aware" + ) from exc + if not isinstance(value, bool): + raise CheckError( + f"{manifest_path} capabilities.runtime.wardline_aware must be boolean" + ) + return value + + +def wardline_bounds(manifest_path: Path) -> tuple[str, str] | None: + """Return raw (min_version, max_version), or None when capability is off.""" + manifest = load_manifest(manifest_path) + enabled = wardline_aware(manifest_path, manifest) + integrations = manifest.get("integrations") + section = None + if isinstance(integrations, dict): + section = integrations.get("wardline") + + if not enabled: + if section is not None: + raise CheckError( + f"{manifest_path} has [integrations.wardline] while " + "capabilities.runtime.wardline_aware is false" + ) + return None + + try: + if not isinstance(section, dict): + raise KeyError except KeyError as exc: - raise CheckError(f"{manifest_path} is missing [integrations.wardline]") from exc + raise CheckError( + f"{manifest_path} advertises Wardline awareness but is missing " + "[integrations.wardline]" + ) from exc missing = [key for key in 
("min_version", "max_version") if key not in section] if missing: raise CheckError( @@ -67,9 +109,12 @@ def wardline_bounds(manifest_path: Path) -> tuple[str, str]: return str(section["min_version"]), str(section["max_version"]) -def check(manifest_path: Path) -> tuple[str, str]: - """Return (min, max) if the bounds are valid, else raise CheckError.""" - raw_min, raw_max = wardline_bounds(manifest_path) +def check(manifest_path: Path) -> tuple[str, str] | None: + """Return (min, max) if enabled, None if disabled, else raise CheckError.""" + bounds = wardline_bounds(manifest_path) + if bounds is None: + return None + raw_min, raw_max = bounds min_core = parse_semver("[integrations.wardline].min_version", raw_min) max_core = parse_semver("[integrations.wardline].max_version", raw_max) if min_core >= max_core: @@ -88,7 +133,10 @@ def server_side_cross_check_hook(resolved_version: str, manifest_path: Path) -> guard only enforces the locally-checkable invariants and this hook is not wired into ``main``. 
""" - raw_min, raw_max = check(manifest_path) + bounds = check(manifest_path) + if bounds is None: + return False + raw_min, raw_max = bounds resolved = parse_semver("resolved Wardline version", resolved_version) return parse_semver("min", raw_min) <= resolved < parse_semver("max", raw_max) @@ -98,7 +146,15 @@ def write(path: Path, text: str) -> None: def run_self_test() -> None: - aligned = '[integrations.wardline]\nmin_version = "1.0.0"\nmax_version = "2.0.0"\n' + aligned = ( + "[capabilities.runtime]\n" + "wardline_aware = true\n" + "\n" + "[integrations.wardline]\n" + 'min_version = "1.0.0"\n' + 'max_version = "2.0.0"\n' + ) + disabled = "[capabilities.runtime]\nwardline_aware = false\n" with tempfile.TemporaryDirectory() as tmp: manifest = Path(tmp) / "plugin.toml" @@ -106,9 +162,22 @@ def run_self_test() -> None: write(manifest, aligned) assert check(manifest) == ("1.0.0", "2.0.0") + write(manifest, disabled) + assert check(manifest) is None + + write( + manifest, + disabled + + "\n[integrations.wardline]\n" + + 'min_version = "1.0.0"\n' + + 'max_version = "2.0.0"\n', + ) + _expect(manifest, "wardline_aware is false") + # Inverted bounds must fail. write( manifest, + "[capabilities.runtime]\nwardline_aware = true\n" '[integrations.wardline]\nmin_version = "2.0.0"\nmax_version = "1.0.0"\n', ) _expect(manifest, "half-open range") @@ -116,6 +185,7 @@ def run_self_test() -> None: # Equal bounds (empty range) must fail. write( manifest, + "[capabilities.runtime]\nwardline_aware = true\n" '[integrations.wardline]\nmin_version = "1.0.0"\nmax_version = "1.0.0"\n', ) _expect(manifest, "half-open range") @@ -123,14 +193,19 @@ def run_self_test() -> None: # Non-semver bound must fail. write( manifest, + "[capabilities.runtime]\nwardline_aware = true\n" '[integrations.wardline]\nmin_version = "1.0" \nmax_version = "2.0.0"\n', ) _expect(manifest, "not valid semver") - # A missing section must fail loudly, not pass vacuously. 
- write(manifest, "[ontology]\nx = 1\n") + # An enabled capability without bounds must fail loudly, not pass vacuously. + write(manifest, "[capabilities.runtime]\nwardline_aware = true\n") _expect(manifest, "missing [integrations.wardline]") + # Missing capability flag is malformed. + write(manifest, "[ontology]\nx = 1\n") + _expect(manifest, "missing capabilities.runtime.wardline_aware") + # The cross-check hook accepts an in-range version and rejects out-of-range. write(manifest, aligned) assert server_side_cross_check_hook("1.4.2", manifest) is True @@ -164,8 +239,12 @@ def main(argv: list[str]) -> int: if args.self_test: run_self_test() else: - raw_min, raw_max = check(args.manifest) - print(f"Wardline version bounds valid: [{raw_min}, {raw_max})") + bounds = check(args.manifest) + if bounds is None: + print("Wardline integration not advertised; no bounds required") + else: + raw_min, raw_max = bounds + print(f"Wardline version bounds valid: [{raw_min}, {raw_max})") except CheckError as exc: print(f"Wardline version-bounds guard failed: {exc}", file=sys.stderr) return 1 diff --git a/tests/e2e/sprint_1_walking_skeleton.sh b/tests/e2e/sprint_1_walking_skeleton.sh index 11a9690a..9c495a62 100755 --- a/tests/e2e/sprint_1_walking_skeleton.sh +++ b/tests/e2e/sprint_1_walking_skeleton.sh @@ -123,20 +123,42 @@ fi # ── 8. Verify source metadata for MCP entity_at/summary cache (B.6a) ───────── log "verifying persisted Python function source metadata ..." 
SOURCE_METADATA=$(sqlite3 "$DEMO_DIR/.clarion/clarion.db" \ - "select source_file_path, source_line_start, source_line_end, length(content_hash) from entities where id = 'python:function:demo.hello';") -SOURCE_METADATA_EXPECTED="$DEMO_DIR/demo.py|10|11|64" + "select source_file_id, source_file_path, source_line_start, source_line_end, length(content_hash) from entities where id = 'python:function:demo.hello';") +SOURCE_METADATA_EXPECTED="core:file:demo.py|$DEMO_DIR/demo.py|10|11|64" if [ "$SOURCE_METADATA" != "$SOURCE_METADATA_EXPECTED" ]; then log "DB entity source metadata:" sqlite3 "$DEMO_DIR/.clarion/clarion.db" \ - "select id, source_file_path, source_line_start, source_line_end, content_hash from entities order by id;" >&2 || true + "select id, source_file_id, source_file_path, source_line_start, source_line_end, content_hash from entities order by id;" >&2 || true fail "expected Python function source metadata:\n$SOURCE_METADATA_EXPECTED\ngot:\n$SOURCE_METADATA" fi +log "verifying core file anchor metadata and module parent chain ..." 
+FILE_ANCHOR=$(sqlite3 "$DEMO_DIR/.clarion/clarion.db" \ + "select id, plugin_id, kind, name, source_file_id, source_file_path, json_extract(properties, '\$.language'), length(content_hash) from entities where id = 'core:file:demo.py';") +FILE_ANCHOR_EXPECTED="core:file:demo.py|core|file|demo.py||$DEMO_DIR/demo.py|python|64" +if [ "$FILE_ANCHOR" != "$FILE_ANCHOR_EXPECTED" ]; then + log "DB file anchor metadata:" + sqlite3 "$DEMO_DIR/.clarion/clarion.db" \ + "select id, plugin_id, kind, name, parent_id, source_file_id, source_file_path, properties, content_hash from entities order by id;" >&2 || true + fail "expected core file anchor metadata:\n$FILE_ANCHOR_EXPECTED\ngot:\n$FILE_ANCHOR" +fi + +MODULE_PARENT=$(sqlite3 "$DEMO_DIR/.clarion/clarion.db" \ + "select parent_id, source_file_id from entities where id = 'python:module:demo';") +MODULE_PARENT_EXPECTED="core:file:demo.py|core:file:demo.py" +if [ "$MODULE_PARENT" != "$MODULE_PARENT_EXPECTED" ]; then + log "DB module parent/source metadata:" + sqlite3 "$DEMO_DIR/.clarion/clarion.db" \ + "select id, parent_id, source_file_id from entities order by id;" >&2 || true + fail "expected module parent/source metadata:\n$MODULE_PARENT_EXPECTED\ngot:\n$MODULE_PARENT" +fi + # ── 9. Verify contains edge via sqlite3 (B.3) ──────────────────────────────── log "verifying persisted contains edge via sqlite3 ..." 
EDGE_RESULT=$(sqlite3 "$DEMO_DIR/.clarion/clarion.db" \ "select kind, from_id, to_id from edges where kind = 'contains' order by from_id, to_id;") -EDGE_EXPECTED="contains|python:module:demo|python:class:demo.Marker +EDGE_EXPECTED="contains|core:file:demo.py|python:module:demo +contains|python:module:demo|python:class:demo.Marker contains|python:module:demo|python:function:demo.annotated contains|python:module:demo|python:function:demo.hello contains|python:module:demo|python:function:demo.via_dispatch diff --git a/tests/e2e/sprint_2_mcp_surface.sh b/tests/e2e/sprint_2_mcp_surface.sh index dc69ed91..6e1451a5 100755 --- a/tests/e2e/sprint_2_mcp_surface.sh +++ b/tests/e2e/sprint_2_mcp_surface.sh @@ -141,6 +141,7 @@ world_prompt = ( f"Entity id: {world_entity[0]}\n" f"Kind: {world_entity[1]}\n" f"Name: {world_entity[2]}\n" + "Matching guidance:\nNo matching guidance.\n" f"Source excerpt:\n{world_excerpt}\n" "Return JSON with purpose, behavior, relationships, and risks fields." ) @@ -418,23 +419,44 @@ tools = responses["tools"]["result"]["tools"] tool_names = [tool["name"] for tool in tools] assert tool_names == [ "entity_at", - "find_entity", - "callers_of", - "execution_paths_from", - "summary", - "issues_for", - "neighborhood", - "subsystem_members", - "subsystem_of", - "project_status", - "summary_preview_cost", - "source_for_entity", - "call_sites", - "orientation_pack", + "entity_find", + "entity_callers_list", + "entity_execution_path_list", + "entity_summary_get", + "entity_issue_list", + "entity_neighborhood_get", + "subsystem_member_list", + "entity_subsystem_get", + "project_status_get", + "entity_summary_preview_cost_get", + "entity_source_get", + "entity_call_site_list", + "entity_orientation_pack_get", "analyze_start", - "analyze_status", + "analyze_status_get", "analyze_cancel", - "index_diff", + "index_diff_get", + "entity_guidance_list", + "propose_guidance", + "promote_guidance", + "entity_finding_list", + "entity_wardline_get", + "entity_tag_list", + 
"entity_kind_list", + "entity_wardline_list", + "module_circular_import_list", + "entity_coupling_hotspot_list", + "entity_entry_point_list", + "entity_http_route_list", + "entity_data_model_list", + "entity_test_list", + "entity_deprecation_list", + "entity_todo_list", + "entity_test_caller_list", + "entity_high_churn_list", + "entity_recent_change_list", + "entity_dead_list", + "entity_semantic_search_list", ], tool_names # Single-source check (clarion-71f0d6c3dd): the initialize `instructions` tool # enumeration is derived from list_tools(), so every advertised tool must appear @@ -535,4 +557,4 @@ assert "staleness" in ctx, ctx assert ctx["degraded"] is False, ctx PY -log "PASS: MCP stdio surface returned eighteen tool definitions and nine tool responses" +log "PASS: MCP stdio surface returned the full tool catalogue and all expected tool responses" diff --git a/web/docs/concepts/mcp-tools.md b/web/docs/concepts/mcp-tools.md index c4cc110a..795328c3 100644 --- a/web/docs/concepts/mcp-tools.md +++ b/web/docs/concepts/mcp-tools.md @@ -23,10 +23,11 @@ Each answer is structured, paginated where needed, and carries enough metadata (confidence, `scope_excludes`, freshness) for the agent to know how much to trust it. -## The eight core tools +## Core tool families -These eight consult tools are the stable heart of the surface — the ones the -v1.0 README commits to and the place to start: +Clarion exposes a 39-tool MCP surface. Start with the navigation and briefing +tools, then reach for catalogue shortcuts when you need a targeted structural +query: | Tool | What it answers | | --- | --- | @@ -38,18 +39,20 @@ v1.0 README commits to and the place to start: | `issues_for(id)` | "What Filigree issues are attached to this entity?" | | `neighborhood(id)` | "Show callers, callees, container, contained, references, imports in one hop." | | `subsystem_members(id)` | "Which entities belong to this subsystem?" 
| +| `source_for_entity(id)` | "Show the indexed source span and context." | +| `orientation_pack(id or file/line)` | "Give me the entity, context, neighbors, paths, issues, and freshness in one packet." | +| `guidance_for(id)` | "Which guidance sheets apply to this entity?" | +| `find_dead_code(scope?)` / `search_semantic(query)` | "Run advanced reachability or semantic-search queries when their inputs are available." | See the [MCP tool reference](../reference/mcp-tools.md) for parameters and the shape of each response. ## A broader, growing catalogue -The eight above are the foundation, but they aren't the whole surface. The -running server also exposes navigation and structural-search tools — -`subsystem_of`, `neighborhood` roll-ups at module altitude, `find_by_kind`, -`source_for_entity`, an `orientation_pack` for cold-start onboarding, and more — -and the catalogue keeps growing as new query shapes prove useful. Connect an MCP -client to a live `clarion serve` to see the full, current `tools/list`. +The running server also exposes analyze lifecycle tools, freshness checks, +faceted search, guidance/finding inspection, source/call-site evidence, and +exploration-elimination shortcuts. Connect an MCP client to a live +`clarion serve` to see the full, current `tools/list`. ## Enrich-only by design diff --git a/web/docs/getting-started.md b/web/docs/getting-started.md index a6313fe4..da40b6a3 100644 --- a/web/docs/getting-started.md +++ b/web/docs/getting-started.md @@ -10,14 +10,14 @@ Clarion is a single Rust binary; Python support ships as a separate language plugin. 
Pull both from the latest GitHub Release: ```bash -TAG=v1.0.0 +TAG=v1.2.0 curl -L -o clarion.tar.gz \ "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-x86_64-unknown-linux-gnu.tar.gz" tar xzf clarion.tar.gz install clarion-x86_64-unknown-linux-gnu/clarion ~/.local/bin/ pipx install \ - "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-plugin-python-1.0.0.tar.gz" + "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-plugin-python-1.2.0.tar.gz" ``` Confirm the binary is on your `PATH`: diff --git a/web/docs/index.md b/web/docs/index.md index e5a1dcfe..1d71604a 100644 --- a/web/docs/index.md +++ b/web/docs/index.md @@ -11,13 +11,13 @@ Clarion is a single Rust binary plus a Python language plugin. Grab both from the latest GitHub Release: ```bash -TAG=v1.0.0 +TAG=v1.2.0 curl -L -o clarion.tar.gz \ "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-x86_64-unknown-linux-gnu.tar.gz" tar xzf clarion.tar.gz install clarion-x86_64-unknown-linux-gnu/clarion ~/.local/bin/ pipx install \ - "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-plugin-python-1.0.0.tar.gz" + "https://github.com/tachyon-beep/clarion/releases/download/${TAG}/clarion-plugin-python-1.2.0.tar.gz" ``` The [Getting Started](getting-started.md) guide covers a fresh-machine install, diff --git a/web/docs/reference/mcp-tools.md b/web/docs/reference/mcp-tools.md index c638f087..8c9b0179 100644 --- a/web/docs/reference/mcp-tools.md +++ b/web/docs/reference/mcp-tools.md @@ -1,9 +1,10 @@ # MCP tool reference -The tools below are the eight core consult tools, served by `clarion serve` over -the MCP stdio transport. Descriptions match the tool docstrings shipped in the -binary. A live server exposes additional navigation and structural-search tools; -connect an MCP client and read `tools/list` for the complete, current catalogue. 
+The tools below are the core consult tools served by `clarion serve` over the +MCP stdio transport. The live 1.2.x surface exposes 39 tools, including +navigation, briefing, source inspection, guidance/finding enrichment, analyze +lifecycle, freshness, faceted search, and structural shortcuts. Connect an MCP +client and read `tools/list` for the complete, current catalogue. !!! note "Default confidence is `resolved`" Graph-traversal tools (`callers_of`, `neighborhood`, `execution_paths_from`) @@ -51,9 +52,9 @@ when an edge-cap or path-cap trims the result. ## `summary(id)` Returns an on-demand, cached one-paragraph summary for one entity, dispatching -the LLM lazily. In v1.0 this is **leaf scope** — a module summary describes the -module docstring and top-level members, not an aggregation of contained -summaries. If the model returns non-JSON, the response degrades to a +the LLM lazily. A module summary describes the module docstring and top-level +members, not an aggregation of contained summaries. If the model returns +non-JSON, the response degrades to a deterministic `structural-fallback` summary built from the source, and that fallback is cached so a retry is a free cache hit rather than a re-billed failure. @@ -85,3 +86,19 @@ Lists the module entities assigned to a subsystem entity. The reverse lookup — "which subsystem does this entity belong to?" — is `subsystem_of(id)`, which accepts any entity id and resolves a function or class through its nearest containing module. + +## Additional catalogue + +Use `tools/list` for exact schemas. The remaining tool families include: + +- Source and orientation: `source_for_entity`, `call_sites`, + `orientation_pack`, `project_status`, `summary_preview_cost`. +- Guidance and findings: `guidance_for`, `propose_guidance`, + `promote_guidance`, `findings_for`, `wardline_for`. +- Analyze and freshness: `analyze_start`, `analyze_status`, + `analyze_cancel`, `index_diff`. 
+- Facets and shortcuts: `find_by_tag`, `find_by_kind`, `find_by_wardline`, + `find_circular_imports`, `find_coupling_hotspots`, `find_entry_points`, + `find_http_routes`, `find_data_models`, `find_tests`, `find_deprecations`, + `find_todos`, `what_tests_this`, `high_churn`, `recently_changed`, + `find_dead_code`, `search_semantic`.
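
Note on "Use `tools/list` for exact schemas": a minimal sketch of building that request and reading tool names from the reply, assuming the standard MCP JSON-RPC 2.0 framing. The helper names `tools_list_request` and `tool_names` are hypothetical, and the response below is a canned sample rather than real server output. A live session with `clarion serve` would also need the `initialize` handshake before this request.

```python
import json


def tools_list_request(request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 `tools/list` request as a single line."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tool_names(response_line: str) -> list[str]:
    """Extract the advertised tool names from a `tools/list` response line."""
    response = json.loads(response_line)
    return [tool["name"] for tool in response["result"]["tools"]]


# Canned response for illustration only; a live server returns the full catalogue.
sample = json.dumps(
    {
        "jsonrpc": "2.0",
        "id": 2,
        "result": {"tools": [{"name": "entity_at"}, {"name": "analyze_start"}]},
    }
)
print(tool_names(sample))
```

Comparing the names returned by a live server against the names used in the docs is a quick way to catch drift between the documented catalogue and the advertised one.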