From 455640daf860c36faf14050c0b763cbc580dba25 Mon Sep 17 00:00:00 2001
From: Vo <tomvothecoder@gmail.com>
Date: Wed, 13 May 2026 14:28:31 -0700
Subject: [PATCH 1/5] Add live plan for zppy links feature

---
 backend/docs/174-zppy-links/plan.md | 126 ++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)
 create mode 100644 backend/docs/174-zppy-links/plan.md
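
Note on the manifest proposed in this patch: the example is written as JSONC (a leading comment and trailing commas), which a strict JSON parser rejects, so the file zppy actually writes would need to be plain JSON. The following is a minimal sketch of how a SimBoard-side scanner might validate a manifest and extract the `(case_name, machine, hpc_username)` join key. The names `JoinKey` and `parse_manifest` are hypothetical and do not exist in SimBoard.

```python
import json
from dataclasses import dataclass

# Fields the plan requires for matching; all three must be present.
REQUIRED_KEYS = ("case_name", "machine", "hpc_username")
# Per-link fields shown in the plan's manifest example.
LINK_KEYS = ("kind", "url", "label")


@dataclass(frozen=True)
class JoinKey:
    """Hypothetical container for the matching triple."""

    case_name: str
    machine: str
    hpc_username: str


def parse_manifest(text: str) -> tuple[JoinKey, list[dict]]:
    """Parse manifest text; raise ValueError if it cannot be used for linking."""
    try:
        data = json.loads(text)  # strict JSON: no comments, no trailing commas
    except json.JSONDecodeError as exc:
        raise ValueError(f"manifest is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("manifest must be a JSON object")

    missing = [
        k for k in REQUIRED_KEYS
        if not isinstance(data.get(k), str) or not data[k].strip()
    ]
    if missing:
        raise ValueError(f"manifest is missing required fields: {missing}")

    diagnostics = data.get("diagnostics")
    if not isinstance(diagnostics, list):
        raise ValueError("'diagnostics' must be a list")
    for item in diagnostics:
        if not isinstance(item, dict) or not all(
            isinstance(item.get(k), str) for k in LINK_KEYS
        ):
            raise ValueError(f"diagnostics entries need string fields {LINK_KEYS}")

    key = JoinKey(*(data[k].strip() for k in REQUIRED_KEYS))
    return key, diagnostics
```

A malformed manifest raises `ValueError`, which a scanner could log and skip rather than abort the whole scan.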

diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md
new file mode 100644
index 00000000..4070884c
--- /dev/null
+++ b/backend/docs/174-zppy-links/plan.md
@@ -0,0 +1,126 @@
+# Plan: Connect zppy Diagnostics to SimBoard Simulations
+
+## Task
+
+Replace manual diagnostic URL-pasting with automated, metadata-based linking of zppy diagnostics outputs to SimBoard simulation records.
+
+## Scope
+
+**In:** matching strategy (`case_name` + `machine` + `hpc_username`), persistence model, discovery mechanism, zppy manifest spec.
+**Out:** frontend UI redesign, PACE changes, existing manual link workflow, Case uniqueness refactor (issue #136), diagnostics data ingestion (Phase 2+).
+
+## Key Decisions
+
+### Do NOT parse public HTML directories
+
+- **Fragility** — HTML layouts vary by web server; breaks on config changes.
+- **Security** — SSRF and content-injection attack surface.
+- **Coupling** — SimBoard depends on external web server availability/structure.
+- **Latency** — Network crawling is slow and unreliable for production.
+
+### zppy writes a manifest file (no API call)
+
+zppy runs as a user-level Python package on HPC machines. SimBoard's API requires `SERVICE_ACCOUNT` or `ADMIN` bearer tokens — it is not feasible for each user to obtain and configure an API token in zppy.
+
+**Instead, zppy writes a small manifest file to a well-known location within its output directory.** This requires zero authentication, zero network access from zppy, and trivial zppy-side changes.
+
+```jsonc
+// <zppy_output_dir>/.simboard-diagnostics.json
+{
+  "case_name": "v3.LR.historical_0201",
+  "machine": "chrysalis",
+  "hpc_username": "user123",
+  "diagnostics": [
+    {
+      "kind": "e3sm_diags",
+      "url": "https://web.lcrc.anl.gov/...",
+      "label": "E3SM Diags",
+    },
+    {
+      "kind": "mpas_analysis",
+      "url": "https://web.lcrc.anl.gov/...",
+      "label": "MPAS-Analysis",
+    },
+  ],
+}
+```
+
+zppy already knows `case_name`, `machine`, and the running user from its cfg. `mache` resolves machine-specific public web URL prefixes to construct the URLs.
+
+### SimBoard discovers manifests via filesystem scanning (not API push)
+
+SimBoard already has a CronJob-based filesystem scanner (`nersc_archive_ingestor.py`) that:
+
+- Walks mounted HPC directories every 15 min
+- Uses state-based incremental dedup
+- Authenticates with a single service account token
+- Calls `POST /ingestions/from-path` internally
+
+The diagnostics linking should follow the same pattern: a SimBoard-side scanner discovers `.simboard-diagnostics.json` manifests, reads them, matches to existing Cases, and creates `ExternalLink` rows. **No per-user tokens needed.**
+
+### Persist links in database (not query-time resolution)
+
+Store `ExternalLink` rows on match. No remote calls during frontend queries. Same model as manually-added links — frontend works with zero changes. `ExternalLink.created_at` provides audit trail.
+
+## Approach
+
+1. **Join key:** `(case_name, machine, hpc_username)` — all three required. `case_name` alone is not globally unique (`Case.name` has a unique index but different users can reuse names). Adding `machine` + `hpc_username` disambiguates. `CASE_HASH` is unreliable across executions (see issue #136). zppy has all three values.
+
+2. **Matching query:** Machine is on `Simulation` not `Case`, so the resolver joins: `Case.name == X AND Simulation.machine_id == Y AND Simulation.hpc_username == Z`. All simulations in a case share the same machine in practice.
+
+3. **zppy-side (minimal change):** After diagnostics complete, zppy writes `.simboard-diagnostics.json` to its output directory. `mache` resolves public URL prefixes. No API call, no token.
+
+4. **SimBoard diagnostics scanner** — two options:
+
+   **Option A — Extend NERSC archive ingestor (recommended for MVP):** Add a post-scan phase to the existing `nersc_archive_ingestor.py` that also walks known diagnostics output directories (or the same archive tree) looking for `.simboard-diagnostics.json` files. On discovery, it calls a new internal endpoint or directly creates `ExternalLink` rows via the existing service account token.
+
+   **Option B — Separate diagnostics scanner script:** A new lightweight CronJob script (`diagnostics_link_scanner.py`) that walks diagnostics output directories. Same pattern as `nersc_archive_ingestor.py` — env-configured, state-file dedup, service account auth. Better separation of concerns, but more operational overhead.
+
+5. **API endpoint** (extend `backend/app/features/simulation/api.py`):
+
+   ```
+   POST /api/v1/diagnostics/link
+   Body: { "case_name": "...", "machine": "...", "hpc_username": "...", "diagnostics": [...] }
+   ```
+
+   Restricted to `ADMIN` / `SERVICE_ACCOUNT` roles (same as ingestion). Resolves the triple → `Case` → creates `ExternalLink` rows with `kind = diagnostic`. The scanner calls this endpoint; users don't call it directly.
+
+6. **Schema:** Add `DiagnosticsLinkRequest` to `backend/app/features/simulation/schemas.py`.
+
+7. **Migration:** None if linking to existing FK targets. Required if adding `case_id` FK to `ExternalLink` (see open question #1).
+
+8. **Frontend:** No changes. Existing `grouped_links` rendering picks up new diagnostic links automatically.
+
+### Alternative: Convention-based URL derivation (no zppy changes)
+
+For production runs with enforced path conventions, SimBoard could derive diagnostic URLs from simulation metadata + `mache` without any zppy changes or manifest files. Per issue #174, zppy outputs follow a fixed directory structure and `mache` resolves per-machine URL prefixes.
+
+This works only when path conventions are strict. The manifest approach is more robust for custom user paths. Could combine both: derive URLs for production campaigns, manifest for custom runs.
+
+## Tests
+
+- `backend/tests/features/simulation/test_api.py` — endpoint tests:
+  - Happy path: matching `(case_name, machine, hpc_username)` → links created
+  - Different user, same case_name + machine → no cross-linking (isolation test)
+  - No matching case → 404
+  - Duplicate link idempotency
+  - Invalid payload → 422
+- Scanner tests: manifest discovery, state dedup, malformed manifest handling
+- Run: `make backend-test && make pre-commit-run`
+
+## Risk
+
+**Score: 3 (normal)**
+
+1. **zppy adoption lag** — No data until zppy emits manifests. Mitigate with convention-based derivation for production runs.
+2. **Case name collision** — `Case.name` is unique in DB but not globally meaningful. The `(case_name, machine, hpc_username)` triple mitigates. Broader fix tracked in issue #136.
+3. **Diagnostics output path visibility** — Scanner must have filesystem access to zppy output directories. On NERSC this requires mounting the relevant CFS paths into the SimBoard container (same pattern as performance archive).
+4. **Timing gap** — Scanner-based approach has up to 15-min latency. Acceptable for diagnostics linking.
+
+## Open Questions (ask colleagues)
+
+1. **Case-level vs execution-level linking?** zppy diagnostics run across simulation output in time increments — they're inherently case-scoped, not tied to a specific execution/LID. Current `ExternalLink` only has `simulation_id` FK. Options: (a) add optional `case_id` FK to `ExternalLink`, (b) create a separate `CaseLink` model, (c) link to reference simulation only as a pragmatic shortcut. This is the key schema decision.
+2. **Case uniqueness long-term?** The `(case_name, machine, hpc_username)` triple is a pragmatic join key but `Case.name` as the sole DB unique constraint is fragile. Issue #136 is evaluating `CASE_HASH` but it's unstable across executions. Should Case uniqueness be strengthened in the model itself?
+3. **Diagnostics output directory location?** Where are zppy outputs stored on each machine? Need the path pattern to configure the scanner. Per issue #174, the coupled group stores results on machine web servers — need the exact filesystem mount paths for NERSC (and other machines if applicable).
+4. **Retroactive linking needed?** If yes, plan a one-time bulk-linking script (or convention-based derivation) for existing diagnostics that predate this feature.
+5. **Convention-based derivation viable for MVP?** If zppy output paths are predictable enough from `(case_name, machine, hpc_username)` + `mache`, SimBoard could derive diagnostic URLs without any zppy changes. Worth evaluating as a faster MVP path.

From d498b7e12f70f9819619b8b66f5924a739079352 Mon Sep 17 00:00:00 2001
From: Vo <tomvothecoder@gmail.com>
Date: Thu, 14 May 2026 11:09:05 -0700
Subject: [PATCH 2/5] Update plan.md

---
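
Note on the `env_case.xml` mapping introduced in this patch (`CASE`, `MACH`, `REALUSER` to `case_name`, `machine`, `hpc_username`): a minimal sketch of the zppy-side extraction, assuming the file stores each value as an `<entry id="..." value="...">` element. That layout is an assumption here, and `read_case_identity` is a hypothetical helper, not an existing zppy function.

```python
import xml.etree.ElementTree as ET

# XML entry id -> provenance field, as listed in the plan's mapping table.
XML_TO_PROVENANCE = {
    "CASE": "case_name",
    "MACH": "machine",
    "REALUSER": "hpc_username",
}


def read_case_identity(xml_text: str) -> dict[str, str]:
    """Extract the required identity fields; raise ValueError if any is absent."""
    root = ET.fromstring(xml_text)
    found: dict[str, str] = {}
    # Assumed layout: <entry id="CASE" value="..."> at any nesting depth.
    for entry in root.iter("entry"):
        field = XML_TO_PROVENANCE.get(entry.get("id"))
        value = (entry.get("value") or "").strip()
        if field and value:
            found[field] = value

    missing = sorted(set(XML_TO_PROVENANCE.values()) - set(found))
    if missing:
        raise ValueError(f"env_case.xml is missing: {', '.join(missing)}")
    return found
```

Raising on a missing field keeps an incomplete join key out of the provenance cfg, which matches the plan's rule that SimBoard skips files lacking any of the three fields.
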
 backend/docs/174-zppy-links/plan.md | 285 +++++++++++++++++++---------
 1 file changed, 196 insertions(+), 89 deletions(-)
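
Note on the resolver outcomes in this patch (one match creates or updates links, zero matches returns `404`, multiple matches returns `409`): a minimal in-memory sketch of that decision. `SimulationRow` and `resolve_case` are hypothetical, the rows stand in for a `Case`/`Simulation` join, matching on a machine name rather than `machine_id` is a simplification, and the `200` success status is an assumption because the plan does not state one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SimulationRow:
    """Hypothetical flattened Case/Simulation join row."""

    case_id: int
    case_name: str
    machine: str
    hpc_username: str


def resolve_case(
    rows: Iterable[SimulationRow],
    case_name: str,
    machine: str,
    hpc_username: str,
) -> tuple[int, Optional[int]]:
    """Return (http_status, case_id) for the join key."""
    # Several simulations of one case collapse to a single case id.
    case_ids = {
        r.case_id
        for r in rows
        if r.case_name == case_name
        and r.machine == machine
        and r.hpc_username == hpc_username
    }
    if not case_ids:
        return 404, None  # no matching case
    if len(case_ids) > 1:
        return 409, None  # ambiguous: the triple maps to more than one case
    return 200, next(iter(case_ids))  # assumed success status
```

Collecting case ids into a set means two executions of the same case still count as one match, so only a genuinely ambiguous triple yields `409`.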

diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md
index 4070884c..a9084863 100644
--- a/backend/docs/174-zppy-links/plan.md
+++ b/backend/docs/174-zppy-links/plan.md
@@ -1,126 +1,233 @@
 # Plan: Connect zppy Diagnostics to SimBoard Simulations
 
-## Task
+## Goal
 
-Replace manual diagnostic URL-pasting with automated, metadata-based linking of zppy diagnostics outputs to SimBoard simulation records.
+Replace manual diagnostics URL entry with automated linking from zppy diagnostics outputs to existing SimBoard simulation records.
 
 ## Scope
 
-**In:** matching strategy (`case_name` + `machine` + `hpc_username`), persistence model, discovery mechanism, zppy manifest spec.
-**Out:** frontend UI redesign, PACE changes, existing manual link workflow, Case uniqueness refactor (issue #136), diagnostics data ingestion (Phase 2+).
-
-## Key Decisions
-
-### Do NOT parse public HTML directories
-
-- **Fragility** — HTML layouts vary by web server; breaks on config changes.
-- **Security** — SSRF and content-injection attack surface.
-- **Coupling** — SimBoard depends on external web server availability/structure.
-- **Latency** — Network crawling is slow and unreliable for production.
-
-### zppy writes a manifest file (no API call)
-
-zppy runs as a user-level Python package on HPC machines. SimBoard's API requires `SERVICE_ACCOUNT` or `ADMIN` bearer tokens — it is not feasible for each user to obtain and configure an API token in zppy.
-
-**Instead, zppy writes a small manifest file to a well-known location within its output directory.** This requires zero authentication, zero network access from zppy, and trivial zppy-side changes.
-
-```jsonc
-// <zppy_output_dir>/.simboard-diagnostics.json
-{
-  "case_name": "v3.LR.historical_0201",
-  "machine": "chrysalis",
-  "hpc_username": "user123",
-  "diagnostics": [
-    {
-      "kind": "e3sm_diags",
-      "url": "https://web.lcrc.anl.gov/...",
-      "label": "E3SM Diags",
-    },
-    {
-      "kind": "mpas_analysis",
-      "url": "https://web.lcrc.anl.gov/...",
-      "label": "MPAS-Analysis",
-    },
-  ],
-}
+### In
+
+- Add required zppy provenance fields: `case_name`, `machine`, `hpc_username`
+- Discover zppy diagnostics provenance files from configured filesystem roots
+- Confirm diagnostics completion before linking
+- Match diagnostics to SimBoard records using `(case_name, machine, hpc_username)`
+- Create idempotent diagnostic `ExternalLink` rows
+- Maintain scanner state to avoid repeated processing
+
+### Out
+
+- Frontend redesign
+- Changes to manual external-link workflows
+- PACE integration changes
+- Case identity or uniqueness refactor
+- Diagnostics content ingestion or indexing
+- Public HTML directory scraping
+- Historical backfill beyond configured provenance roots
+- Optional build/campaign metadata ingestion
+
+## Core Decisions
+
+### Match diagnostics at case scope
+
+zppy runs against a full case output tree, not a single execution/LID. Use case identity as the primary join key:
+
+```text
+(case_name, machine, hpc_username)
+```
+
+All three fields are required. `case_name` alone is not globally safe, and `CASE_HASH` is not reliable across executions.
+
+### Do not parse public HTML directories
+
+Avoid public directory scraping. It is fragile, web-server-coupled, slow, and expands the SSRF/content-injection attack surface.
+
+### Use zppy provenance cfg as the primary input
+
+Do not require zppy to call the SimBoard API. zppy runs as a user-level HPC package, while SimBoard API writes require `SERVICE_ACCOUNT` or `ADMIN` tokens.
+
+Instead, SimBoard discovers zppy provenance files from configured filesystem roots. Newer zppy runs already emit provenance cfg files under diagnostics output paths, for example:
+
+```text
+post/scripts/provenance.20260303_230804_991619.cfg
+```
+
+Current cfg examples expose useful fields:
+
+- `case`: case name
+- `input`: case run directory
+- `output`: diagnostics filesystem root
+- `www`: public diagnostics root
+- `campaign`: optional campaign metadata
+
+But current cfg is not yet an authoritative join source because it may lack:
+
+- `machine`
+- execution `LID`
+- canonical simulation owner
+- unambiguous `hpc_username`
+
+Path-derived usernames are unsafe. Example ambiguity:
+
+```text
+input  path owner: ac.wlin
+output path owner: ac.zhang40
 ```
 
-zppy already knows `case_name`, `machine`, and the running user from its cfg. `mache` resolves machine-specific public web URL prefixes to construct the URLs.
+Therefore, zppy must enrich provenance cfg with required case identity copied from `case_scripts/env_case.xml`:
+
+| XML field  | Provenance field |
+| ---------- | ---------------- |
+| `CASE`     | `case_name`      |
+| `MACH`     | `machine`        |
+| `REALUSER` | `hpc_username`   |
+
+If any required field is missing, SimBoard skips the provenance file and logs it as invalid for linking.
+
+### Persist links, do not resolve at query time
+
+Create database rows when diagnostics are discovered. Frontend queries should not crawl filesystems or remote URLs.
+
+Use the existing manual-link rendering path where possible: diagnostics links become `ExternalLink` rows with `kind = diagnostic`.
+
+## Implementation
+
+Implement in order: provenance contract -> scanner -> storage target -> resolver/API -> frontend verification.
+
+### zppy
 
-### SimBoard discovers manifests via filesystem scanning (not API push)
+#### 1. Emit required provenance fields
 
-SimBoard already has a CronJob-based filesystem scanner (`nersc_archive_ingestor.py`) that:
+| Field          | Source                    |
+| -------------- | ------------------------- |
+| `case_name`    | `env_case.xml` `CASE`     |
+| `machine`      | `env_case.xml` `MACH`     |
+| `hpc_username` | `env_case.xml` `REALUSER` |
 
-- Walks mounted HPC directories every 15 min
-- Uses state-based incremental dedup
-- Authenticates with a single service account token
-- Calls `POST /ingestions/from-path` internally
+Tests:
 
-The diagnostics linking should follow the same pattern: a SimBoard-side scanner discovers `.simboard-diagnostics.json` manifests, reads them, matches to existing Cases, and creates `ExternalLink` rows. **No per-user tokens needed.**
+- emits `case_name`, `machine`, `hpc_username`
+- parses values from `env_case.xml`
+- handles missing `env_case.xml`
+- preserves existing provenance behavior
 
-### Persist links in database (not query-time resolution)
+### SimBoard
 
-Store `ExternalLink` rows on match. No remote calls during frontend queries. Same model as manually-added links — frontend works with zero changes. `ExternalLink.created_at` provides audit trail.
+#### 1. Add diagnostics scanner
 
-## Approach
+Add `diagnostics_link_scanner.py`.
 
-1. **Join key:** `(case_name, machine, hpc_username)` — all three required. `case_name` alone is not globally unique (`Case.name` has a unique index but different users can reuse names). Adding `machine` + `hpc_username` disambiguates. `CASE_HASH` is unreliable across executions (see issue #136). zppy has all three values.
+Responsibilities:
 
-2. **Matching query:** Machine is on `Simulation` not `Case`, so the resolver joins: `Case.name == X AND Simulation.machine_id == Y AND Simulation.hpc_username == Z`. All simulations in a case share the same machine in practice.
+- scan configured diagnostics roots for `provenance*.cfg`
+- dedup with state file
+- verify diagnostics completion
+- parse `case_name`, `machine`, `hpc_username`
+- call internal API with service-account auth
+- skip and log if full join key is unavailable
 
-3. **zppy-side (minimal change):** After diagnostics complete, zppy writes `.simboard-diagnostics.json` to its output directory. `mache` resolves public URL prefixes. No API call, no token.
+Tests:
 
-4. **SimBoard diagnostics scanner** — two options:
+- discovers cfgs
+- parses required cfg identity
+- handles malformed cfgs
+- skips missing identity
+- checks completion marker
+- dedups state
+- handles duplicate links idempotently
 
-   **Option A — Extend NERSC archive ingestor (recommended for MVP):** Add a post-scan phase to the existing `nersc_archive_ingestor.py` that also walks known diagnostics output directories (or the same archive tree) looking for `.simboard-diagnostics.json` files. On discovery, it calls a new internal endpoint or directly creates `ExternalLink` rows via the existing service account token.
+#### 2. Resolve link storage
 
-   **Option B — Separate diagnostics scanner script:** A new lightweight CronJob script (`diagnostics_link_scanner.py`) that walks diagnostics output directories. Same pattern as `nersc_archive_ingestor.py` — env-configured, state-file dedup, service account auth. Better separation of concerns, but more operational overhead.
+Add `DiagnosticsLinkRequest` in `backend/app/features/simulation/schemas.py`.
 
-5. **API endpoint** (extend `backend/app/features/simulation/api.py`):
+Storage options:
 
-   ```
-   POST /api/v1/diagnostics/link
-   Body: { "case_name": "...", "machine": "...", "hpc_username": "...", "diagnostics": [...] }
-   ```
+1. Preferred: add `case_id` to `ExternalLink`.
+2. Alternative: add `CaseLink`.
+3. Shortcut: attach to reference simulation.
 
-   Restricted to `ADMIN` / `SERVICE_ACCOUNT` roles (same as ingestion). Resolves the triple → `Case` → creates `ExternalLink` rows with `kind = diagnostic`. The scanner calls this endpoint; users don't call it directly.
+#### 3. Add matching resolver
 
-6. **Schema:** Add `DiagnosticsLinkRequest` to `backend/app/features/simulation/schemas.py`.
+| Input          | Match                     |
+| -------------- | ------------------------- |
+| `case_name`    | `Case.name`               |
+| `machine`      | `Simulation.machine_id`   |
+| `hpc_username` | `Simulation.hpc_username` |
 
-7. **Migration:** None if linking to existing FK targets. Required if adding `case_id` FK to `ExternalLink` (see open question #1).
+Outcomes:
 
-8. **Frontend:** No changes. Existing `grouped_links` rendering picks up new diagnostic links automatically.
+- 1 match: create/update links
+- 0 matches: `404`
+- multiple matches: `409`
 
-### Alternative: Convention-based URL derivation (no zppy changes)
+Tests:
 
-For production runs with enforced path conventions, SimBoard could derive diagnostic URLs from simulation metadata + `mache` without any zppy changes or manifest files. Per issue #174, zppy outputs follow a fixed directory structure and `mache` resolves per-machine URL prefixes.
+- matching triple creates links
+- same case/machine under different user does not cross-link
+- no match returns `404`
+- ambiguous match returns `409`
 
-This works only when path conventions are strict. The manifest approach is more robust for custom user paths. Could combine both: derive URLs for production campaigns, manifest for custom runs.
+#### 4. Add internal API endpoint
 
-## Tests
+Endpoint: `POST /api/v1/diagnostics/link`
 
-- `backend/tests/features/simulation/test_api.py` — endpoint tests:
-  - Happy path: matching `(case_name, machine, hpc_username)` → links created
-  - Different user, same case_name + machine → no cross-linking (isolation test)
-  - No matching case → 404
-  - Duplicate link idempotency
-  - Invalid payload → 422
-- Scanner tests: manifest discovery, state dedup, malformed manifest handling
-- Run: `make backend-test && make pre-commit-run`
+Roles: `ADMIN`, `SERVICE_ACCOUNT`
 
-## Risk
+Request:
 
-**Score: 3 (normal)**
+| Field          | Required |
+| -------------- | -------- |
+| `case_name`    | yes      |
+| `machine`      | yes      |
+| `hpc_username` | yes      |
+| `diagnostics`  | yes      |
 
-1. **zppy adoption lag** — No data until zppy emits manifests. Mitigate with convention-based derivation for production runs.
-2. **Case name collision** — `Case.name` is unique in DB but not globally meaningful. The `(case_name, machine, hpc_username)` triple mitigates. Broader fix tracked in issue #136.
-3. **Diagnostics output path visibility** — Scanner must have filesystem access to zppy output directories. On NERSC this requires mounting the relevant CFS paths into the SimBoard container (same pattern as performance archive).
-4. **Timing gap** — Scanner-based approach has up to 15-min latency. Acceptable for diagnostics linking.
+Diagnostics item:
 
-## Open Questions (ask colleagues)
+| Field               | Required |
+| ------------------- | -------- |
+| `name`              | yes      |
+| `url`               | yes      |
+| `kind = diagnostic` | yes      |
+
+Tests:
+
+- duplicate request is idempotent
+- invalid payload returns `422`
+- auth required
+
+#### 5. Keep frontend unchanged
+
+Existing external-link rendering should display diagnostic links once rows exist.
+
+## Fallbacks
+
+### Curated backfill
+
+Allow convention-based URL derivation only for controlled campaigns. Do not use as the primary MVP path.
+
+### Validation command
+
+```bash
+make backend-test && make pre-commit-run
+```
 
-1. **Case-level vs execution-level linking?** zppy diagnostics run across simulation output in time increments — they're inherently case-scoped, not tied to a specific execution/LID. Current `ExternalLink` only has `simulation_id` FK. Options: (a) add optional `case_id` FK to `ExternalLink`, (b) create a separate `CaseLink` model, (c) link to reference simulation only as a pragmatic shortcut. This is the key schema decision.
-2. **Case uniqueness long-term?** The `(case_name, machine, hpc_username)` triple is a pragmatic join key but `Case.name` as the sole DB unique constraint is fragile. Issue #136 is evaluating `CASE_HASH` but it's unstable across executions. Should Case uniqueness be strengthened in the model itself?
-3. **Diagnostics output directory location?** Where are zppy outputs stored on each machine? Need the path pattern to configure the scanner. Per issue #174, the coupled group stores results on machine web servers — need the exact filesystem mount paths for NERSC (and other machines if applicable).
-4. **Retroactive linking needed?** If yes, plan a one-time bulk-linking script (or convention-based derivation) for existing diagnostics that predate this feature.
-5. **Convention-based derivation viable for MVP?** If zppy output paths are predictable enough from `(case_name, machine, hpc_username)` + `mache`, SimBoard could derive diagnostic URLs without any zppy changes. Worth evaluating as a faster MVP path.
+## Risks
+
+- **Storage gap**: diagnostics are case-scoped, but `ExternalLink` currently points at `simulation_id`.
+  Mitigation: decide storage target before implementing resolver/API behavior.
+- **Missing identity**: SimBoard cannot link a provenance file without `case_name`, `machine`, and `hpc_username`.
+  Mitigation: require zppy provenance enrichment; skip and log invalid files.
+- **Deployment variability**: zppy roots and public URL prefixes vary by machine/campaign.
+  Mitigation: use env-configured scanner roots and machine/public-prefix mappings.
+- **Provenance drift**: cfg layout and required-field coverage may vary across zppy versions.
+  Mitigation: add parser tests, schema/version detection, and a documented support window.
+
+## Remaining Open Questions
+
+1. **Storage target:** Should diagnostics links attach to `Case`, `Simulation`, or a new link table?
+2. **Provenance schema:** Should zppy emit a versioned normalized block or reuse existing top-level cfg fields?
+3. **Completion signal:** Which artifact should SimBoard treat as authoritative completion: status file, generated index, or explicit provenance field?
+4. **Deployment scope:** Which scanner roots, machines, and public URL prefixes are supported in MVP?
+5. **Retroactive linking:** Does MVP include historical backfill, or only provenance files with the required join key?
+6. **Case identity hardening:** Is `(case_name, machine, hpc_username)` sufficient until issue #136 is resolved?

From d9dbc6e67b958f9d612853ab7fa75e581f287fa8 Mon Sep 17 00:00:00 2001
From: Vo <tomvothecoder@gmail.com>
Date: Thu, 14 May 2026 11:33:18 -0700
Subject: [PATCH 3/5] Update plan

---
 backend/docs/174-zppy-links/plan.md | 88 +++++++++++++++++------------
 1 file changed, 53 insertions(+), 35 deletions(-)
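
Note on the completion signal chosen in this patch (index page plus status files): a minimal sketch of the scanner-side check. It assumes status files are named `*.status` and contain `OK` when a task has finished successfully. Both details are assumptions to confirm against zppy, and `diagnostics_complete` is a hypothetical helper.

```python
from pathlib import Path


def diagnostics_complete(index_page: Path, status_dir: Path) -> bool:
    """Return True only if the index page exists and every status file reports OK."""
    if not index_page.is_file():
        return False
    # Assumed convention: one "<task>.status" file per zppy task.
    status_files = sorted(status_dir.glob("*.status"))
    if not status_files:
        return False  # nothing to confirm completion against
    # Assumed convention: a finished task's status file contains "OK".
    return all(p.read_text().strip() == "OK" for p in status_files)
```

Requiring both signals means a partially written diagnostics tree is not linked; the scanner can revisit it on the next pass.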

diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md
index a9084863..beb3acd8 100644
--- a/backend/docs/174-zppy-links/plan.md
+++ b/backend/docs/174-zppy-links/plan.md
@@ -4,15 +4,18 @@
 
 Replace manual diagnostics URL entry with automated linking from zppy diagnostics outputs to existing SimBoard simulation records.
 
+MVP is NERSC-only.
+
 ## Scope
 
 ### In
 
 - Add required zppy provenance fields: `case_name`, `machine`, `hpc_username`
-- Discover zppy diagnostics provenance files from configured filesystem roots
-- Confirm diagnostics completion before linking
+- Add required diagnostics URLs in zppy provenance
+- Discover zppy diagnostics provenance files from configured NERSC filesystem roots
+- Confirm diagnostics completion from index page plus status files
 - Match diagnostics to SimBoard records using `(case_name, machine, hpc_username)`
-- Create idempotent diagnostic `ExternalLink` rows
+- Create idempotent case-scoped diagnostic links
 - Maintain scanner state to avoid repeated processing
 
 ### Out
@@ -24,7 +27,7 @@ Replace manual diagnostics URL entry with automated linking from zppy diagnostic
 - Diagnostics content ingestion or indexing
 - Public HTML directory scraping
 - Historical backfill beyond configured provenance roots
-- Optional build/campaign metadata ingestion
+- Non-NERSC deployments
 
 ## Core Decisions
 
@@ -44,14 +47,17 @@ Avoid public directory scraping. It is fragile, web-server-coupled, slow, and ex
 
 ### Use zppy provenance cfg as the primary input
 
-Do not require zppy to call the SimBoard API. zppy runs as a user-level HPC package, while SimBoard API writes require `SERVICE_ACCOUNT` or `ADMIN` tokens.
-
-Instead, SimBoard discovers zppy provenance files from configured filesystem roots. Newer zppy runs already emit provenance cfg files under diagnostics output paths, for example:
+SimBoard discovers zppy provenance files from configured NERSC filesystem roots. Newer zppy runs already emit provenance cfg files under diagnostics output paths, for example:
 
 ```text
 post/scripts/provenance.20260303_230804_991619.cfg
 ```
 
+Reference example:
+
+- https://github.com/E3SM-Project/zppy/blob/main/examples/post.v3.LR.historical.zppy_v3.cfg
+- https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/zppy_example/v3.2.0/v3.LR.historical_0051/provenance.20260303_230804_991619.cfg
+
 Current cfg examples expose useful fields:
 
 - `case`: case name
@@ -63,7 +69,6 @@ Current cfg examples expose useful fields:
 But current cfg is not yet an authoritative join source because it may lack:
 
 - `machine`
-- execution `LID`
 - canonical simulation owner
 - unambiguous `hpc_username`
 
@@ -74,7 +79,7 @@ input  path owner: ac.wlin
 output path owner: ac.zhang40
 ```
 
-Therefore, zppy must enrich provenance cfg with required case identity copied from `case_scripts/env_case.xml`:
+Therefore, zppy must enrich provenance cfg with required case identity copied from `<input>/case_scripts/env_case.xml`:
 
 | XML field  | Provenance field |
 | ---------- | ---------------- |
@@ -84,11 +89,21 @@ Therefore, zppy must enrich provenance cfg with required case identity copied fr
 
 If any required field is missing, SimBoard skips the provenance file and logs it as invalid for linking.
 
+For MVP, zppy should reuse existing top-level cfg fields rather than emit a new versioned normalized block.
+
+### Require explicit diagnostics URLs in provenance
+
+For MVP, SimBoard should not derive diagnostics URLs from path conventions. zppy should emit explicit diagnostics URLs in provenance cfg.
+
+### Use index page plus status files as completion signal
+
+Treat diagnostics as complete only when the expected index page and zppy status files are present.
+
 ### Persist links, do not resolve at query time
 
 Create database rows when diagnostics are discovered. Frontend queries should not crawl filesystems or remote URLs.
 
-Use the existing manual-link rendering path where possible: diagnostics links become `ExternalLink` rows with `kind = diagnostic`.
+Diagnostic links are case-scoped. For MVP, store them on `Case` by adding `case_id` to `ExternalLink`. Keep the existing manual-link rendering path where possible by surfacing case-scoped diagnostic links alongside current links.
 
 ## Implementation
 
@@ -104,11 +119,20 @@ Implement in order: provenance contract -> scanner -> storage target -> resolver
 | `machine`      | `env_case.xml` `MACH`     |
 | `hpc_username` | `env_case.xml` `REALUSER` |
 
+Implementation note:
+
+- For NERSC MVP, zppy can construct explicit diagnostics URLs from cfg `www` plus `mache` machine metadata.
+- `mache.MachineInfo` exposes helpers such as `web_portal_base`, `web_portal_url`, and `username`.
+- Reference: https://docs.e3sm.org/mache/main/developers_guide/generated/mache.MachineInfo.html
+
 Tests:
 
 - emits `case_name`, `machine`, `hpc_username`
+- emits explicit diagnostics URLs
+- can construct explicit diagnostics URLs from cfg `www` plus `mache` machine metadata
 - parses values from `env_case.xml`
-- handles missing `env_case.xml`
+- parses values from `env_build.xml`
+- handles missing `env_case.xml` or `env_build.xml`
 - preserves existing provenance behavior
 
 ### SimBoard
@@ -119,10 +143,11 @@ Add `diagnostics_link_scanner.py`.
 
 Responsibilities:
 
-- scan configured diagnostics roots for `provenance*.cfg`
+- scan configured NERSC diagnostics roots for `provenance*.cfg`
 - dedup with state file
-- verify diagnostics completion
+- verify diagnostics completion from index page plus status files
 - parse `case_name`, `machine`, `hpc_username`
+- parse explicit diagnostics URLs
 - call internal API with service-account auth
 - skip and log if full join key is unavailable
 
@@ -132,7 +157,7 @@ Tests:
 - parses required cfg identity
 - handles malformed cfgs
 - skips missing identity
-- checks completion marker
+- checks index-plus-status completion marker
 - dedups state
 - handles duplicate links idempotently
 
@@ -140,23 +165,19 @@ Tests:
 
 Add `DiagnosticsLinkRequest` in `backend/app/features/simulation/schemas.py`.
 
-Storage options:
-
-1. Preferred: add `case_id` to `ExternalLink`.
-2. Alternative: add `CaseLink`.
-3. Shortcut: attach to reference simulation.
+For MVP, add `case_id` to `ExternalLink` and store diagnostic links at case scope.
 
 #### 3. Add matching resolver
 
-| Input          | Match                     |
-| -------------- | ------------------------- |
-| `case_name`    | `Case.name`               |
-| `machine`      | `Simulation.machine_id`   |
-| `hpc_username` | `Simulation.hpc_username` |
+| Input          | Match                   |
+| -------------- | ----------------------- |
+| `case_name`    | `Case.name`             |
+| `machine`      | joined case simulations |
+| `hpc_username` | joined case simulations |
 
 Outcomes:
 
-- 1 match: create/update links
+- 1 case match: create/update case-scoped links
 - 0 matches: `404`
 - multiple matches: `409`
 
@@ -214,20 +235,17 @@ make backend-test && make pre-commit-run
 
 ## Risks
 
-- **Storage gap**: diagnostics are case-scoped, but `ExternalLink` currently points at `simulation_id`.
-  Mitigation: decide storage target before implementing resolver/API behavior.
+- **Case-scoped link migration**: diagnostics are case-scoped, but `ExternalLink` currently points at `simulation_id`.
+  Mitigation: add `case_id` to `ExternalLink` for MVP and keep the migration and API changes narrow.
 - **Missing identity**: SimBoard cannot link a provenance file without `case_name`, `machine`, and `hpc_username`.
   Mitigation: require zppy provenance enrichment; skip and log invalid files.
-- **Deployment variability**: zppy roots and public URL prefixes vary by machine/campaign.
-  Mitigation: use env-configured scanner roots and machine/public-prefix mappings.
+- **NERSC deployment variability**: zppy roots and public URL prefixes may still vary by campaign or user layout within NERSC.
+  Mitigation: use env-configured NERSC scanner roots and NERSC public-prefix mappings.
 - **Provenance drift**: cfg layout and required-field coverage may vary across zppy versions.
   Mitigation: add parser tests, schema/version detection, and a documented support window.
 
 ## Remaining Open Questions
 
-1. **Storage target:** Should diagnostics links attach to `Case`, `Simulation`, or a new link table?
-2. **Provenance schema:** Should zppy emit a versioned normalized block or reuse existing top-level cfg fields?
-3. **Completion signal:** Which artifact should SimBoard treat as authoritative completion: status file, generated index, or explicit provenance field?
-4. **Deployment scope:** Which scanner roots, machines, and public URL prefixes are supported in MVP?
-5. **Retroactive linking:** Does MVP include historical backfill, or only provenance files with the required join key?
-6. **Case identity hardening:** Is `(case_name, machine, hpc_username)` sufficient until issue #136 is resolved?
+1. **NERSC deployment scope:** Which NERSC scanner roots and public URL prefixes are supported in MVP?
+2. **Retroactive linking:** Does MVP include historical backfill, or only provenance files with the required join key?
+3. **Case identity hardening:** Is `(case_name, machine, hpc_username)` sufficient until issue #136 is resolved?

From 1d6e33c100530f59104596e8e228206959eef463 Mon Sep 17 00:00:00 2001
From: Vo <tomvothecoder@gmail.com>
Date: Thu, 14 May 2026 11:44:38 -0700
Subject: [PATCH 4/5] Update plan

---
 backend/docs/174-zppy-links/plan.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
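
Reviewer note: the discovery step this patch describes (scanning configured NERSC production roots for `provenance*.cfg`) could look roughly like the sketch below. It is a minimal illustration only, not the planned `diagnostics_link_scanner.py` implementation; the function name `find_provenance_cfgs` and the choice to skip missing roots silently are assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable, Iterator


def find_provenance_cfgs(roots: Iterable[Path]) -> Iterator[Path]:
    """Yield provenance*.cfg files found under each configured root.

    Hypothetical helper: the name and behavior are illustrative assumptions.
    """
    for root in roots:
        if not root.is_dir():
            # Assumption: a missing root is skipped rather than treated as fatal.
            continue
        # Sort so repeated scans visit files in a stable order.
        yield from sorted(root.rglob("provenance*.cfg"))
```

Dedup against the state file, completion checks, and cfg parsing would sit on top of this iterator; they are left out because their details are defined elsewhere in the plan.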

diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md
index beb3acd8..2072e289 100644
--- a/backend/docs/174-zppy-links/plan.md
+++ b/backend/docs/174-zppy-links/plan.md
@@ -12,7 +12,8 @@ MVP is NERSC-only.
 
 - Add required zppy provenance fields: `case_name`, `machine`, `hpc_username`
 - Add required diagnostics URLs in zppy provenance
-- Discover zppy diagnostics provenance files from configured NERSC filesystem roots
+- Require standardized zppy diagnostics output locations for NERSC production runs
+- Discover zppy diagnostics provenance files from configured NERSC production filesystem roots
 - Confirm diagnostics completion from index page plus status files
 - Match diagnostics to SimBoard records using `(case_name, machine, hpc_username)`
 - Create idempotent case-scoped diagnostic links
@@ -91,6 +92,12 @@ If any required field is missing, SimBoard skips the provenance file and logs it
 
 For MVP, zppy should reuse existing top-level cfg fields rather than emit a new versioned normalized block.
 
+### Require standardized output locations for production runs
+
+For MVP, NERSC production runs must use standardized zppy diagnostics output locations. SimBoard relies on those known production roots for provenance discovery.
+
+Custom or ad hoc layouts are not ruled out by the overall design, but MVP does not require supporting them.
+
 ### Require explicit diagnostics URLs in provenance
 
 For MVP, SimBoard should not derive diagnostics URLs from path conventions. zppy should emit explicit diagnostics URLs in provenance cfg.
@@ -113,6 +120,8 @@ Implement in order: provenance contract -> scanner -> storage target -> resolver
 
 #### 1. Emit required provenance fields
 
+For MVP, production runs must write diagnostics outputs and provenance cfg files to the standardized NERSC zppy output locations.
+
 | Field          | Source                    |
 | -------------- | ------------------------- |
 | `case_name`    | `env_case.xml` `CASE`     |
@@ -127,6 +136,7 @@ Implementation note:
 
 Tests:
 
+- uses standardized NERSC production output locations
 - emits `case_name`, `machine`, `hpc_username`
 - emits explicit diagnostics URLs
 - can construct explicit diagnostics URLs from cfg `www` plus `mache` machine metadata
@@ -143,7 +153,7 @@ Add `diagnostics_link_scanner.py`.
 
 Responsibilities:
 
-- scan configured NERSC diagnostics roots for `provenance*.cfg`
+- scan configured NERSC production diagnostics roots for `provenance*.cfg`
 - dedup with state file
 - verify diagnostics completion from index page plus status files
 - parse `case_name`, `machine`, `hpc_username`

From 032ae1fae4781bced418e5e515e3a384556c4113 Mon Sep 17 00:00:00 2001
From: Tom Vo <tomvothecoder@gmail.com>
Date: Thu, 14 May 2026 11:46:15 -0700
Subject: [PATCH 5/5] Update plan.md to remove PACE integration changes

Removed PACE integration changes from the plan.
---
 backend/docs/174-zppy-links/plan.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md
index 2072e289..d17d10d5 100644
--- a/backend/docs/174-zppy-links/plan.md
+++ b/backend/docs/174-zppy-links/plan.md
@@ -23,7 +23,6 @@ MVP is NERSC-only.
 
 - Frontend redesign
 - Changes to manual external-link workflows
-- PACE integration changes
 - Case identity or uniqueness refactor
 - Diagnostics content ingestion or indexing
 - Public HTML directory scraping