From 455640daf860c36faf14050c0b763cbc580dba25 Mon Sep 17 00:00:00 2001 From: Vo Date: Wed, 13 May 2026 14:28:31 -0700 Subject: [PATCH 1/5] Add live plan for zppy links feature --- backend/docs/174-zppy-links/plan.md | 126 ++++++++++++++++++++++++++++ 1 file changed, 126 insertions(+) create mode 100644 backend/docs/174-zppy-links/plan.md diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md new file mode 100644 index 00000000..4070884c --- /dev/null +++ b/backend/docs/174-zppy-links/plan.md @@ -0,0 +1,126 @@ +# Plan: Connect zppy Diagnostics to SimBoard Simulations + +## Task + +Replace manual diagnostic URL-pasting with automated, metadata-based linking of zppy diagnostics outputs to SimBoard simulation records. + +## Scope + +**In:** matching strategy (`case_name` + `machine` + `hpc_username`), persistence model, discovery mechanism, zppy manifest spec. +**Out:** frontend UI redesign, PACE changes, existing manual link workflow, Case uniqueness refactor (issue #136), diagnostics data ingestion (Phase 2+). + +## Key Decisions + +### Do NOT parse public HTML directories + +- **Fragility** — HTML layouts vary by web server; breaks on config changes. +- **Security** — SSRF and content-injection attack surface. +- **Coupling** — SimBoard depends on external web server availability/structure. +- **Latency** — Network crawling is slow and unreliable for production. + +### zppy writes a manifest file (no API call) + +zppy runs as a user-level Python package on HPC machines. SimBoard's API requires `SERVICE_ACCOUNT` or `ADMIN` bearer tokens — it is not feasible for each user to obtain and configure an API token in zppy. + +**Instead, zppy writes a small manifest file to a well-known location within its output directory.** This requires zero authentication, zero network access from zppy, and trivial zppy-side changes. 
+ +```jsonc +// /.simboard-diagnostics.json +{ + "case_name": "v3.LR.historical_0201", + "machine": "chrysalis", + "hpc_username": "user123", + "diagnostics": [ + { + "kind": "e3sm_diags", + "url": "https://web.lcrc.anl.gov/...", + "label": "E3SM Diags", + }, + { + "kind": "mpas_analysis", + "url": "https://web.lcrc.anl.gov/...", + "label": "MPAS-Analysis", + }, + ], +} +``` + +zppy already knows `case_name`, `machine`, and the running user from its cfg. `mache` resolves machine-specific public web URL prefixes to construct the URLs. + +### SimBoard discovers manifests via filesystem scanning (not API push) + +SimBoard already has a CronJob-based filesystem scanner (`nersc_archive_ingestor.py`) that: + +- Walks mounted HPC directories every 15 min +- Uses state-based incremental dedup +- Authenticates with a single service account token +- Calls `POST /ingestions/from-path` internally + +The diagnostics linking should follow the same pattern: a SimBoard-side scanner discovers `.simboard-diagnostics.json` manifests, reads them, matches to existing Cases, and creates `ExternalLink` rows. **No per-user tokens needed.** + +### Persist links in database (not query-time resolution) + +Store `ExternalLink` rows on match. No remote calls during frontend queries. Same model as manually-added links — frontend works with zero changes. `ExternalLink.created_at` provides audit trail. + +## Approach + +1. **Join key:** `(case_name, machine, hpc_username)` — all three required. `case_name` alone is not globally unique (`Case.name` has a unique index but different users can reuse names). Adding `machine` + `hpc_username` disambiguates. `CASE_HASH` is unreliable across executions (see issue #136). zppy has all three values. + +2. **Matching query:** Machine is on `Simulation` not `Case`, so the resolver joins: `Case.name == X AND Simulation.machine_id == Y AND Simulation.hpc_username == Z`. All simulations in a case share the same machine in practice. + +3. 
**zppy-side (minimal change):** After diagnostics complete, zppy writes `.simboard-diagnostics.json` to its output directory. `mache` resolves public URL prefixes. No API call, no token. + +4. **SimBoard diagnostics scanner** — two options: + + **Option A — Extend NERSC archive ingestor (recommended for MVP):** Add a post-scan phase to the existing `nersc_archive_ingestor.py` that also walks known diagnostics output directories (or the same archive tree) looking for `.simboard-diagnostics.json` files. On discovery, it calls a new internal endpoint or directly creates `ExternalLink` rows via the existing service account token. + + **Option B — Separate diagnostics scanner script:** A new lightweight CronJob script (`diagnostics_link_scanner.py`) that walks diagnostics output directories. Same pattern as `nersc_archive_ingestor.py` — env-configured, state-file dedup, service account auth. Better separation of concerns, but more operational overhead. + +5. **API endpoint** (extend `backend/app/features/simulation/api.py`): + + ``` + POST /api/v1/diagnostics/link + Body: { "case_name": "...", "machine": "...", "hpc_username": "...", "diagnostics": [...] } + ``` + + Restricted to `ADMIN` / `SERVICE_ACCOUNT` roles (same as ingestion). Resolves the triple → `Case` → creates `ExternalLink` rows with `kind = diagnostic`. The scanner calls this endpoint; users don't call it directly. + +6. **Schema:** Add `DiagnosticsLinkRequest` to `backend/app/features/simulation/schemas.py`. + +7. **Migration:** None if linking to existing FK targets. Required if adding `case_id` FK to `ExternalLink` (see open question #1). + +8. **Frontend:** No changes. Existing `grouped_links` rendering picks up new diagnostic links automatically. + +### Alternative: Convention-based URL derivation (no zppy changes) + +For production runs with enforced path conventions, SimBoard could derive diagnostic URLs from simulation metadata + `mache` without any zppy changes or manifest files. 
Per issue #174, zppy outputs follow a fixed directory structure and `mache` resolves per-machine URL prefixes. + +This works only when path conventions are strict. The manifest approach is more robust for custom user paths. Could combine both: derive URLs for production campaigns, manifest for custom runs. + +## Tests + +- `backend/tests/features/simulation/test_api.py` — endpoint tests: + - Happy path: matching `(case_name, machine, hpc_username)` → links created + - Different user, same case_name + machine → no cross-linking (isolation test) + - No matching case → 404 + - Duplicate link idempotency + - Invalid payload → 422 +- Scanner tests: manifest discovery, state dedup, malformed manifest handling +- Run: `make backend-test && make pre-commit-run` + +## Risk + +**Score: 3 (normal)** + +1. **zppy adoption lag** — No data until zppy emits manifests. Mitigate with convention-based derivation for production runs. +2. **Case name collision** — `Case.name` is unique in DB but not globally meaningful. The `(case_name, machine, hpc_username)` triple mitigates. Broader fix tracked in issue #136. +3. **Diagnostics output path visibility** — Scanner must have filesystem access to zppy output directories. On NERSC this requires mounting the relevant CFS paths into the SimBoard container (same pattern as performance archive). +4. **Timing gap** — Scanner-based approach has up to 15-min latency. Acceptable for diagnostics linking. + +## Open Questions (ask colleagues) + +1. **Case-level vs execution-level linking?** zppy diagnostics run across simulation output in time increments — they're inherently case-scoped, not tied to a specific execution/LID. Current `ExternalLink` only has `simulation_id` FK. Options: (a) add optional `case_id` FK to `ExternalLink`, (b) create a separate `CaseLink` model, (c) link to reference simulation only as a pragmatic shortcut. This is the key schema decision. +2. 
**Case uniqueness long-term?** The `(case_name, machine, hpc_username)` triple is a pragmatic join key but `Case.name` as the sole DB unique constraint is fragile. Issue #136 is evaluating `CASE_HASH` but it's unstable across executions. Should Case uniqueness be strengthened in the model itself? +3. **Diagnostics output directory location?** Where are zppy outputs stored on each machine? Need the path pattern to configure the scanner. Per issue #174, the coupled group stores results on machine web servers — need the exact filesystem mount paths for NERSC (and other machines if applicable). +4. **Retroactive linking needed?** If yes, plan a one-time bulk-linking script (or convention-based derivation) for existing diagnostics that predate this feature. +5. **Convention-based derivation viable for MVP?** If zppy output paths are predictable enough from `(case_name, machine, hpc_username)` + `mache`, SimBoard could derive diagnostic URLs without any zppy changes. Worth evaluating as a faster MVP path. From d498b7e12f70f9819619b8b66f5924a739079352 Mon Sep 17 00:00:00 2001 From: Vo Date: Thu, 14 May 2026 11:09:05 -0700 Subject: [PATCH 2/5] Update plan.md --- backend/docs/174-zppy-links/plan.md | 285 +++++++++++++++++++--------- 1 file changed, 196 insertions(+), 89 deletions(-) diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md index 4070884c..a9084863 100644 --- a/backend/docs/174-zppy-links/plan.md +++ b/backend/docs/174-zppy-links/plan.md @@ -1,126 +1,233 @@ # Plan: Connect zppy Diagnostics to SimBoard Simulations -## Task +## Goal -Replace manual diagnostic URL-pasting with automated, metadata-based linking of zppy diagnostics outputs to SimBoard simulation records. +Replace manual diagnostics URL entry with automated linking from zppy diagnostics outputs to existing SimBoard simulation records. 
## Scope -**In:** matching strategy (`case_name` + `machine` + `hpc_username`), persistence model, discovery mechanism, zppy manifest spec. -**Out:** frontend UI redesign, PACE changes, existing manual link workflow, Case uniqueness refactor (issue #136), diagnostics data ingestion (Phase 2+). - -## Key Decisions - -### Do NOT parse public HTML directories - -- **Fragility** — HTML layouts vary by web server; breaks on config changes. -- **Security** — SSRF and content-injection attack surface. -- **Coupling** — SimBoard depends on external web server availability/structure. -- **Latency** — Network crawling is slow and unreliable for production. - -### zppy writes a manifest file (no API call) - -zppy runs as a user-level Python package on HPC machines. SimBoard's API requires `SERVICE_ACCOUNT` or `ADMIN` bearer tokens — it is not feasible for each user to obtain and configure an API token in zppy. - -**Instead, zppy writes a small manifest file to a well-known location within its output directory.** This requires zero authentication, zero network access from zppy, and trivial zppy-side changes. 
- -```jsonc -// /.simboard-diagnostics.json -{ - "case_name": "v3.LR.historical_0201", - "machine": "chrysalis", - "hpc_username": "user123", - "diagnostics": [ - { - "kind": "e3sm_diags", - "url": "https://web.lcrc.anl.gov/...", - "label": "E3SM Diags", - }, - { - "kind": "mpas_analysis", - "url": "https://web.lcrc.anl.gov/...", - "label": "MPAS-Analysis", - }, - ], -} +### In + +- Add required zppy provenance fields: `case_name`, `machine`, `hpc_username` +- Discover zppy diagnostics provenance files from configured filesystem roots +- Confirm diagnostics completion before linking +- Match diagnostics to SimBoard records using `(case_name, machine, hpc_username)` +- Create idempotent diagnostic `ExternalLink` rows +- Maintain scanner state to avoid repeated processing + +### Out + +- Frontend redesign +- Changes to manual external-link workflows +- PACE integration changes +- Case identity or uniqueness refactor +- Diagnostics content ingestion or indexing +- Public HTML directory scraping +- Historical backfill beyond configured provenance roots +- Optional build/campaign metadata ingestion + +## Core Decisions + +### Match diagnostics at case scope + +zppy runs against a full case output tree, not a single execution/LID. Use case identity as the primary join key: + +```text +(case_name, machine, hpc_username) +``` + +All three fields are required. `case_name` alone is not globally safe, and `CASE_HASH` is not reliable across executions. + +### Do not parse public HTML directories + +Avoid public directory scraping. It is fragile, web-server-coupled, slow, and expands the SSRF/content-injection attack surface. + +### Use zppy provenance cfg as the primary input + +Do not require zppy to call the SimBoard API. zppy runs as a user-level HPC package, while SimBoard API writes require `SERVICE_ACCOUNT` or `ADMIN` tokens. + +Instead, SimBoard discovers zppy provenance files from configured filesystem roots. 
Newer zppy runs already emit provenance cfg files under diagnostics output paths, for example: + +```text +post/scripts/provenance.20260303_230804_991619.cfg +``` + +Current cfg examples expose useful fields: + +- `case`: case name +- `input`: case run directory +- `output`: diagnostics filesystem root +- `www`: public diagnostics root +- `campaign`: optional campaign metadata + +But current cfg is not yet an authoritative join source because it may lack: + +- `machine` +- execution `LID` +- canonical simulation owner +- unambiguous `hpc_username` + +Path-derived usernames are unsafe. Example ambiguity: + +```text +input path owner: ac.wlin +output path owner: ac.zhang40 ``` -zppy already knows `case_name`, `machine`, and the running user from its cfg. `mache` resolves machine-specific public web URL prefixes to construct the URLs. +Therefore, zppy must enrich provenance cfg with required case identity copied from `case_scripts/env_case.xml`: + +| XML field | Provenance field | +| ---------- | ---------------- | +| `CASE` | `case_name` | +| `MACH` | `machine` | +| `REALUSER` | `hpc_username` | + +If any required field is missing, SimBoard skips the provenance file and logs it as invalid for linking. + +### Persist links, do not resolve at query time + +Create database rows when diagnostics are discovered. Frontend queries should not crawl filesystems or remote URLs. + +Use the existing manual-link rendering path where possible: diagnostics links become `ExternalLink` rows with `kind = diagnostic`. + +## Implementation + +Implement in order: provenance contract -> scanner -> storage target -> resolver/API -> frontend verification. + +### zppy -### SimBoard discovers manifests via filesystem scanning (not API push) +#### 1. 
Emit required provenance fields -SimBoard already has a CronJob-based filesystem scanner (`nersc_archive_ingestor.py`) that: +| Field | Source | +| -------------- | ------------------------- | +| `case_name` | `env_case.xml` `CASE` | +| `machine` | `env_case.xml` `MACH` | +| `hpc_username` | `env_case.xml` `REALUSER` | -- Walks mounted HPC directories every 15 min -- Uses state-based incremental dedup -- Authenticates with a single service account token -- Calls `POST /ingestions/from-path` internally +Tests: -The diagnostics linking should follow the same pattern: a SimBoard-side scanner discovers `.simboard-diagnostics.json` manifests, reads them, matches to existing Cases, and creates `ExternalLink` rows. **No per-user tokens needed.** +- emits `case_name`, `machine`, `hpc_username` +- parses values from `env_case.xml` +- handles missing `env_case.xml` +- preserves existing provenance behavior -### Persist links in database (not query-time resolution) +### SimBoard -Store `ExternalLink` rows on match. No remote calls during frontend queries. Same model as manually-added links — frontend works with zero changes. `ExternalLink.created_at` provides audit trail. +#### 1. Add diagnostics scanner -## Approach +Add `diagnostics_link_scanner.py`. -1. **Join key:** `(case_name, machine, hpc_username)` — all three required. `case_name` alone is not globally unique (`Case.name` has a unique index but different users can reuse names). Adding `machine` + `hpc_username` disambiguates. `CASE_HASH` is unreliable across executions (see issue #136). zppy has all three values. +Responsibilities: -2. **Matching query:** Machine is on `Simulation` not `Case`, so the resolver joins: `Case.name == X AND Simulation.machine_id == Y AND Simulation.hpc_username == Z`. All simulations in a case share the same machine in practice. 
+- scan configured diagnostics roots for `provenance*.cfg` +- dedup with state file +- verify diagnostics completion +- parse `case_name`, `machine`, `hpc_username` +- call internal API with service-account auth +- skip and log if full join key is unavailable -3. **zppy-side (minimal change):** After diagnostics complete, zppy writes `.simboard-diagnostics.json` to its output directory. `mache` resolves public URL prefixes. No API call, no token. +Tests: -4. **SimBoard diagnostics scanner** — two options: +- discovers cfgs +- parses required cfg identity +- handles malformed cfgs +- skips missing identity +- checks completion marker +- dedups state +- handles duplicate links idempotently - **Option A — Extend NERSC archive ingestor (recommended for MVP):** Add a post-scan phase to the existing `nersc_archive_ingestor.py` that also walks known diagnostics output directories (or the same archive tree) looking for `.simboard-diagnostics.json` files. On discovery, it calls a new internal endpoint or directly creates `ExternalLink` rows via the existing service account token. +#### 2. Resolve link storage - **Option B — Separate diagnostics scanner script:** A new lightweight CronJob script (`diagnostics_link_scanner.py`) that walks diagnostics output directories. Same pattern as `nersc_archive_ingestor.py` — env-configured, state-file dedup, service account auth. Better separation of concerns, but more operational overhead. +Add `DiagnosticsLinkRequest` in `backend/app/features/simulation/schemas.py`. -5. **API endpoint** (extend `backend/app/features/simulation/api.py`): +Storage options: - ``` - POST /api/v1/diagnostics/link - Body: { "case_name": "...", "machine": "...", "hpc_username": "...", "diagnostics": [...] } - ``` +1. Preferred: add `case_id` to `ExternalLink`. +2. Alternative: add `CaseLink`. +3. Shortcut: attach to reference simulation. - Restricted to `ADMIN` / `SERVICE_ACCOUNT` roles (same as ingestion). 
Resolves the triple → `Case` → creates `ExternalLink` rows with `kind = diagnostic`. The scanner calls this endpoint; users don't call it directly. +#### 3. Add matching resolver -6. **Schema:** Add `DiagnosticsLinkRequest` to `backend/app/features/simulation/schemas.py`. +| Input | Match | +| -------------- | ------------------------- | +| `case_name` | `Case.name` | +| `machine` | `Simulation.machine_id` | +| `hpc_username` | `Simulation.hpc_username` | -7. **Migration:** None if linking to existing FK targets. Required if adding `case_id` FK to `ExternalLink` (see open question #1). +Outcomes: -8. **Frontend:** No changes. Existing `grouped_links` rendering picks up new diagnostic links automatically. +- 1 match: create/update links +- 0 matches: `404` +- multiple matches: `409` -### Alternative: Convention-based URL derivation (no zppy changes) +Tests: -For production runs with enforced path conventions, SimBoard could derive diagnostic URLs from simulation metadata + `mache` without any zppy changes or manifest files. Per issue #174, zppy outputs follow a fixed directory structure and `mache` resolves per-machine URL prefixes. +- matching triple creates links +- same case/machine under different user does not cross-link +- no match returns `404` +- ambiguous match returns `409` -This works only when path conventions are strict. The manifest approach is more robust for custom user paths. Could combine both: derive URLs for production campaigns, manifest for custom runs. +#### 4. 
Add internal API endpoint -## Tests +Endpoint: `POST /api/v1/diagnostics/link` -- `backend/tests/features/simulation/test_api.py` — endpoint tests: - - Happy path: matching `(case_name, machine, hpc_username)` → links created - - Different user, same case_name + machine → no cross-linking (isolation test) - - No matching case → 404 - - Duplicate link idempotency - - Invalid payload → 422 -- Scanner tests: manifest discovery, state dedup, malformed manifest handling -- Run: `make backend-test && make pre-commit-run` +Roles: `ADMIN`, `SERVICE_ACCOUNT` -## Risk +Request: -**Score: 3 (normal)** +| Field | Required | +| -------------- | -------- | +| `case_name` | yes | +| `machine` | yes | +| `hpc_username` | yes | +| `diagnostics` | yes | -1. **zppy adoption lag** — No data until zppy emits manifests. Mitigate with convention-based derivation for production runs. -2. **Case name collision** — `Case.name` is unique in DB but not globally meaningful. The `(case_name, machine, hpc_username)` triple mitigates. Broader fix tracked in issue #136. -3. **Diagnostics output path visibility** — Scanner must have filesystem access to zppy output directories. On NERSC this requires mounting the relevant CFS paths into the SimBoard container (same pattern as performance archive). -4. **Timing gap** — Scanner-based approach has up to 15-min latency. Acceptable for diagnostics linking. +Diagnostics item: -## Open Questions (ask colleagues) +| Field | Required | +| ------------------- | -------- | +| `name` | yes | +| `url` | yes | +| `kind = diagnostic` | yes | + +Tests: + +- duplicate request is idempotent +- invalid payload returns `422` +- auth required + +#### 5. Keep frontend unchanged + +Existing external-link rendering should display diagnostic links once rows exist. + +## Fallbacks + +### Curated backfill + +Allow convention-based URL derivation only for controlled campaigns. Do not use as the primary MVP path. 
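The curated-backfill fallback and zppy's own URL construction (cfg `www` plus `mache` web-portal metadata) both come down to the same step: translating a filesystem path under a web-portal root into its public URL. Below is a minimal sketch. It assumes the caller already has the portal base path and base URL, for example from `mache.MachineInfo.web_portal_base` and `web_portal_url`. `www_path_to_url` is a hypothetical helper name, not an existing zppy or SimBoard function.

```python
from pathlib import PurePosixPath


def www_path_to_url(www_path: str, portal_base: str, portal_url: str) -> str:
    """Translate a path under the web-portal root into its public URL.

    Hypothetical helper: assumes every file under ``portal_base`` is served
    at the same relative location under ``portal_url``.
    """
    path = PurePosixPath(www_path)
    try:
        relative = path.relative_to(PurePosixPath(portal_base))
    except ValueError:
        raise ValueError(
            f"{www_path!r} is not under web portal root {portal_base!r}"
        ) from None
    # PurePosixPath does not collapse "..", so reject it explicitly rather
    # than emit a URL that escapes the portal root.
    if ".." in relative.parts:
        raise ValueError(f"{www_path!r} contains '..'")
    base_url = portal_url.rstrip("/")
    if not relative.parts:
        # The path is the portal root itself.
        return base_url
    # Build the URL with "/" separators regardless of the host OS.
    return f"{base_url}/{relative.as_posix()}"
```

Consider these illustrative NERSC values, which should be confirmed against the `mache` machine config. With them, `www_path_to_url("/global/cfs/cdirs/e3sm/www/user123/case/e3sm_diags", "/global/cfs/cdirs/e3sm/www", "https://portal.nersc.gov/cfs/e3sm")` returns `"https://portal.nersc.gov/cfs/e3sm/user123/case/e3sm_diags"`. The helper rejects paths that are outside the root or that contain `..`, so a derived link can never point outside the published portal tree.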
+ +### Validation command + +```bash +make backend-test && make pre-commit-run +``` -1. **Case-level vs execution-level linking?** zppy diagnostics run across simulation output in time increments — they're inherently case-scoped, not tied to a specific execution/LID. Current `ExternalLink` only has `simulation_id` FK. Options: (a) add optional `case_id` FK to `ExternalLink`, (b) create a separate `CaseLink` model, (c) link to reference simulation only as a pragmatic shortcut. This is the key schema decision. -2. **Case uniqueness long-term?** The `(case_name, machine, hpc_username)` triple is a pragmatic join key but `Case.name` as the sole DB unique constraint is fragile. Issue #136 is evaluating `CASE_HASH` but it's unstable across executions. Should Case uniqueness be strengthened in the model itself? -3. **Diagnostics output directory location?** Where are zppy outputs stored on each machine? Need the path pattern to configure the scanner. Per issue #174, the coupled group stores results on machine web servers — need the exact filesystem mount paths for NERSC (and other machines if applicable). -4. **Retroactive linking needed?** If yes, plan a one-time bulk-linking script (or convention-based derivation) for existing diagnostics that predate this feature. -5. **Convention-based derivation viable for MVP?** If zppy output paths are predictable enough from `(case_name, machine, hpc_username)` + `mache`, SimBoard could derive diagnostic URLs without any zppy changes. Worth evaluating as a faster MVP path. +## Risks + +- **Storage gap**: diagnostics are case-scoped, but `ExternalLink` currently points at `simulation_id`. + Mitigation: decide storage target before implementing resolver/API behavior. +- **Missing identity**: SimBoard cannot link a provenance file without `case_name`, `machine`, and `hpc_username`. + Mitigation: require zppy provenance enrichment; skip and log invalid files. 
+- **Deployment variability**: zppy roots and public URL prefixes vary by machine/campaign. + Mitigation: use env-configured scanner roots and machine/public-prefix mappings. +- **Provenance drift**: cfg layout and required-field coverage may vary across zppy versions. + Mitigation: add parser tests, schema/version detection, and a documented support window. + +## Remaining Open Questions + +1. **Storage target:** Should diagnostics links attach to `Case`, `Simulation`, or a new link table? +2. **Provenance schema:** Should zppy emit a versioned normalized block or reuse existing top-level cfg fields? +3. **Completion signal:** Which artifact should SimBoard treat as authoritative completion: status file, generated index, or explicit provenance field? +4. **Deployment scope:** Which scanner roots, machines, and public URL prefixes are supported in MVP? +5. **Retroactive linking:** Does MVP include historical backfill, or only provenance files with the required join key? +6. **Case identity hardening:** Is `(case_name, machine, hpc_username)` sufficient until issue #136 is resolved? From d9dbc6e67b958f9d612853ab7fa75e581f287fa8 Mon Sep 17 00:00:00 2001 From: Vo Date: Thu, 14 May 2026 11:33:18 -0700 Subject: [PATCH 3/5] Update plan --- backend/docs/174-zppy-links/plan.md | 88 +++++++++++++++++------------ 1 file changed, 53 insertions(+), 35 deletions(-) diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md index a9084863..beb3acd8 100644 --- a/backend/docs/174-zppy-links/plan.md +++ b/backend/docs/174-zppy-links/plan.md @@ -4,15 +4,18 @@ Replace manual diagnostics URL entry with automated linking from zppy diagnostics outputs to existing SimBoard simulation records. +MVP is NERSC-only.
+ ## Scope ### In - Add required zppy provenance fields: `case_name`, `machine`, `hpc_username` -- Discover zppy diagnostics provenance files from configured filesystem roots -- Confirm diagnostics completion before linking +- Add required diagnostics URLs in zppy provenance +- Discover zppy diagnostics provenance files from configured NERSC filesystem roots +- Confirm diagnostics completion from index page plus status files - Match diagnostics to SimBoard records using `(case_name, machine, hpc_username)` -- Create idempotent diagnostic `ExternalLink` rows +- Create idempotent case-scoped diagnostic links - Maintain scanner state to avoid repeated processing ### Out @@ -24,7 +27,7 @@ Replace manual diagnostics URL entry with automated linking from zppy diagnostic - Diagnostics content ingestion or indexing - Public HTML directory scraping - Historical backfill beyond configured provenance roots -- Optional build/campaign metadata ingestion +- Non-NERSC deployments ## Core Decisions @@ -44,14 +47,17 @@ Avoid public directory scraping. It is fragile, web-server-coupled, slow, and ex ### Use zppy provenance cfg as the primary input -Do not require zppy to call the SimBoard API. zppy runs as a user-level HPC package, while SimBoard API writes require `SERVICE_ACCOUNT` or `ADMIN` tokens. - -Instead, SimBoard discovers zppy provenance files from configured filesystem roots. Newer zppy runs already emit provenance cfg files under diagnostics output paths, for example: +SimBoard discovers zppy provenance files from configured NERSC filesystem roots. 
Newer zppy runs already emit provenance cfg files under diagnostics output paths, for example: ```text post/scripts/provenance.20260303_230804_991619.cfg ``` +Reference example: + +- https://github.com/E3SM-Project/zppy/blob/main/examples/post.v3.LR.historical.zppy_v3.cfg +- https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/zppy_example/v3.2.0/v3.LR.historical_0051/provenance.20260303_230804_991619.cfg + Current cfg examples expose useful fields: - `case`: case name @@ -63,7 +69,6 @@ Current cfg examples expose useful fields: But current cfg is not yet an authoritative join source because it may lack: - `machine` -- execution `LID` - canonical simulation owner - unambiguous `hpc_username` @@ -74,7 +79,7 @@ input path owner: ac.wlin output path owner: ac.zhang40 ``` -Therefore, zppy must enrich provenance cfg with required case identity copied from `case_scripts/env_case.xml`: +Therefore, zppy must enrich provenance cfg with required case identity copied from `/case_scripts/env_case.xml`: | XML field | Provenance field | | ---------- | ---------------- | @@ -84,11 +89,21 @@ Therefore, zppy must enrich provenance cfg with required case identity copied fr If any required field is missing, SimBoard skips the provenance file and logs it as invalid for linking. +For MVP, zppy should reuse existing top-level cfg fields rather than emit a new versioned normalized block. + +### Require explicit diagnostics URLs in provenance + +For MVP, SimBoard should not derive diagnostics URLs from path conventions. zppy should emit explicit diagnostics URLs in provenance cfg. + +### Use index page plus status files as completion signal + +Treat diagnostics as complete only when the expected index page and zppy status files are present. + ### Persist links, do not resolve at query time Create database rows when diagnostics are discovered. Frontend queries should not crawl filesystems or remote URLs. 
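Below is a minimal sketch of discovery-time persistence. It uses the stdlib `sqlite3` module as a stand-in for SimBoard's real database layer. The schema is hypothetical: machine is stored as a name rather than a `machine_id` foreign key, and links carry the proposed `case_id`. The triple is resolved through the case's simulations. Zero matching cases map to the resolver's `404` outcome and multiple matches map to `409`. A unique constraint makes repeated scans idempotent.

```python
import sqlite3

# Hypothetical schema: SimBoard's real models differ (e.g. machine_id FK).
SCHEMA = """
CREATE TABLE cases (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE simulations (
    id INTEGER PRIMARY KEY,
    case_id INTEGER NOT NULL REFERENCES cases(id),
    machine TEXT NOT NULL,
    hpc_username TEXT NOT NULL
);
CREATE TABLE external_links (
    id INTEGER PRIMARY KEY,
    case_id INTEGER NOT NULL REFERENCES cases(id),
    kind TEXT NOT NULL,
    name TEXT NOT NULL,
    url TEXT NOT NULL,
    UNIQUE (case_id, kind, url)  -- makes repeated inserts no-ops
);
"""


class LinkError(Exception):
    """Resolver failure carrying the HTTP status the API would return."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def resolve_case_id(conn: sqlite3.Connection, case_name: str,
                    machine: str, hpc_username: str) -> int:
    # machine and hpc_username live on simulations, so join through them
    # and collect the distinct cases matching all three fields.
    rows = conn.execute(
        "SELECT DISTINCT c.id FROM cases c "
        "JOIN simulations s ON s.case_id = c.id "
        "WHERE c.name = ? AND s.machine = ? AND s.hpc_username = ?",
        (case_name, machine, hpc_username),
    ).fetchall()
    if not rows:
        raise LinkError(404, "no case matches the join key")
    if len(rows) > 1:
        raise LinkError(409, "join key matches more than one case")
    return rows[0][0]


def link_diagnostics(conn: sqlite3.Connection, case_name: str, machine: str,
                     hpc_username: str, diagnostics: list) -> int:
    """Create case-scoped diagnostic links; safe to call repeatedly."""
    case_id = resolve_case_id(conn, case_name, machine, hpc_username)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO external_links (case_id, kind, name, url) "
            "VALUES (?, 'diagnostic', ?, ?)",
            [(case_id, d["name"], d["url"]) for d in diagnostics],
        )
    return case_id
```

`INSERT OR IGNORE` is SQLite syntax. On PostgreSQL, for example, the equivalent is `INSERT ... ON CONFLICT DO NOTHING` backed by the same unique constraint.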
-Use the existing manual-link rendering path where possible: diagnostics links become `ExternalLink` rows with `kind = diagnostic`. +Diagnostic links are case-scoped. For MVP, store them on `Case` by adding `case_id` to `ExternalLink`. Keep the existing manual-link rendering path where possible by surfacing case-scoped diagnostic links alongside current links. ## Implementation @@ -104,11 +119,20 @@ Implement in order: provenance contract -> scanner -> storage target -> resolver | `machine` | `env_case.xml` `MACH` | | `hpc_username` | `env_case.xml` `REALUSER` | +Implementation note: + +- For NERSC MVP, zppy can construct explicit diagnostics URLs from cfg `www` plus `mache` machine metadata. +- `mache.MachineInfo` exposes helpers such as `web_portal_base`, `web_portal_url`, and `username`. +- Reference: https://docs.e3sm.org/mache/main/developers_guide/generated/mache.MachineInfo.html + Tests: - emits `case_name`, `machine`, `hpc_username` +- emits explicit diagnostics URLs +- can construct explicit diagnostics URLs from cfg `www` plus `mache` machine metadata - parses values from `env_case.xml` -- handles missing `env_case.xml` +- parses values from `env_build.xml` +- handles missing `env_case.xml` or `env_build.xml` - preserves existing provenance behavior ### SimBoard @@ -119,10 +143,11 @@ Add `diagnostics_link_scanner.py`. 
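The provenance contract between zppy and the scanner can be sketched end to end with the standard library:

- The zppy-side helper reads `CASE`, `MACH`, and `REALUSER` from `env_case.xml`. It assumes CIME's `<entry id="..." value="...">` layout.
- The scanner-side helper reads the three fields back from the cfg's `[default]` section. It returns `None` if any field is missing, so the caller can skip and log the file.

The function names are illustrative. So is the assumption that the new fields sit in `[default]` next to `case`, `input`, `output`, and `www`. The cfg parser is deliberately simplified. zppy cfgs use configobj syntax (nested `[[...]]` sections, quoting), which a production parser would need to handle fully.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Provenance field -> env_case.xml entry id, per the mapping in this plan.
IDENTITY_FIELDS = {"case_name": "CASE", "machine": "MACH", "hpc_username": "REALUSER"}


def read_case_identity(env_case_xml: str) -> dict:
    """zppy side: extract the join-key fields from env_case.xml text.

    Assumes CIME's <entry id="..." value="..."> layout; missing entries
    come back as None.
    """
    root = ET.fromstring(env_case_xml)
    values = {entry.get("id"): entry.get("value") for entry in root.iter("entry")}
    return {field: values.get(xml_id) for field, xml_id in IDENTITY_FIELDS.items()}


def parse_join_key(cfg_text: str) -> Optional[Tuple[str, str, str]]:
    """Scanner side: read (case_name, machine, hpc_username) from [default].

    Simplified parser: handles `key = value` lines and section headers only.
    Returns None when any field is missing or empty, so the caller can skip
    and log the file.
    """
    section = None
    fields = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            # "[default]" and nested "[[ sub ]]" headers both reduce to a name.
            section = line.strip("[] ")
        elif section == "default" and "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip().strip("\"'")
    key = tuple(fields.get(name) for name in IDENTITY_FIELDS)
    return key if all(key) else None
```

Because `parse_join_key` only reads keys inside `[default]`, a `machine` key under some other task section is not mistaken for the case's machine.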
Responsibilities: -- scan configured diagnostics roots for `provenance*.cfg` +- scan configured NERSC diagnostics roots for `provenance*.cfg` - dedup with state file -- verify diagnostics completion +- verify diagnostics completion from index page plus status files - parse `case_name`, `machine`, `hpc_username` +- parse explicit diagnostics URLs - call internal API with service-account auth - skip and log if full join key is unavailable @@ -132,7 +157,7 @@ Tests: - parses required cfg identity - handles malformed cfgs - skips missing identity -- checks completion marker +- checks index-plus-status completion marker - dedups state - handles duplicate links idempotently @@ -140,23 +165,19 @@ Tests: Add `DiagnosticsLinkRequest` in `backend/app/features/simulation/schemas.py`. -Storage options: - -1. Preferred: add `case_id` to `ExternalLink`. -2. Alternative: add `CaseLink`. -3. Shortcut: attach to reference simulation. +For MVP, add `case_id` to `ExternalLink` and store diagnostic links at case scope. #### 3. Add matching resolver -| Input | Match | -| -------------- | ------------------------- | -| `case_name` | `Case.name` | -| `machine` | `Simulation.machine_id` | -| `hpc_username` | `Simulation.hpc_username` | +| Input | Match | +| -------------- | ----------------------- | +| `case_name` | `Case.name` | +| `machine` | joined case simulations | +| `hpc_username` | joined case simulations | Outcomes: -- 1 match: create/update links +- 1 case match: create/update case-scoped links - 0 matches: `404` - multiple matches: `409` @@ -214,20 +235,17 @@ make backend-test && make pre-commit-run ## Risks -- **Storage gap**: diagnostics are case-scoped, but `ExternalLink` currently points at `simulation_id`. - Mitigation: decide storage target before implementing resolver/API behavior. +- **Case-scoped link migration**: diagnostics are case-scoped, but `ExternalLink` currently points at `simulation_id`. 
+  Mitigation: add `case_id` for MVP and keep migration/API behavior narrow.
 - **Missing identity**: SimBoard cannot link a provenance file without `case_name`, `machine`, and `hpc_username`.
   Mitigation: require zppy provenance enrichment; skip and log invalid files.
-- **Deployment variability**: zppy roots and public URL prefixes vary by machine/campaign.
-  Mitigation: use env-configured scanner roots and machine/public-prefix mappings.
+- **NERSC deployment variability**: zppy roots and public URL prefixes may still vary by campaign or user layout within NERSC.
+  Mitigation: use env-configured NERSC scanner roots and NERSC public-prefix mappings.
 - **Provenance drift**: cfg layout and required-field coverage may vary across zppy versions.
   Mitigation: add parser tests, schema/version detection, and a documented support window.
 
 ## Remaining Open Questions
 
-1. **Storage target:** Should diagnostics links attach to `Case`, `Simulation`, or a new link table?
-2. **Provenance schema:** Should zppy emit a versioned normalized block or reuse existing top-level cfg fields?
-3. **Completion signal:** Which artifact should SimBoard treat as authoritative completion: status file, generated index, or explicit provenance field?
-4. **Deployment scope:** Which scanner roots, machines, and public URL prefixes are supported in MVP?
-5. **Retroactive linking:** Does MVP include historical backfill, or only provenance files with the required join key?
-6. **Case identity hardening:** Is `(case_name, machine, hpc_username)` sufficient until issue #136 is resolved?
+1. **NERSC deployment scope:** Which NERSC scanner roots and public URL prefixes are supported in MVP?
+2. **Retroactive linking:** Does MVP include historical backfill, or only provenance files with the required join key?
+3. **Case identity hardening:** Is `(case_name, machine, hpc_username)` sufficient until issue #136 is resolved?
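The resolver outcomes in the plan above (one case match creates or updates case-scoped links, zero matches return `404`, multiple matches return `409`) can be sketched framework-free as follows. This is an illustration under stated assumptions, not SimBoard code: `Case`, `Simulation`, `resolve_case`, and `upsert_case_links` are hypothetical stand-ins for the ORM models and service layer, the machine is compared by name rather than through `Simulation.machine_id`, and case-scoped links are assumed to be unique per `(case_id, url)`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Simulation:
    # Hypothetical stand-in: machine is a plain name here, not a Machine FK.
    machine: str
    hpc_username: str


@dataclass
class Case:
    id: int
    name: str
    simulations: list[Simulation] = field(default_factory=list)


def resolve_case(
    cases: list[Case], case_name: str, machine: str, hpc_username: str
) -> tuple[int, Optional[Case]]:
    """Apply the full join key; return (status_code, matched case or None)."""
    matches = [
        case
        for case in cases
        if case.name == case_name
        # machine and hpc_username are checked against the case's joined simulations
        and any(
            sim.machine == machine and sim.hpc_username == hpc_username
            for sim in case.simulations
        )
    ]
    if not matches:
        return 404, None
    if len(matches) > 1:
        return 409, None
    return 200, matches[0]


def upsert_case_links(links: dict, case_id: int, diagnostics: list[dict]) -> int:
    """Create or update case-scoped diagnostic links; return how many were new.

    Assumption: a link is identified by (case_id, url), so re-scanning the
    same provenance file updates labels instead of duplicating rows.
    """
    created = 0
    for diag in diagnostics:
        key = (case_id, diag["url"])
        if key not in links:
            created += 1
        links[key] = {"kind": "diagnostic", "label": diag.get("label", "")}
    return created
```

A `200` passes the matched case's `id` to `upsert_case_links`. Because links are keyed on `(case_id, url)` in this sketch, a repeated scan of the same provenance file updates labels rather than duplicating rows, which is one way to satisfy the "handles duplicate links idempotently" test.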
From 1d6e33c100530f59104596e8e228206959eef463 Mon Sep 17 00:00:00 2001
From: Vo
Date: Thu, 14 May 2026 11:44:38 -0700
Subject: [PATCH 4/5] Update plan

---
 backend/docs/174-zppy-links/plan.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md
index beb3acd8..2072e289 100644
--- a/backend/docs/174-zppy-links/plan.md
+++ b/backend/docs/174-zppy-links/plan.md
@@ -12,7 +12,8 @@ MVP is NERSC-only.
 
 - Add required zppy provenance fields: `case_name`, `machine`, `hpc_username`
 - Add required diagnostics URLs in zppy provenance
-- Discover zppy diagnostics provenance files from configured NERSC filesystem roots
+- Require standardized zppy diagnostics output locations for NERSC production runs
+- Discover zppy diagnostics provenance files from configured NERSC production filesystem roots
 - Confirm diagnostics completion from index page plus status files
 - Match diagnostics to SimBoard records using `(case_name, machine, hpc_username)`
 - Create idempotent case-scoped diagnostic links
@@ -91,6 +92,12 @@ If any required field is missing, SimBoard skips the provenance file and logs it
 
 For MVP, zppy should reuse existing top-level cfg fields rather than emit a new versioned normalized block.
 
+### Require standardized output locations for production runs
+
+For MVP, NERSC production runs must use standardized zppy diagnostics output locations. SimBoard relies on those known production roots for provenance discovery.
+
+Custom or ad hoc layouts do not block the overall design, but they are not the required path for MVP.
+
 ### Require explicit diagnostics URLs in provenance
 
 For MVP, SimBoard should not derive diagnostics URLs from path conventions. zppy should emit explicit diagnostics URLs in provenance cfg.
@@ -113,6 +120,8 @@ Implement in order: provenance contract -> scanner -> storage target -> resolver
 #### 1. Emit required provenance fields
+For MVP, production runs must write diagnostics outputs and provenance cfg files to the standardized NERSC zppy output locations.
+
 | Field          | Source                    |
 | -------------- | ------------------------- |
 | `case_name`    | `env_case.xml` `CASE`     |
 | `machine`      | `env_case.xml` `MACH`     |
 | `hpc_username` | `env_case.xml` `REALUSER` |
@@ -127,6 +136,7 @@ Implementation note:
 
 Tests:
 
+- uses standardized NERSC production output locations
 - emits `case_name`, `machine`, `hpc_username`
 - emits explicit diagnostics URLs
 - can construct explicit diagnostics URLs from cfg `www` plus `mache` machine metadata
@@ -143,7 +153,7 @@ Add `diagnostics_link_scanner.py`.
 
 Responsibilities:
 
-- scan configured NERSC diagnostics roots for `provenance*.cfg`
+- scan configured NERSC production diagnostics roots for `provenance*.cfg`
 - dedup with state file
 - verify diagnostics completion from index page plus status files
 - parse `case_name`, `machine`, `hpc_username`

From 032ae1fae4781bced418e5e515e3a384556c4113 Mon Sep 17 00:00:00 2001
From: Tom Vo
Date: Thu, 14 May 2026 11:46:15 -0700
Subject: [PATCH 5/5] Update plan.md to remove PACE integration changes

Removed PACE integration changes from the plan.
---
 backend/docs/174-zppy-links/plan.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/backend/docs/174-zppy-links/plan.md b/backend/docs/174-zppy-links/plan.md
index 2072e289..d17d10d5 100644
--- a/backend/docs/174-zppy-links/plan.md
+++ b/backend/docs/174-zppy-links/plan.md
@@ -23,7 +23,6 @@ MVP is NERSC-only.
 
 - Frontend redesign
 - Changes to manual external-link workflows
-- PACE integration changes
 - Case identity or uniqueness refactor
 - Diagnostics content ingestion or indexing
 - Public HTML directory scraping
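With discovery narrowed to configured NERSC production roots, the scanner's discovery loop (find `provenance*.cfg` under each root, skip files already recorded in a JSON state file, and defer files whose diagnostics are not yet complete) might be sketched as follows. The on-disk layout is an assumption made for illustration: it treats an `index.html` plus `*.status` files in the same directory as the provenance cfg as the completion signal, and counts a status file as successful when its contents start with `OK`. `discover_new_provenance`, `mark_processed`, `is_complete`, and the state format are hypothetical names, not existing SimBoard or zppy code.

```python
import json
from pathlib import Path


def load_state(state_path: Path) -> dict:
    """Read {resolved cfg path: mtime} from a JSON state file, if present."""
    if not state_path.exists():
        return {}
    return json.loads(state_path.read_text())


def save_state(state_path: Path, state: dict) -> None:
    state_path.write_text(json.dumps(state, indent=2, sort_keys=True))


def is_complete(cfg_path: Path) -> bool:
    """Assumed completion rule: an index page plus all-OK status files."""
    run_dir = cfg_path.parent
    status_files = sorted(run_dir.glob("*.status"))
    if not (run_dir / "index.html").is_file() or not status_files:
        return False
    return all(f.read_text().strip().startswith("OK") for f in status_files)


def discover_new_provenance(roots: list, state: dict) -> list:
    """Return complete provenance cfgs that are new or changed since the last scan."""
    found = []
    for root in roots:
        for cfg_path in sorted(Path(root).rglob("provenance*.cfg")):
            key = str(cfg_path.resolve())
            if state.get(key) == cfg_path.stat().st_mtime:
                continue  # already linked this exact version of the file
            if not is_complete(cfg_path):
                continue  # not recorded in state, so a later scan retries it
            found.append(cfg_path)
    return found


def mark_processed(state: dict, cfg_path: Path) -> None:
    """Record a cfg only after its links were created successfully."""
    state[str(cfg_path.resolve())] = cfg_path.stat().st_mtime
```

In this sketch a scan pass would parse and post each returned file, call `mark_processed` only after the internal API call succeeds, and then `save_state`. Keying on the resolved path plus mtime means an edited cfg is picked up again, which the idempotent link upsert is expected to tolerate.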