auto-qa: add forked-replay harness (57 invariants, 197 smoke tests, fixture-only CI) by krandder · Pull Request #10 · futarchy-fi/futarchy-api

krandder · 2026-05-10T22:54:38Z

Summary

Adds the auto-qa forked-replay harness as a self-contained subdirectory. Built incrementally over many iterations; all changes confined to auto-qa/ except for an additive block of auto-qa:* scripts in root package.json.

- 57 invariants in the orchestrator catalog (run npm run scenarios:by-layer for a layer-grouped view)
- 197/197 smoke tests passing against an in-process node:http fixture (Node 22 native test runner)
- One staged CI workflow awaiting promotion: auto-qa/harness/ci/auto-qa-harness-smoke.yml.staged (workflow_dispatch only for v1; ~1 min runtime; no docker, no real services)

What's NOT in this PR

- No production code changes. Only package.json is touched outside auto-qa/, and the diff is a pure addition of auto-qa:* script aliases (zero modifications to existing scripts or deps).
- No real-services validation. The 197 tests run against the fixture, not anvil/indexer/api. The 4a-verify / 4b-verify / 4c-verify slices in auto-qa/harness/CHECKLIST.md exist exactly to close that gap; they need a Docker daemon (deferred to post-merge maintainer work).
- CI not active yet. The smoke workflow file ships as .staged because GitHub blocks OAuth Apps without workflow scope from writing .github/workflows/. After merge, copy the .staged file into .github/workflows/ (instructions in auto-qa/harness/ci/README.md).

Layer breakdown (per `scenarios:by-layer`)

| Layer | Count |
| --- | --- |
| api | 10 |
| api↔candles | 5 |
| api↔registry | 3 |
| orchestrator↔candles | 21 |
| orchestrator↔registry | 8 |
| orchestrator↔chain | 10 |
| Total | 57 |

Coverage patterns shipped

- Chain CAPABILITY trio: impersonate + snapshot + time-warp probes (the minimal scenario primitive set; without these, scenarios silently fail)
- Iterate-all-rows triad: swap amounts + candle volumes + candle OHLC (latest-only + all-rows pairs catch uniform-aggregator vs subset-corruption bugs distinctly)
- Body-shape probes: /health + /warmer (catches LB string-match breakage + ops-dashboard format regressions invisible to status-code-only checks)
- Schema introspection MATRIX: {DIRECT, API} × {candles, registry} = 4 probes (any introspection failure pinpoints layer × indexer via a four-probe truth table)
- Header probes: X-Cache, X-Response-Time, X-Cache-TTL on /api/v2/.../chart

Test plan

- CI: smoke tests pass (will need to run via `workflow_dispatch` after promoting the staged file)
- Local: `cd auto-qa/harness && npm ci && npm run smoke:scenarios` → 197/197 pass
- Local: `cd auto-qa/harness && npm run scenarios:by-layer` shows the 57-invariant layer table
- Local: `cd auto-qa/harness && HARNESS_COMPOSE=1 HARNESS_DRY_RUN=1 node orchestrator/scenario-runner.mjs` lists every invariant with its layer + description
- (optional) Daemon-required: `docker compose up -d` from `auto-qa/harness/` brings up the full stack (the `4e` acceptance gate); out of scope for this PR

Post-merge tasks (maintainer)

1. Promote `auto-qa/harness/ci/auto-qa-harness-smoke.yml.staged` into `.github/workflows/auto-qa-harness-smoke.yml`
2. Smoke-test it via the Actions UI (`workflow_dispatch`)
3. After it's green, add a `pull_request: paths: ['auto-qa/harness/**']` trigger so harness-touching PRs gate on it
4. Re-evaluate the daemon-required `*-verify` slices once Docker is available

🤖 Generated with Claude Code

Initial setup for the /loop auto-QA initiative on this repo. Adds auto-qa/PROGRESS.md as the working ledger:
- Methodology + status snapshot
- Full PR ledger #1–#9 (entire repo history): 7 bug-fix, 1 feature, 1 infra. Each bug-fix has hypothesis + ideal test + tools needed.
- Tooling backlog ranked by leverage:
  1. GraphQL passthrough contract test (4/9)
  2. Unified-chart snapshot test (2/9)
  3. Path-prefix dual-form test (1/9, trivial first build)
  4. Field-semantics property walker (#9 + future)
- Open question recorded: pick node --test vs vitest. Default to node --test for minimal surface.
No production code touched.

First test landed: auto-qa/tests/path-prefix.test.mjs covers PR #1 (the /charts path-prefix-strip middleware). Two cases, both green against the live api.futarchy.fi:
  ✔ GET /charts/<path> ≡ GET /<path> (same JSON envelope)
  ✔ GET /charts/health ≡ GET /health (status 200 from both)
Test runner: node --test (built-in, zero deps). Glob 'auto-qa/tests/**/*.test.mjs'. Run with: npm run auto-qa:test
The fixture proposal is GIP-150 v2 (0x1a0f209f…) over a pinned historical time window for reproducibility. Tests skip cleanly when api.futarchy.fi is unreachable so the suite stays non-flaky.
PROGRESS.md updated: status snapshot bumped (1 test landed-passing, runner resolved), PR #1 row marked landed-passing, open question on test framework removed.
No production code touched.

Adds auto-qa/tests/passthrough-contract.test.mjs: 7 cases covering the GraphQL passthrough behavior fixed in 4 of this repo's 9 PRs:
  PR #4: scalar pool: "0x…" filter on candles (chain prefix translated)
  PR #7: scalar proposal: "0x…" filter (chain prefix translated)
  PR #8: pool_in / proposal_in / id_in array filters
  PR #9: periodStartUnix preserved as snapped boundary (asserts ts % 3600 === 0 for every returned candle)
  Cross: response IDs always come back without chain prefix
All 9 auto-qa tests now green (2 path-prefix + 7 passthrough-contract).
Strategy used: each test makes a real HTTP call to api.futarchy.fi with a stable fixture (GIP-150 v2 = 0x1a0f209f…), checks both that the call succeeds and that the documented invariant holds. This is the highest-leverage suite in the auto-qa backlog (4/9 bugs covered by a single file). Tests skip cleanly if the live API is unreachable, so the auto-qa suite stays non-flaky.
PROGRESS.md updated: status snapshot now 9 tests landed-passing, PRs #4/#7/#8/#9 marked landed-passing in the ledger.
No production code touched.
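The PR #9 boundary check and the cross-cutting "no chain prefix in responses" check can both be written as pure predicates. The sketch below is illustrative only: `isSnapped`, `hasChainPrefix`, `findViolations`, and the sample row shape (`periodStartUnix`, `pool`) are hypothetical names chosen for this example, not the helpers used in passthrough-contract.test.mjs.

```javascript
// Hypothetical helpers; only the 3600s period and the "<chainId>-0x..." prefix
// form come from the commit message above.
const PERIOD_SECONDS = 3600;

// True when a timestamp sits exactly on an hourly boundary.
function isSnapped(periodStartUnix, period = PERIOD_SECONDS) {
  const ts = Number(periodStartUnix);
  return Number.isInteger(ts) && ts % period === 0;
}

// True when an id still carries a chain prefix such as "100-0x...".
function hasChainPrefix(id) {
  return /^\d+-0x/i.test(String(id));
}

// Collects a human-readable message for every row that breaks an invariant.
function findViolations(candles) {
  const problems = [];
  for (const c of candles) {
    if (!isSnapped(c.periodStartUnix)) problems.push(`unsnapped ts: ${c.periodStartUnix}`);
    if (c.pool && hasChainPrefix(c.pool)) problems.push(`prefixed pool id: ${c.pool}`);
  }
  return problems;
}
```

Returning a list of violations rather than a boolean keeps a failing assertion message specific about which row drifted.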

…ignment) Adds auto-qa/tests/unified-chart.test.mjs: 4 cases against GET /api/v2/proposals/:id/chart for the GIP-150 v2 fixture:
  PR #5: both conditional_yes.price_usd and conditional_no.price_usd are positive numbers (covers the CONDITIONAL > EXPECTED_VALUE > PREDICTION fallback chain). Also asserts pool_id is a plain address (not chain-prefixed).
  PR #6: company_tokens.base.tokenSymbol is "GNO" and explicitly NOT "PNK" (the legacy hardcoded fallback). currency.tokenSymbol or stableSymbol is set.
  Bonus: candles.yes / candles.no are non-empty arrays for the pinned window, every periodStartUnix is snapped to a 3600s boundary, every close parses to a positive number.
  Bonus: volume reported in human units (< 1e15), not raw wei. If unit normalization regresses, this fires.
All 13 auto-qa tests now green:
- 2 path-prefix.test.mjs (PR #1)
- 7 passthrough-contract.test.mjs (PRs #4/#7/#8/#9)
- 4 unified-chart.test.mjs (PRs #5/#6)
Coverage: 7 of 9 PRs in the repo's history, i.e. all seven bug-fix PRs. Remaining: PR #2 (rpc-proxy infra, would need mock upstream) and PR #3 (passthrough scaffold feature, implicitly covered by all the filter tests).
No production code touched.

Adds auto-qa/tests/multi-proposal-smoke.test.mjs, which iterates over 3 diverse proposal fixtures and asserts the chart endpoint returns a valid contract envelope for each:
- GIP-150 v2 (GNO/sDAI, fully indexed)
- TSLA Mega Package (TSLAon/USDS)
- Circle native USDC on Gnosis (USDC/sDAI)
Asserts shape only (HTTP 200 + envelope keys present + symbols are non-empty strings + never the legacy "PNK" fallback). Data quality (price > 0, candles non-empty) is the unified-chart.test.mjs job for the canonical fixture.
All 16 api auto-qa tests now green:
- 2 path-prefix
- 7 passthrough-contract
- 4 unified-chart
- 3 multi-proposal-smoke ← NEW
Surfaced (and documented in PROGRESS.md, per /loop directive; NOT fixed): TSLA Mega Package and CIP-82 both return zero prices and fall through to the "TOKEN" default base symbol. Likely missing CONDITIONAL pools; worth investigating in a real fix-pass.

Adds auto-qa/tests/spot-candles.test.mjs: 3 cases pinning the contract of GET /api/v1/spot-candles, the third major endpoint on this service that wasn't previously covered:
  1. Missing `ticker` query param → 400 with `error` field.
  2. Unknown ticker → 200 with `{ spotCandles: [] }` envelope.
  3. When any candidate ticker returns data, every candle satisfies the documented `{ periodStartUnix: <unix-ts>, close: <number> }` shape.
Test #3 is intentionally tolerant: it tries a few candidate tickers and skips (not fails) if none return data. The underlying source (futarchy-spot or GeckoTerminal) varies and pinning a specific ticker fixture would rot.
API smoke-coverage now spans all three surfaces:
  /api/v2/proposals/:id/chart ← unified-chart.test.mjs + multi-proposal-smoke
  /candles/graphql ← passthrough-contract.test.mjs (PRs #4/#7/#8/#9)
  /api/v1/spot-candles ← NEW
Test counts: 19 (18 passing, 1 skipped: the shape check, by design).
No production code touched.

Adds auto-qa/tests/indexer-freshness.test.mjs: 3 cases that compare each Checkpoint indexer's head block to the live Gnosis chain tip and assert the lag stays under a threshold:
  candles (gnosis): < 5000 blocks (~7h @ 5s/block)
  registry: < 15000 blocks (~21h; registry runs further behind in normal ops)
Plus a sanity case that both heads are positive numbers > 30M (real Gnosis blocks).
Snapshot at iteration creation:
- candles gnosis: 870 blocks behind (healthy)
- registry: 7916 blocks behind (degraded but within threshold)
This is a real ops invariant: earlier in this session we caught the registry indexer drifting to ~5000 blocks behind via the status page. Now there's a test for that. Skips gracefully if API or Gnosis RPC unreachable.
All 22 api auto-qa tests now pass (21 pass + 1 skipped by design).
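The lag thresholds reduce to simple arithmetic on two block numbers. A minimal sketch follows, assuming the head block and chain tip have already been fetched; `checkLag` and its return shape are hypothetical, and only the two thresholds and the 5s block time are taken from the commit message.

```javascript
// Thresholds quoted in the commit message; the function itself is illustrative.
const MAX_LAG_BLOCKS = { candles: 5000, registry: 15000 };
const SECONDS_PER_BLOCK = 5; // approximate Gnosis block time used above

function checkLag(indexer, headBlock, chainTip) {
  const limit = MAX_LAG_BLOCKS[indexer];
  if (limit === undefined) throw new Error(`unknown indexer: ${indexer}`);
  const lag = chainTip - headBlock;
  return {
    lag,
    approxHours: (lag * SECONDS_PER_BLOCK) / 3600, // converts blocks to wall-clock hours
    ok: lag < limit,
  };
}
```

With the snapshot figures above, a candles lag of 870 blocks and a registry lag of 7916 blocks both fall inside their limits.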

Adds auto-qa/tests/registry-org-shape.test.mjs: 4 cases that validate the Organization entity shape returned by the registry indexer:
  ✔ at least one org returned (catastrophic-empty guard: catches full table wipe / resync failure that would empty the Companies page upstream of the frontend)
  ✔ every org has {id, name, owner, metadata}
  ✔ every non-null metadata is parseable JSON
  ✔ at least one org metadata uses archived/visibility flags (PR #61 filter coverage diagnostic; emits counts so we know the filter is being exercised by real data)
Cross-cutting catch for the bug family that landed as `interface` PR #61 (Companies page rendering empty after Checkpoint migration). If the registry indexer ever returns zero orgs again or the metadata field changes shape, this test fires before users see the symptom.
Diagnostic at iteration time: 3/7 orgs use archived flag, so the filter IS exercised by real data.
All 26 api auto-qa tests now pass (25 pass + 1 skipped by design).

Adds auto-qa/tests/legacy-v1-prices.test.mjs: 6 cases covering an API surface that wasn't tested before:
  GET /api/v1/market-events/proposals/:id/prices (the predecessor to /api/v2/.../chart, still used by some clients)
Cases:
  ✔ HTTP 200 + documented envelope keys
  ✔ conditional_yes/no expose price_usd + pool_id (positive number, plain address)
  ✔ company_tokens.base.tokenSymbol resolves (not legacy "PNK" fallback)
  ✔ v1 vs v2 cross-check: same base symbol + same YES pool_id (catches the v1 and v2 paths drifting apart)
  ✔ response time < 5s for v1 (perf bound; catches warmer regression)
  ✔ response time < 5s for v2 (perf bound)
The cross-check is the most leveraged test: both endpoints serve the same underlying market, so any logic-divergence between the v1 and v2 codepaths trips this test before users see inconsistencies.
All 32 api auto-qa tests now pass (31 pass + 1 skipped by design).

Adds auto-qa/tests/operational-endpoints.test.mjs: 3 cases pinning the contract of the two operational endpoints not previously covered:
  ✔ /health returns 200 with {status, timestamp} and timestamp is fresh (catches edge-cache pinning that would freeze liveness checks)
  ✔ /warmer returns {active, entries[]} with consistent counts
  ✔ /health timestamp advances between consecutive calls (1.5s apart) (catches edge cache regression on the health endpoint)
These are the surfaces status.futarchy.fi and any uptime monitor depend on: if /health goes stale or /warmer crashes silently, this test surfaces it before users notice a stale dashboard.
All 35 api auto-qa tests now pass (34 + 1 skipped by design).

Pins the GraphQL passthrough surface itself, independent of any user-defined schema. Catches a layer the existing schema-shape tests would miss with a more confusing error message:
- Cloud Run revision shipped without the route mounted
- Upstream Checkpoint indexer entirely unreachable
- HTTPS termination broken
- Reverse-proxy stripping the request body
- Error envelope shape changes that break clients branching on `response.errors[0].message`
Coverage:
- { __typename } returns "Query" on both endpoints
- { __schema { queryType { name } } } returns a non-empty type name
- Malformed query yields { errors: [{ message: <string> }] } body
- GET is rejected (POST-only surface, must NOT return 200)
- 12 parallel introspections complete cleanly (no shared mutex)
Surfaced inconsistency (pinned, NOT fixed per directive):
  /candles/graphql returns HTTP 502 on parse errors
  /registry/graphql returns HTTP 400 on the same input
502 misclassifies a client error as server failure. Pinned in PARSE_ERROR_STATUS so a deliberate unification surfaces as a test update.
PR coverage: 7/9 -> 8/9 (only #2 RPC infra remains).
Tests: 35 -> 46 (api), 103 -> 114 cross-repo. All green.

Pins the cross-origin contract every browser-side caller depends on: the frontend at futarchy.fi, staging frontends, Apollo Client, and the Snapshot widget at snapshot.box. Not tied to any single PR; defensive against a class of regressions:
- cors() middleware accidentally dropped from a route
- Stricter origin allowlist excludes futarchy.fi
- Apollo-Require-Preflight no longer in allow-headers
- X-Cache / X-Response-Time stops being exposed (silently zeros the frontend's cache-hit instrumentation)
Coverage:
- 5 endpoints × 3 representative origins (futarchy.fi, staging, snapshot.box) preflight matrix
- POST + REST GET responses also carry CORS headers (not just preflight)
- Allow-Headers includes Content-Type AND Apollo-Require-Preflight
- Allow-Methods includes POST for both passthroughs
- Expose-Headers includes X-Cache and X-Response-Time
- Pinned-policy ratchet: today's policy is wildcard origin, test fires loudly if we tighten so REPRESENTATIVE_ORIGINS gets updated
Tests: 46 -> 67 (api), 125 -> 146 cross-repo. All green.

Pins the X-Cache observability instrumentation on the chart endpoint (/api/v2/proposals/:id/chart), the only hot read path with a cache. Catches a class of regressions that otherwise only surface as latency slowly rising in production:
- Cache layer silently disabled (X-Cache header missing → frontend cache-hit dashboard goes blind, no obvious user impact until p99 latency rises)
- X-Cache returns garbage instead of literal HIT or MISS
- X-Cache-TTL drifts to 0 (every call cold) or unbounded (stale data)
- X-Response-Time format breaks (frontend dashboard math goes NaN)
- Cache key includes a non-deterministic component (back-to-back requests both MISS, throughput collapses)
- HIT path silently degraded (HIT requests no longer < 100ms)
Today's measurements: TTL=13s, HIT=0ms, MISS≈30-160ms. Test loosens those bounds to defensible ceilings (TTL <= 24h, HIT <= 100ms).
Tests: 67 → 73 (api), 152 → 158 cross-repo. All green.
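The header checks above amount to three small validations. The sketch below assumes the headers arrive as a plain object with lowercase keys, that X-Cache-TTL is expressed in seconds, and that X-Response-Time begins with a number (for example "12ms"); those assumptions, and the `checkCacheHeaders` name, are not confirmed by the commit.

```javascript
const MAX_TTL_SECONDS = 24 * 60 * 60; // the 24h ceiling from the commit message

// `headers` is a plain object with lowercase keys (an assumption of this sketch).
function checkCacheHeaders(headers) {
  const errors = [];

  const xCache = headers["x-cache"];
  if (xCache !== "HIT" && xCache !== "MISS") errors.push(`bad X-Cache: ${xCache}`);

  // Number(undefined) is NaN, so a missing header fails this range check too.
  const ttl = Number(headers["x-cache-ttl"]);
  if (!(ttl > 0 && ttl <= MAX_TTL_SECONDS)) errors.push(`bad X-Cache-TTL: ${headers["x-cache-ttl"]}`);

  // parseFloat tolerates a trailing unit suffix; the exact format is assumed.
  const elapsed = parseFloat(headers["x-response-time"]);
  if (!Number.isFinite(elapsed) || elapsed < 0) {
    errors.push(`bad X-Response-Time: ${headers["x-response-time"]}`);
  }

  return errors;
}
```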

Pins the boundary semantics of GET /api/v2/proposals/:id/chart around its minTimestamp / maxTimestamp params. Catches a class of quiet bugs where the endpoint returns data outside the requested window: the chart silently shows wrong candles with no visible symptom unless someone manually inspects timestamps.
Coverage:
Degenerate-window graceful handling (must NOT 5xx):
- inverted window (max < min) → 200 + empty candles
- far-future window → 200 + empty candles
- far-past window → 200 + empty candles
- missing both timestamps → 200 + default window applied
- negative timestamps → 200 (defensive)
Window-respect invariants on a known-good window:
- every returned candle satisfies min <= periodStartUnix <= max
- candles strictly ascending by periodStartUnix in each series
- shape contract: {periodStartUnix, close} both parse as numbers
- 1-second window between known candles returns at most 1 per series (period-snapping invariant from api PR #9)
Defensive against:
- Window predicate flipped (>= ↔ <=) in the indexer query
- Sort order inverted in a refactor
- Default-window logic returning unbounded data on missing params
- Inverted/future/past windows crashing the Checkpoint passthrough
Tests: 73 → 82 (api), 174 → 183 cross-repo. All green.
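The window-respect invariants can be checked in a single pass over one series. This is a sketch under the assumption that each candle is an object with `periodStartUnix` and `close` fields, as the shape contract above states; `checkWindow` is a hypothetical name.

```javascript
// Returns a list of violations for one candle series against [min, max].
function checkWindow(candles, min, max) {
  const errors = [];
  let previous = -Infinity;
  for (const c of candles) {
    const ts = Number(c.periodStartUnix);
    const close = Number(c.close);
    if (!Number.isFinite(ts) || !Number.isFinite(close)) {
      errors.push(`non-numeric candle: ${JSON.stringify(c)}`);
      continue;
    }
    if (ts < min || ts > max) errors.push(`outside window: ${ts}`);
    if (ts <= previous) errors.push(`not strictly ascending at: ${ts}`);
    previous = ts;
  }
  return errors;
}
```

An empty series passes trivially, which matches the expected result for inverted, far-future, and far-past windows.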

Pins POST /subgraphs/name/algebra-proposal-candles-v1, a backward-compat shim that proxies to the same upstream as /candles/graphql but also injects spotCandles: [] into the response. Older clients (snapshot-labs/sx-monorepo, pre-Cloud-Run integrations) still hit this URL pattern; removing it would silently 404 them, the same class of bug as PR #1 (/charts prefix lost).
Coverage:
- POST { __typename } returns 200 + Query
- GET is rejected (POST-only surface)
- spotCandles injection invariant on the legacy route
- Negative confirmation: modern /candles/graphql does NOT inject spotCandles (the two routes have distinct contracts; if both start or both stop injecting, they've drifted)
- Cross-route parity: real candles(...) query returns same shape + same row count from both routes
- Malformed query yields the standard errors[] envelope
API surface coverage: 3/4 → 4/4.
Tests: 82 → 88 (api), 188 → 194 cross-repo. All green.

Pins the type contract of /api/v2/proposals/:id/chart's market block. The frontend branches on heterogeneous types (price_usd as number, JSON-native, vs volume as string, which preserves 18-decimal precision from the indexer) and any "normalization" refactor that homogenizes either type breaks parsing.
Coverage:
Type heterogeneity (the subtle invariant):
- market.{conditional_yes,conditional_no}.price_usd → number, finite, > 0
- market.volume.{cy,cn}.volume / volume_usd → string, parses positive
- volume.{cy,cn}.status === "ok" for healthy fixture
Address shape:
- event_id == requested proposal id (exact lowercase match)
- trading_address == event_id (single-trade-address invariant)
- all pool_ids match /^0x[a-f0-9]{40}$/ (chain prefix stripped)
- YES pool_id ≠ NO pool_id (catches pool-resolution collapse)
Timeline + chain:
- timeline.start, end are integer unix ts in 2020-2050 range
- timeline.start <= timeline.end
- timeline.chain_id === 100 (Gnosis pin)
Tokens (sharper than existing PR #6 test):
- company_tokens.base.tokenSymbol non-empty AND not "TOKEN" fallback (catches pool-resolution priority chain regression on the canonical healthy fixture)
Tests: 88 → 99 (api), 213 → 224 cross-repo. All green.

Pins how /api/v2/proposals/:id/chart treats four input classes: canonical lowercase, uppercase (clients sometimes send checksummed), zero address, and garbage/path-traversal/oversized strings. Today's behavior is permissive (every input → 200, with empty/fallback data for non-existent proposals). The test pins that permissiveness so a future input-validation patch surfaces as a deliberate API change requiring client coordination.
Coverage:
Case-insensitive lookup (the most important invariant):
- uppercase request returns same data as lowercase (same pool_id, same token symbol; catches case-sensitive lookup leak)
- event_id normalized to lowercase in response
Zero-address graceful degradation:
- 200 status, prices=0, "TOKEN" fallback symbol
Garbage-input safety (must NOT 5xx):
- non-hex string ("0xnotahexvalue") → 2xx
- too-short string ("shortaddr") → 2xx
- path-traversal payload ("../etc/passwd") → 2xx + JSON body (defensive: must NOT pass through to upstream as a substring)
- very long id (502 chars) → < 500 status
Tests: 99 → 107 (api), 244 → 252 cross-repo. All green.

Pins the three pure helpers in src/adapters/candles-adapter.js that underpin the entire Checkpoint passthrough translation:
  stripChainPrefix(id)               "100-0xabc" → "0xabc"
  addChainPrefix(id, chainId=100)    "0xabc" → "100-0xabc"
  stripPrefixesAndNormalize(value)   recursive walker over response objects
  CHAIN_PREFIXED_RE                  /^\d+-0x[a-fA-F0-9]{40}$/
Every PR #4/#7/#8/#9 fix relied on these being correct. A regression in any one of them returns wrong data for every passthrough query (or worse: 200 with no data, which looks normal until users complain).
Coverage:
stripChainPrefix:
- "100-<addr>" → bare addr; works for chains 1, 137, etc
- bare addr → unchanged (idempotent)
- null/undefined/"" passed through
- composite IDs ("1-<addr>-3600-<ts>") strip only leading segment
addChainPrefix:
- bare addr → "100-<addr>" (default chain)
- custom chainId
- already-prefixed input NOT double-prefixed (critical idempotency)
- null/undefined/"" passed through
- round-trip: stripChainPrefix(addChainPrefix(addr)) === addr
CHAIN_PREFIXED_RE pattern:
- matches valid forms (mixed-case hex)
- rejects bare addrs, composite IDs, partial matches, non-hex, wrong-length, leading/trailing extras
stripPrefixesAndNormalize walker:
- top-level object fields
- nested objects (recursion)
- arrays (preserves order + length)
- leaves non-matching strings intact (composite IDs, URLs, numeric strings)
- handles primitives + nullish at any depth
- idempotent (run twice = same output)
Tests: 107 → 128 (api), 262 → 283 cross-repo. All green.
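For orientation, here is one way helpers with the behavior listed above could be written. This is a reconstruction from the coverage list, not the code in src/adapters/candles-adapter.js. The walker is named `stripPrefixesDeep` (a hypothetical name) because the "normalize" half of the real `stripPrefixesAndNormalize` is not described in the commit.

```javascript
// Regex quoted in the commit message.
const CHAIN_PREFIXED_RE = /^\d+-0x[a-fA-F0-9]{40}$/;

// Removes only the leading "<chainId>-" segment; nullish and "" pass through.
function stripChainPrefix(id) {
  if (typeof id !== "string" || id === "") return id;
  return id.replace(/^\d+-(?=0x)/, "");
}

// Adds "<chainId>-" unless the id already carries a prefix.
function addChainPrefix(id, chainId = 100) {
  if (typeof id !== "string" || id === "") return id;
  return /^\d+-0x/.test(id) ? id : `${chainId}-${id}`;
}

// Recursive walker: strips strings that match CHAIN_PREFIXED_RE exactly and
// leaves everything else (composite IDs, URLs, numbers, null) untouched.
function stripPrefixesDeep(value) {
  if (typeof value === "string") {
    return CHAIN_PREFIXED_RE.test(value) ? stripChainPrefix(value) : value;
  }
  if (Array.isArray(value)) return value.map(stripPrefixesDeep);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, stripPrefixesDeep(v)])
    );
  }
  return value;
}
```

Because stripped output no longer matches `CHAIN_PREFIXED_RE`, running the walker a second time is a no-op, which is the idempotency property the tests pin.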

Pins src/utils/token-from-pool.js, the pool-name → company/currency symbol resolver that PR #6 fixed. The PR-#6 unified-chart test pins the end-to-end "no PNK leak" property; this test pins the function directly so a regression to the priority chain or pattern matching surfaces with a clear message instead of a downstream "TOKEN" fallback that could be mistaken for an indexer issue.
Coverage:
Empty / invalid:
- empty array, non-array, no-recognized-type all → both null
Each pool type, happy path:
- CONDITIONAL "YES_GNO / YES_sDAI" → company=GNO, currency=sDAI
- NO_ prefix on either side accepted
- EXPECTED_VALUE "YES_GNO / sDAI" → company=GNO, currency=sDAI
- PREDICTION degenerate symmetry "YES_sDAI / sDAI" → company=null, currency=sDAI
Priority chain (heart of PR #6's fix):
- CONDITIONAL beats EXPECTED_VALUE even when EV is first in array
- EXPECTED_VALUE beats PREDICTION when CONDITIONAL absent
- falls all the way through to PREDICTION when no others present
Defensive:
- pools with no name field → skipped, fallback to next
- null/undefined entries in array → skipped (no throw)
- unrecognized name format → both null
- whitespace tolerance around the slash separator
- symbol \w class permits digits + underscores
Anti-PNK regression check (PR #6's whole point):
- none of the empty/unknown/malformed paths return "PNK"
Tests: 128 → 144 (api), 294 → 310 cross-repo. All green.
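The priority chain is easier to follow as code. The sketch below is a reconstruction from the coverage list only: the function name `resolveSymbols`, the `{ type, name }` pool shape, the return shape, and the regex are all assumptions, and the real resolver in src/utils/token-from-pool.js may differ in detail.

```javascript
// Highest-priority pool type first, as described in the commit message.
const PRIORITY = ["CONDITIONAL", "EXPECTED_VALUE", "PREDICTION"];

// "YES_GNO / YES_sDAI", "NO_GNO/sDAI", "YES_sDAI / sDAI": optional YES_/NO_
// prefix on either side, optional whitespace around the slash.
const NAME_RE = /^\s*(?:(?:YES|NO)_)?(\w+)\s*\/\s*(?:(?:YES|NO)_)?(\w+)\s*$/;

function resolveSymbols(pools) {
  const empty = { company: null, currency: null };
  if (!Array.isArray(pools)) return empty;
  for (const type of PRIORITY) {
    for (const pool of pools) {
      // Skip null entries, other pool types, and pools without a usable name.
      if (!pool || pool.type !== type || typeof pool.name !== "string") continue;
      const match = NAME_RE.exec(pool.name);
      if (!match) continue;
      const [, left, right] = match;
      // A prediction pool pairs the currency with itself, so no company token.
      if (type === "PREDICTION" || left === right) return { company: null, currency: right };
      return { company: left, currency: right };
    }
  }
  return empty; // never a hardcoded symbol such as "PNK"
}
```

Iterating by priority first and by array position second is what makes CONDITIONAL win even when an EXPECTED_VALUE pool appears earlier in the array.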

Pins src/utils/cache.js, the in-memory TTL cache backing the response/registry/candles/spot caches. Subtle behaviors that regressions can break silently:
- TTL expiry: get() must return undefined AND delete the entry (not just lazy-skip)
- Hit/miss counters increment exactly once per get(); expired-entry get counts as MISS
- clear() resets both store AND counters (not just store)
- set() resets the TTL clock for the key
Coverage:
get/set basics:
- set then get returns value
- get on missing key returns undefined
- set overwrites
Counter accuracy:
- hit increments hits, miss increments misses
- interleaved counts independent
TTL expiry (with 30-50ms TTLs for fast tests):
- entry expires after ttlMs
- expired-entry get is a MISS
- expired-entry get DELETES from store
- set() resets the TTL clock
stats() formatting:
- 0% with no calls
- integer percent rounding
- entry count from store.size
clear():
- empties store + resets both counters
cache-config defaults pinned via source-file regex:
- RESPONSE_TTL_SEC default = 13s
- REGISTRY_TTL_SEC default = 300s
- WARMER_INTERVAL_SEC = max(RESPONSE_TTL - 3, 5) formula
Tests: 144 → 161 (api), 357 → 374 cross-repo. All green.
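A cache with the behaviors listed above might look roughly like the following. This is a sketch, not the contents of src/utils/cache.js: the class name, the injectable `now` clock (added so the example is deterministic), and the exact `stats()` return shape are assumptions.

```javascript
class TtlCache {
  // `now` is injectable so expiry can be exercised without real timers.
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.store = new Map();
    this.hits = 0;
    this.misses = 0;
  }

  // Overwrites any existing value and restarts the TTL clock for the key.
  set(key, value) {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Exactly one counter moves per call; an expired entry is deleted and
  // counted as a miss.
  get(key) {
    const entry = this.store.get(key);
    if (entry && this.now() < entry.expiresAt) {
      this.hits += 1;
      return entry.value;
    }
    if (entry) this.store.delete(key);
    this.misses += 1;
    return undefined;
  }

  // Integer hit percentage, "0%" before any call (format is an assumption).
  stats() {
    const total = this.hits + this.misses;
    const percent = total === 0 ? 0 : Math.round((this.hits / total) * 100);
    return { entries: this.store.size, hits: this.hits, misses: this.misses, hitRate: `${percent}%` };
  }

  // Empties the store and resets both counters.
  clear() {
    this.store.clear();
    this.hits = 0;
    this.misses = 0;
  }
}

// The warmer interval formula quoted above, as a function of the response TTL.
function warmerIntervalSec(responseTtlSec) {
  return Math.max(responseTtlSec - 3, 5);
}
```

With the pinned default of a 13s response TTL, the formula yields a 10s warmer interval, so entries are refreshed shortly before they would expire.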

Pins src/config/endpoints.js, the single switch routing the api between Graph Node (legacy AWS, dead) and Checkpoint (post-AWS-GCP target). A regression that flips the default mode back to graph_node, removes the BROKEN_ footgun prefix, or drifts localhost ports breaks adapter calls silently.
Coverage:
MODE handling:
- default is "checkpoint" (post-AWS-migration target)
- lowercased so case-insensitive env vars work
- allowlist is exactly graph_node + checkpoint
- unknown MODE warns AND falls through to GRAPH_NODE (the pre-existing "if not exactly checkpoint then GRAPH_NODE" logic; pinned so any "fix" is deliberate)
GRAPH_NODE footgun deterrent:
- registry + candles URLs both prefixed with BROKEN_GRAPH_NODE_DO_NOT_USE:// (post-AWS-migration intentional breakage so accidental routing fails fast with DNS error)
- legacy AWS CloudFront host pinned in the URL
CHECKPOINT defaults:
- registry localhost port = 3003 (Registry checkpoint per comment)
- candles localhost port is 3001 (prod) or 3004 (staging) per comment
- both read process.env.{REGISTRY_URL,CANDLES_URL} with fallback
Exports:
- ENDPOINTS, IS_CHECKPOINT, MODE all exported
- IS_CHECKPOINT defined as MODE === "checkpoint"
Tests: 161 → 172 (api), 385 → 396 cross-repo. All green.
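The MODE handling described above, including the deliberate fall-through for unknown values, could be sketched as follows. `resolveMode`, its `warn` parameter, and the `endpointSet` field are hypothetical; the name of the environment variable that feeds the raw value is not given in the commit, so the raw value is passed in as an argument.

```javascript
const ALLOWED_MODES = ["graph_node", "checkpoint"];

// `rawMode` stands in for whatever environment value the real module reads.
function resolveMode(rawMode, warn = console.warn) {
  const MODE = (rawMode || "checkpoint").toLowerCase();
  if (!ALLOWED_MODES.includes(MODE)) {
    warn(`unknown mode "${MODE}", falling through to graph_node endpoints`);
  }
  // Anything that is not exactly "checkpoint" selects the graph_node set.
  const IS_CHECKPOINT = MODE === "checkpoint";
  return { MODE, IS_CHECKPOINT, endpointSet: IS_CHECKPOINT ? "checkpoint" : "graph_node" };
}
```

Under this logic a typo in the mode does not silently select Checkpoint; it warns and lands on the graph_node endpoints, whose BROKEN_ prefix then makes the misconfiguration fail fast.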

Pins the LRU eviction + re-registration logic in src/utils/warmer.js's registerForWarming function. The /warmer endpoint shape is covered by operational-endpoints.test.mjs but the eviction policy itself is not tested. A refactor that silently changes the eviction order or breaks re-registration would cause:
- Re-registration treated as new entry → list bloat + churn
- LRU broken → unbounded growth past WARMER_MAX_ENTRIES
- registeredAt updated on re-register → retention windows reset
Coverage:
Initial registration:
- first call adds entry with params, lastSeen, registeredAt
- lastSeen === registeredAt on first registration
Re-registration of existing key:
- updates lastSeen but NOT size
- registeredAt is preserved (used for RETENTION_DAYS check)
- params are NOT updated (intentional: stale call shouldn't overwrite params from initial registration)
LRU eviction at maxEntries:
- next registration after capacity evicts oldest by lastSeen
- re-registering an old entry protects it from eviction
- works at degenerate maxEntries=1
- 20 rapid registrations to a max-5 warmer → exactly 5 entries
Config defaults pinned via cache-config.js source regex:
- WARMER_MAX_ENTRIES = 50
- WARMER_RETENTION_DAYS = 7
- ENABLE_WARMER defaults to "true" (enabled)
Tests: 172 → 183 (api), 412 → 423 cross-repo. All green.
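The registration and eviction rules can be illustrated with a small factory. This is a sketch of the policy as described, not the code in src/utils/warmer.js: `createWarmer`, the `(key, params)` signature, and the injectable clock are assumptions made for the example.

```javascript
// `now` is injectable so the eviction order is deterministic in tests.
function createWarmer(maxEntries, now = Date.now) {
  const entries = new Map();

  function registerForWarming(key, params) {
    const ts = now();
    const existing = entries.get(key);
    if (existing) {
      // Re-registration: refresh lastSeen only; params and registeredAt stay.
      existing.lastSeen = ts;
      return existing;
    }
    if (entries.size >= maxEntries) {
      // Evict the entry with the smallest lastSeen (least recently used).
      let oldestKey;
      let oldestSeen = Infinity;
      for (const [k, e] of entries) {
        if (e.lastSeen < oldestSeen) {
          oldestSeen = e.lastSeen;
          oldestKey = k;
        }
      }
      entries.delete(oldestKey);
    }
    const entry = { params, lastSeen: ts, registeredAt: ts };
    entries.set(key, entry);
    return entry;
  }

  return { registerForWarming, entries };
}
```

Keeping `registeredAt` fixed on re-registration is what allows a separate retention check to age entries out regardless of how often they are seen.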

Pins src/services/rate-provider.js, the ERC-4626 rate fetcher used to convert sDAI rates into base prices throughout the api. Critical constants where a typo silently returns wrong data:
  GET_RATE_SELECTOR: keccak256("getRate()")[:4] = 0x679aefce. Drift here → eth_call falls into the catch → returns 1 (no-conversion fallback) silently
  CHAIN_CONFIG[100].defaultRateProvider: the canonical sDAI rate provider on Gnosis (0x89C8...EceD). Typo → every sDAI conversion uses 1.0 instead of the real rate
  18-decimal scaling: Number(rateBigInt) / 1e18. Wrong divisor (1e6 for USDC) scales every rate by 1e12; TVL dashboards explode
Coverage:
- GET_RATE_SELECTOR pinned exact value + still referenced in eth_call payload
- CHAIN_CONFIG[1] is Ethereum + null defaultRateProvider (pinned; adding an Ethereum default surfaces as deliberate change)
- CHAIN_CONFIG[100] is Gnosis + canonical sDAI address
- CACHE_DURATION = 5 * 60 * 1000 (5 min; sweet spot vs RPC load)
- All four error paths return 1 (the no-conversion fallback):
  - unknown chain
  - missing providerAddress
  - RPC error in result
  - thrown exception in catch
- 18-decimal scaling literal pinned
Tests: 183 → 195 (api), 434 → 446 cross-repo. All green.
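The 18-decimal scaling and the fallback-to-1 behavior can be shown in isolation. The sketch below decodes a hex-encoded `eth_call` result; `decodeRate` and `FALLBACK_RATE` are hypothetical names, and the input validation is an assumption of this example rather than a description of rate-provider.js.

```javascript
const GET_RATE_SELECTOR = "0x679aefce"; // value quoted in the commit message
const FALLBACK_RATE = 1; // the no-conversion fallback

// Decodes a hex uint256 (as returned by eth_call) into a floating-point rate.
function decodeRate(hexResult) {
  try {
    if (typeof hexResult !== "string" || !/^0x[0-9a-fA-F]+$/.test(hexResult)) {
      return FALLBACK_RATE;
    }
    // 18-decimal fixed point: dividing by 1e6 instead would inflate the rate by 1e12.
    return Number(BigInt(hexResult)) / 1e18;
  } catch {
    return FALLBACK_RATE;
  }
}

// Shape of the call object an eth_call request would carry (illustrative).
function buildGetRateCall(providerAddress) {
  return { to: providerAddress, data: GET_RATE_SELECTOR };
}
```

Because every failure path returns 1, a broken selector or address produces plausible-looking but unconverted prices, which is why the constants are pinned by exact value.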

Pins src/services/spot-source.js, the toggle that routes spot price fetches between the futarchy-spot service and CoinGecko/GeckoTerminal.
Coverage:
Toggle:
- USE_FUTARCHY_SPOT default is empty-string (== falsy → use Gecko)
- .toLowerCase() applied so "TRUE"/"True" both work
- FUTARCHY_SPOT_URL default is http://localhost:3032
URL construction (futarchy-spot endpoint shape):
- calls /api/v1/candles?ticker=...&minTimestamp=...&maxTimestamp=...
- encodeURIComponent on ticker (ticker contains "+", "!", "/")
Reliability:
- 10s AbortSignal timeout
- non-OK response falls back to fetchFromGecko
- try/catch wraps the entire fetch with same fallback
Default-window math:
- minTimestamp = maxTs - (limit * 3600) [hours back, NOT days/min]
- Math.max(0, minTimestamp) clamps to no-negative-unix
- default limit = 500
Surfaced bug (NOT fixed per directive; pinned for ratchet): src/services/spot-price.js has a hardcoded CoinGecko API key as `process.env.COINGECKO_API_KEY || '<KEY>'` fallback. Leaked key in source. Test pins existence (not value) so a removal surfaces as deliberate fix and any new addition surfaces too.
Plus pinned DEFAULT_CONFIG ticker = 'PNK/WETH+!sDAI/WETH-hour-500-xdai'
Tests: 195 → 208 (api), 458 → 471 cross-repo. All green.
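The default-window math and the URL construction are small enough to show directly. In this sketch `defaultWindow` and `buildSpotUrl` are hypothetical function names; the path, the query parameter names, the 3600-second multiplier, and the default limit of 500 come from the commit message.

```javascript
const DEFAULT_LIMIT = 500;

// Window reaches `limit` hours back from maxTs, clamped at unix time 0.
function defaultWindow(maxTs, limit = DEFAULT_LIMIT) {
  return {
    minTimestamp: Math.max(0, maxTs - limit * 3600),
    maxTimestamp: maxTs,
  };
}

// Builds the futarchy-spot candles URL; the ticker must be percent-encoded
// because it can contain "/" and "+", which are meaningful inside a URL.
function buildSpotUrl(baseUrl, ticker, { minTimestamp, maxTimestamp }) {
  return (
    `${baseUrl}/api/v1/candles?ticker=${encodeURIComponent(ticker)}` +
    `&minTimestamp=${minTimestamp}&maxTimestamp=${maxTimestamp}`
  );
}
```

Note that `encodeURIComponent` escapes "/" and "+" but leaves "!" as-is, so the inversion marker in a multi-hop ticker survives unescaped while the separators do not collide with URL syntax.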

Pins src/services/spot-price.js's parseConfig, the parser that decodes ticker config strings used throughout the spot-price chain. Four formats supported:
  1. composite::POOL1+POOL2::RATE-interval-limit-network
  2. BASE/QUOTE+!OTHER/QUOTE-... (multi-hop, ! inverts)
  3. 0xPOOL[::RATE]-interval-limit-network (direct address)
  4. BASE[::RATE]/QUOTE-interval-limit-network (base/quote ticker)
Plus trailing -invert flag and URL auto-decoding.
Bug class this catches: a refactor that breaks the format disambiguation order silently routes "PNK/WETH" through the wrong branch, returning bad data with no obvious symptom.
Coverage:
Falsy input → null
Format 1 (composite):
- two pools + rate provider
- ! invert prefix on a hop
- missing rate provider
Format 2 (multi-hop):
- "A/B+C/D" parses two hops with base/quote split
- "!" prefix inverts hop, NOT included in base symbol
Format 3 (pool address):
- bare 0x pool, no rate
- "0xPOOL::0xRATE" extracts both
- case-insensitive 0x prefix detection (0X works)
Format 4 (base/quote):
- simple "GNO/sDAI"
- "BASE::RATE/QUOTE" extracts rate from base side
-invert flag:
- case-insensitive (INVERT, Invert, invert)
- stripped from parts before indexing
- default invert=false
Defaults:
- interval="hour", limit=500, network="xdai"
- partial parts use defaults for missing slots
URL decoding:
- auto-decodes when % present
- skips decode when no % (perf shortcut pinned)
Format disambiguation order:
- composite:: takes priority over + and 0x checks
- multi-hop + takes priority over pool-address branch
Tests: 208 → 228 (api), 492 → 512 cross-repo. All green.

Pins src/adapters/registry-adapter.js, the on-chain registry lookup that powers resolveProposalId (normalizes arbitrary proposal IDs to canonical addresses) and lookupOrgMetadata.
Coverage:
normalizeProposalResult (pure shape-normalizer):
- proposalId + proposalAddress are lowercased
- originalProposalId preserves the input case (display in checksummed form back to user)
- empty/null proposal yields all-undefined-or-null shape (no throw)
- organization fields extracted (id + name)
- 6 parseInt config fields (closeTimestamp, startCandleUnix, twapStartTimestamp, twapDurationHours, chain, pricePrecision):
  - valid string → integer
  - missing → null (NOT 0, NOT NaN)
  - empty string → null (the truthy-check; without it parseInt("") yields NaN throughout downstream)
- 4 string fields with || null fallback (coingeckoTicker etc.)
Pinned canonical addresses (all four constants):
- AGGREGATOR_ADDRESS = 0xc5eb43...4fc1 (case-insensitive match with futarchy-fi/interface DEFAULT_AGGREGATOR, cross-pinned)
- SNAPSHOT_LINK_REGISTRY = 0xa6Bc28...0823
- FACTORY_ADDRESS = 0xa6cB18...0a345
- GNOSIS_RPC default = https://rpc.gnosischain.com
Tests: 228 → 248 (api), 530 → 550 cross-repo. All green.

Pins src/services/spot-price.js's combineHopCandles + NETWORK_MAP + GECKO endpoint selection logic. combineHopCandles is the core multi-hop price multiplier: given candles for each hop in a multi-hop ticker (e.g. PNK/WETH × WETH/sDAI), it produces a single composite series by collecting all unique timestamps, forward-filling missing prices per hop, and multiplying once ALL hops have at least one known price. A regression here silently corrupts every multi-hop spot price.
Coverage:
combineHopCandles:
- empty array → empty
- single-hop → identity (returns SAME array, not copy)
- two hops same timestamps multiply per-timestamp
- three hops multiply all together
- missing timestamp on one hop forward-fills from previous
- skips timestamps before ALL hops are initialized (warmup)
- output sorted by time ascending
- float precision preserved through multiplication
NETWORK_MAP:
- xdai alias (chainId 100, gecko "xdai")
- gnosis alias (synonym for xdai; both must route to chain 100)
- eth alias (chainId 1)
- base alias (chainId 8453)
- all RPC URLs are HTTPS
GECKO endpoint selection (key-conditional URL + headers):
- GECKO_API switches to pro-api.coingecko.com when key set
- Falls back to api.geckoterminal.com (public)
- GECKO_HEADERS adds 'x-cg-pro-api-key' when key set (pro-api requires it)
- Public-headers branch is just {accept}; defensive against leaking pro key to public terminal endpoint
Tests: 248 → 265 (api), 568 → 585 cross-repo. All green.
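The forward-fill-and-multiply procedure described above can be sketched as follows. This is a reconstruction from the prose, not the code in spot-price.js: the function is named `combineHops` to mark it as illustrative, and the `{ time, value }` candle shape is an assumption.

```javascript
// Each hop is an array of { time, value } candles (field names assumed).
function combineHops(hops) {
  if (hops.length === 0) return [];
  if (hops.length === 1) return hops[0]; // identity: the same array, not a copy

  // Union of all timestamps across hops, ascending.
  const times = [...new Set(hops.flatMap((hop) => hop.map((c) => c.time)))].sort((a, b) => a - b);
  const lookups = hops.map((hop) => new Map(hop.map((c) => [c.time, c.value])));
  const lastKnown = new Array(hops.length).fill(undefined);

  const combined = [];
  for (const time of times) {
    // Forward-fill: keep the previous price when a hop has no candle here.
    lookups.forEach((lookup, i) => {
      if (lookup.has(time)) lastKnown[i] = lookup.get(time);
    });
    // Warmup: emit nothing until every hop has at least one known price.
    if (lastKnown.some((v) => v === undefined)) continue;
    combined.push({ time, value: lastKnown.reduce((product, v) => product * v, 1) });
  }
  return combined;
}
```

Skipping the warmup timestamps avoids emitting a composite price built from a hop that has no data yet, at the cost of a slightly shorter series.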

iteration 29 (api side). New: auto-qa/tests/algebra-client.test.mjs (17 cases). Pins src/services/algebra-client.js — the LEGACY Graph Node-shaped client still imported by unified-chart.js + market-events.js as the non-Checkpoint fallback path. Five concerns: 1. ALGEBRA_ENDPOINT === ENDPOINTS.candles (env-driven, not hardcoded URL). Plus a defensive scan asserting NO http(s):// strings live outside comments. 2. fetchPoolsForProposal uses GraphQL VARIABLE binding ($proposalId: String!) NOT inline interpolation — protects against query injection. Variable type is String! (not BigInt!) — Graph Node shape, not Checkpoint. 3. getLatestPrice period hardcoded to "3600" (1-hour candles) on BOTH ternary branches. Drift would silently change chart sampling rate. 4. getLatestPrice maxTimestamp param defaults to null (not undefined, not 0); when null, _lte filter is omitted; when set, included. A 0 default would query "everything <= 0" → zero rows. 5. Default-zero behavior: returns 0 (not null/NaN) when no candle found — pinned because callers expect numeric. parseFloat (not parseInt) on candle.close. Plus pins for the orderBy/orderDirection invariant (must be periodStartUnix DESC + first 1 to get LATEST, not earliest), the Graph-Node-only nested selections (token0/token1/proposal), the GraphQL error-throw guards on both functions, and the module docstring's pointer to candles-adapter.js for mode-aware code. Tests: 265 -> 282 (api). All 17 new cases passing.
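As a rough illustration of concerns 2 through 5 above, the sketch below builds a request that binds values as GraphQL variables, omits the _lte filter when maxTimestamp is null, and falls back to 0 when no candle comes back. The function names, the entity and field names in the query text, and the Int! variable type are assumptions for illustration; only the "3600" period, the periodStartUnix DESC + first 1 ordering, and the parseFloat/zero default come from the description above.

```javascript
// Hypothetical request builder; query field and variable names are assumptions.
function buildLatestCandleRequest(poolId, maxTimestamp = null) {
  const capped = maxTimestamp !== null;
  // Period stays "3600" on both branches of the ternary.
  const where = capped
    ? '{ pool: $poolId, period: "3600", periodStartUnix_lte: $maxTimestamp }'
    : '{ pool: $poolId, period: "3600" }';
  const signature = capped
    ? "($poolId: String!, $maxTimestamp: Int!)"
    : "($poolId: String!)";
  const query =
    `query LatestCandle${signature} { ` +
    `candles(first: 1, orderBy: periodStartUnix, orderDirection: desc, where: ${where}) { close } }`;
  // Values travel as variables, never interpolated into the query text.
  return { query, variables: capped ? { poolId, maxTimestamp } : { poolId } };
}

// Hypothetical response reader: numeric result, 0 when no candle is found.
function latestPriceFrom(json) {
  if (json.errors) throw new Error(`GraphQL error: ${JSON.stringify(json.errors)}`);
  const candle = json.data && json.data.candles && json.data.candles[0];
  return candle ? parseFloat(candle.close) : 0;
}

console.log(buildLatestCandleRequest("0xabc").variables); // { poolId: '0xabc' }
console.log(latestPriceFrom({ data: { candles: [] } }));  // 0
```

Defaulting maxTimestamp to null rather than 0 matters because a 0 cap would be sent as a real filter and match no rows.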

iteration 30 (api side). New: auto-qa/tests/graphql-passthrough-factory.test.mjs (20 cases). First UNIT-level coverage for src/routes/graphql-passthrough.js — the generic GraphQL passthrough factory wired into /registry/graphql and /candles/graphql by src/index.js. Existing passthrough-smoke and passthrough-contract tests exercise the live HTTP endpoint; this file locks the factory's internal branching with mock req/res + a fetch stub (no live network). Branches pinned: - Factory shape — returns an async (req, res) handler. - 503 branch — getUpstreamUrl() returning null/undefined/"" emits `{ errors: [{ message: '[label] upstream URL not configured' }] }` AND short-circuits BEFORE calling fetch (prevents accidental fetch("") on env-driven misconfig). - Happy path — POST + Content-Type: application/json + AbortSignal, upstream status code forwarded (probed at 200/201/400/502), upstream content-type forwarded with 'application/json' fallback, body forwarded VERBATIM (no JSON.parse re-stringify). - req.body fallback — undefined → "{}", null → "{}" (the ?? operator in `req.body ?? {}`). Catches a regression where the body becomes the literal string "undefined". - Error branches — AbortError → 504 ("[label] upstream timeout after Nms"); other Error → 502 ("[label] upstream error: <msg>"); error without .message → "unknown" (defensive default). Plus source-text pins: DEFAULT_TIMEOUT_MS = 15_000 (15s), AbortController + signal wiring, clearTimeout in `finally` (timer leak guard under high traffic), and the [${label}] log prefix invariant for ops triage. Tests: 282 -> 302 (api). All 20 new cases passing.
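The branches listed above can be pictured with a minimal factory sketch. It is not the code in graphql-passthrough.js: the option names (label, getUpstreamUrl, fetchImpl, timeoutMs) and the Express-style res methods are assumptions made so the example is self-contained and testable with a stub.

```javascript
const DEFAULT_TIMEOUT_MS = 15_000;

// Illustrative factory; option names and the res API shape are assumptions.
function createPassthroughSketch({ label, getUpstreamUrl, fetchImpl, timeoutMs = DEFAULT_TIMEOUT_MS }) {
  return async function handler(req, res) {
    const url = getUpstreamUrl();
    if (!url) {
      // 503 branch: short-circuit before any fetch call.
      return res.status(503).json({ errors: [{ message: `[${label}] upstream URL not configured` }] });
    }
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const upstream = await (fetchImpl || fetch)(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(req.body ?? {}), // undefined/null body becomes "{}"
        signal: controller.signal,
      });
      const text = await upstream.text(); // forwarded verbatim, never re-parsed
      res.status(upstream.status);
      res.set("Content-Type", upstream.headers.get("content-type") || "application/json");
      return res.send(text);
    } catch (err) {
      if (err && err.name === "AbortError") {
        return res.status(504).json({ errors: [{ message: `[${label}] upstream timeout after ${timeoutMs}ms` }] });
      }
      const detail = (err && err.message) || "unknown";
      return res.status(502).json({ errors: [{ message: `[${label}] upstream error: ${detail}` }] });
    } finally {
      clearTimeout(timer); // avoids leaking timers on every request
    }
  };
}
```

Injecting fetchImpl is one way to exercise each branch without a live network, in the same spirit as the fetch stub mentioned above.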

…BoundedByDirect

First true cross-layer count check for the unified-chart endpoint. apiUnifiedChartShape only validated SHAPE; this asserts the inter-layer relationship between api yes+no candles and direct indexer total. Since api filters by proposal pools, api ⊆ direct, so api total ≤ direct total. Catches api filter regression (returns ALL instead of pool-filtered subset) and transform fabrication.

Cross-layer match family now spans 3 patterns: passthrough match (apiCandlesMatchesDirect), multi-entity passthrough match (apiRegistryMatchesDirect), and filtered subset (NEW).

Test fix: previously-passing apiUnifiedChartShape populated 3 candles but default direct had 1; bumped its candlesCandlesCount to 3 to keep it happy under the new invariant.

36 invariants total. 109/109 smoke tests pass (was 105).

Bridges to the documented full chartShape invariant: that future iteration extends count to ID-by-ID pair-wise compare.

…dAbove

Magnitude-upper-bound for swap amounts. Closes the swap-side gap in the magnitude-sanity family: candle side already had probabilityBounds + candlePricesNonNegative; swap side only had > 0 + range checks. Asserts amountIn AND amountOut < 1e15, which catches raw uint256 leaks (parseFloat returning 1e18 instead of decimal "1.0") and token-decimal misalignment that scales values by 1e6x. Distinct from swapAmountsPositive, which only checks sign; a raw-int leak passes that check (1e18 > 0) but fails this one.

37 invariants total. 113/113 smoke tests pass (was 109). Magnitude-sanity family now SYMMETRIC across candle and swap sides: each has lower-bound + upper-bound coverage.
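A minimal sketch of the upper-bound check, assuming swaps arrive as objects with decimal-string amountIn and amountOut fields. The function and constant names are hypothetical; only the 1e15 threshold comes from the description above.

```javascript
const MAX_SANE_AMOUNT = 1e15; // threshold taken from the invariant description

// Hypothetical checker: returns the names of fields that look like raw-integer leaks.
function fieldsAboveBound(swap) {
  return ["amountIn", "amountOut"].filter((field) => {
    const value = parseFloat(swap[field]);
    return !Number.isFinite(value) || value >= MAX_SANE_AMOUNT;
  });
}

console.log(fieldsAboveBound({ amountIn: "1.0", amountOut: "2.5" })); // []
console.log(fieldsAboveBound({ amountIn: "1000000000000000000", amountOut: "1.0" })); // [ 'amountIn' ]
```

The second call shows the case a sign-only check misses: 1e18 is positive, yet clearly not a decimal-scaled amount.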

First indexer-side enum validation. For all pools (first 50), asserts type ∈ {CONDITIONAL, PREDICTION, EXPECTED_VALUE} (the set sourced from unified-chart.js's findPoolByOutcome). Catches: - Schema migration that adds a 4th type without updating consumers - Indexer regression returning null type - Typo'd type values like "PRDICTION" Distinct from probabilityBounds which treats non-PREDICTION as vacuous — so a typo'd type silently slips through every existing check while the api adapter silently drops the pool. New pattern: iterate-all-rows enum check (vs latest-row or count-only). 38 invariants total. 118/118 smoke tests pass (was 113). Pool-entity coverage now spans existence + FK + per-pool field validation.

…hyProdAggregator

High-value PINNING check at the registry layer. Asserts the indexer has the production futarchy aggregator (0xc5eb43d5…d4fc1, hardcoded in 3 api source files: registry-adapter.js, unified-chart.js, market-events.js; the api literally cannot function without this aggregator's data).

Registry-side analog of anvilChainId: the chain pin proves we forked Gnosis; this pin proves the indexer was bootstrapped with the right chain + start_block + contract config.

Distinct from registryHasAggregators (existence): a wrong-block bootstrap might produce ghost aggregators, passing the existence check but missing the prod one entirely. Test 4 verifies this gap explicitly.

Fixture: new includeFutarchyProdAggregator knob (default true) appends prod address. 2 existing tests updated to set knob=false where they assert exact aggregator counts.

39 invariants total. 121/121 smoke tests pass (was 118). Hardcoded-address pinning now symmetric across chain (anvilChainId) + registry (this slice).

…sObservabilityHeaders 40-invariant milestone. First response-HEADER validation in the catalog — every prior api invariant probed status code or body shape. This asserts X-Cache ∈ {HIT, MISS} AND X-Response-Time matches /^\d+ms$/. The unified-chart handler emits these on every code path (cached HIT + fresh MISS); ops dashboards consume them. A regression that drops them is invisible to body checks. Test 3 verifies the gap: drop X-Cache header → apiUnifiedChartShape STILL passes since body shape unchanged; only this header probe catches it. Catches: removal of cache layer instrumentation; addition of third state ('STALE') without telling ops; format regressions emitting 'NaN ms' or raw integer. Fixture: chart handler now emits headers unconditionally; new unifiedChartXCache / unifiedChartXResponseTime knobs. 40 invariants total. 126/126 smoke tests pass (was 121).
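The two header assertions could be sketched as below. The function name is hypothetical, and the argument is assumed to be any object exposing get(name) with lowercase header names, such as a fetch Headers instance or, for the demo, a Map.

```javascript
// Hypothetical checker for the X-Cache and X-Response-Time contract described above.
function observabilityHeaderProblems(headers) {
  const problems = [];
  const xCache = headers.get("x-cache");
  if (xCache !== "HIT" && xCache !== "MISS") {
    problems.push(`X-Cache must be HIT or MISS, got ${JSON.stringify(xCache)}`);
  }
  const responseTime = headers.get("x-response-time");
  if (typeof responseTime !== "string" || !/^\d+ms$/.test(responseTime)) {
    problems.push(`X-Response-Time must match /^\\d+ms$/, got ${JSON.stringify(responseTime)}`);
  }
  return problems;
}

const good = new Map([["x-cache", "HIT"], ["x-response-time", "12ms"]]);
const stale = new Map([["x-cache", "STALE"], ["x-response-time", "NaN ms"]]);
console.log(observabilityHeaderProblems(good).length);  // 0
console.log(observabilityHeaderProblems(stale).length); // 2
```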

…nMentionsAnvil Chain-CLIENT identity pin. Distinct from anvilChainId (chain-NETWORK pin). Calls web3_clientVersion; asserts response contains "anvil". Together they pin both layers of "right environment" — chain ID for the network, client version for the EVM impl. Catches running against a Gnosis fork on geth/erigon where chain ID matches but anvil_/evm_ extensions for impersonation, snapshots, and time-warping would silently fail in scenario tests. 41 invariants total. 130/130 smoke tests pass.

…n api side First staged CI workflow on the api side. Job runs the orchestrator's 130+ smoke-test invariant battery against an in-process node:http fixture (no docker, no real services, ~1.5s test time + Node setup). Trigger is workflow_dispatch only for this first version, matching the conservative roll-out of slices 3a + 3c. Also added auto-qa/harness/ci/README.md mirroring interface-side pattern (staging dance explanation + currently-staged table + promotion command). Cheapest of the 4 currently-staged CI workflows to promote (no docker, no Playwright, no GH Actions secrets); recommended first promotion target for the maintainer.

…bsetOfDirect (42nd invariant) First cross-layer per-row TIME-PAIR check for the unified-chart endpoint. Strengthens chartCandleCountsBoundedByDirect (count bound) into per-row time-membership: every candle time the api surfaces must appear in the direct candles indexer's time set, otherwise the api is fabricating data (or mixing another proposal's periods). Uses `time` not `id` because applyRateToCandles reshapes raw indexer candles and doesn't expose IDs. Catches bug classes the count bound MISSES: transform synthesizing period-start timestamps, cache key mismatch returning wrong proposal's candles, time-bucket off-by-one, SPOT bleeding into yes/no. 42 invariants now: 11 api-internal + 26 indexer + 5 chain. 134/134 smoke tests pass (4 new + 2 existing tests aligned to DESCENDING candleTimes so candleTimeMonotonic stays happy).
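The time-membership idea can be sketched as a set-difference: collect the direct indexer's candle times, then report any api time that is absent. The function name and the assumption that both sides expose a numeric-compatible `time` field are illustrative; the direct indexer may name its field differently.

```javascript
// Hypothetical helper: returns api candle times that the direct indexer never produced.
function timesMissingFromDirect(apiCandles, directCandles) {
  const directTimes = new Set(directCandles.map((c) => Number(c.time)));
  return apiCandles
    .map((c) => Number(c.time))
    .filter((time) => !directTimes.has(time));
}

const direct = [{ time: 3600 }, { time: 7200 }, { time: 10800 }];
console.log(timesMissingFromDirect([{ time: 7200 }, { time: 3600 }], direct)); // []
console.log(timesMissingFromDirect([{ time: 7200 }, { time: 7201 }], direct)); // [ 7201 ]
```

The second call passes a pure count bound (2 ≤ 3) but still fails membership, which is the off-by-one class the count check cannot see.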

…ent (43rd invariant) Sixth chain-layer invariant. Probes the FEE-MARKET state, which can be independently broken from chain identity / block shape. Asserts eth_gasPrice returns a 0x-prefixed positive hex value. Three named failure modes (each with its own diagnostic): - null → EIP-1559-only mode (legacy gas pricing disabled) - 0x0 → broken fee market (anvil --gas-price 0 misconfig) - non-hex → RPC-layer regression (BigInt parsing breaks) Why this matters for scenarios: most futarchy flows submit transactions (impersonateAccount + send) which need a working gas price for estimation. Without this probe, a scenario reports "transaction failed at step N" with no breadcrumb pointing to the fee-market issue. 43 invariants now: 11 api-internal + 26 indexer + 6 chain. 139/139 smoke tests pass (5 new). 1 fixture knob added (gasPriceHex), 1 RPC dispatch case added (eth_gasPrice).
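The three named failure modes map naturally onto a small classifier over the raw eth_gasPrice result. The function name and the wording of the reason strings are hypothetical.

```javascript
// Hypothetical classifier for an eth_gasPrice JSON-RPC result.
function classifyGasPrice(result) {
  if (result === null || result === undefined) {
    return { ok: false, reason: "null: legacy gas pricing disabled (EIP-1559-only mode)" };
  }
  if (typeof result !== "string" || !/^0x[0-9a-fA-F]+$/.test(result)) {
    return { ok: false, reason: "non-hex: RPC-layer regression" };
  }
  if (BigInt(result) === 0n) {
    return { ok: false, reason: "0x0: broken fee market" };
  }
  return { ok: true, reason: null };
}

console.log(classifyGasPrice("0x3b9aca00").ok); // true
console.log(classifyGasPrice("0x0").reason);    // 0x0: broken fee market
```

Validating the hex shape before calling BigInt keeps the non-hex case from surfacing as an uncaught SyntaxError.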

…acheTtlPresent (44th invariant) Second response-HEADER probe in the catalog. Sister to apiUnifiedChartHasObservabilityHeaders (X-Cache + X-Response-Time); this one covers X-Cache-TTL. Split into a separate invariant for single-responsibility per probe — ops dashboards filter on TTL independently of hit/miss. Scope correction: the original X-Cache+X-Response-Time invariant's comment said TTL was HIT-only. Inspection of unified-chart.js shows it's set on BOTH paths (line 74 HIT + line 278 MISS), so this asserts unconditionally rather than as a conditional check. The old comment was updated in the same commit. Format asserted: positive integer string, no unit suffix. Catches refactor dropping TTL from one path but not the other (sister probe STILL passes — demonstrates per-header-split value), 'NaN' /'-1' from timing/env-var bugs, accidental unit suffix ('300s' silently wrong: parseInt returns 300 by coincidence), header dropped entirely. 44 invariants now: 12 api-internal + 26 indexer + 6 chain. 144/144 smoke tests pass (5 new). 1 fixture knob added (unifiedChartXCacheTtl).
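The "positive integer string, no unit suffix" format is easiest to enforce with a strict pattern rather than parseInt, since parseInt tolerates trailing characters. A small sketch, with a hypothetical function name and the added assumption that leading zeros are also rejected:

```javascript
// Hypothetical validator for the X-Cache-TTL header value.
function isValidCacheTtl(raw) {
  // Digits only, first digit non-zero: rejects "0", "-1", "NaN", and "300s".
  return typeof raw === "string" && /^[1-9]\d*$/.test(raw);
}

console.log(parseInt("300s", 10));    // 300 (the suffix is silently ignored)
console.log(isValidCacheTtl("300s")); // false
console.log(isValidCacheTtl("300"));  // true
```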

…onMatchesChainId (45th invariant) Seventh chain-layer invariant. Chain-RPC-CONSISTENCY check — asserts net_version (decimal) and eth_chainId (hex) numerically agree. Both methods should report the same chain ID by spec (net_version is legacy; eth_chainId is the EIP-695 modern method). Divergence silently breaks consumers that pick one or the other. Orthogonal to anvilChainId: that asserts eth_chainId === 0x64 (the EXPECTED Gnosis value); this asserts net_version === eth_chainId (CONSISTENCY regardless of WHAT they equal). Demonstrated by the bare-anvil-31337 test: both methods report 31337, this passes (consistency intact), anvilChainId fails (wrong network). Bug shapes caught (NOT caught by anvilChainId alone): - Fork rebase updates one method but not the other - Reverse-proxy misconfig routes them to different upstreams - Mock fixture hardcodes one but not the other - Anvil version regression where one method reads from a stale cached config and the other from live state 45 invariants now: 12 api-internal + 26 indexer + 7 chain. 149/149 smoke tests pass (5 new). 1 fixture knob added (netVersion), 1 RPC dispatch case added (net_version).
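Because net_version is a decimal string and eth_chainId is a hex string, the comparison has to be numeric rather than textual. A minimal sketch with a hypothetical function name:

```javascript
// Hypothetical consistency check between net_version (decimal) and eth_chainId (hex).
function chainIdsAgree(netVersion, ethChainId) {
  const decimalOk = typeof netVersion === "string" && /^\d+$/.test(netVersion);
  const hexOk = typeof ethChainId === "string" && /^0x[0-9a-fA-F]+$/.test(ethChainId);
  if (!decimalOk || !hexOk) return false;
  // BigInt parses both "100" and "0x64" and avoids precision limits.
  return BigInt(netVersion) === BigInt(ethChainId);
}

console.log(chainIdsAgree("100", "0x64"));     // true  (Gnosis)
console.log(chainIdsAgree("31337", "0x7a69")); // true  (bare anvil: consistent, wrong network)
console.log(chainIdsAgree("100", "0x7a69"));   // false
```

The second call is the orthogonality case from the description: consistency holds even though a separate chain-ID pin would fail.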

…nCapabilityPresent (46th invariant) Eighth chain-layer invariant; first to exercise an ANVIL-SPECIFIC RPC method (anvil_impersonateAccount) rather than standard JSON-RPC. Asserts the method is actually callable, not just that the client *claims* to be anvil — distinct domain from anvilClientVersionMentionsAnvil. Several "hardhat-compatible" forks and patched-anvil builds exist that emit "anvil" in web3_clientVersion but lack the impersonation extension scenarios depend on. With anvilImpersonationSupported: false, the capability probe FAILS while the identity probe STILL passes — proving the two checks are orthogonal. Why this matters for scenarios: every futarchy flow that mutates state requires impersonating an account (proposer, trader, resolver). Without this method, EVERY scenario silently fails to produce state changes. 46 invariants now: 12 api-internal + 26 indexer + 8 chain. 152/152 smoke tests pass (4 new). 1 fixture knob added (anvilImpersonationSupported: true | false | 'rpc-error'), 1 RPC dispatch case added (anvil_impersonateAccount).

…bilityPresent (47th invariant) Ninth chain-layer invariant; second chain-CAPABILITY probe. Sister to anvilImpersonationCapabilityPresent. Together they form the MINIMAL CAPABILITY SET scenarios depend on: - impersonate → call function as arbitrary account - snapshot/revert → roll back state between tests Distinct domain from impersonation: evm_snapshot is part of the GANACHE LINEAGE (anvil + hardhat both support it; geth/erigon/reth don't). Failure modes are complementary: - anvil_* missing → wrong dev client (hardhat instead of anvil) - evm_* missing → real client (geth/erigon/reth) - both ok → minimal scenario capability satisfied Also catches subsystem-broken case: method registered but returns null/non-hex (calling evm_revert with that silently fails). The non-hex check distinguishes "registered but broken" from "not registered at all" — different diagnostic paths. 47 invariants now: 12 api-internal + 26 indexer + 9 chain. 156/156 smoke tests pass (4 new). 1 fixture knob added (snapshotResult: '0x1' | false | null | 'rpc-error'), 1 RPC dispatch case added (evm_snapshot).

…sPositive (48th invariant)

First iterate-all-rows extension on the swap side. Strengthens swapAmountsPositive (latest-only) into a per-row check across the first 50 swaps. Mirrors the poolTypeIsValidEnum pattern (iterate-all-rows enum check at the indexer layer).

Why both invariants exist:
- swapAmountsPositive (LATEST only): cheap probe; catches event-decoder bugs uniform across ALL swaps
- THIS one (UP-TO-50 rows): catches bugs that affect SUBSETS of swaps without affecting the latest

Bug shapes caught (NOT caught by latest-only):
- Indexer reorg re-processed historical blocks; latest fine, old rows wrong
- Block-context-dependent decoder bug: reads "decimals" from pool's CURRENT state instead of swap's block, corrupting historical swaps from before a decimals change
- Partial-rewrite bug: fix re-emitted only swaps from a specific block range with the corrupted shape
- Pool-specific decoder bug: only swaps for one pool affected; latest happens to be a different pool

Fixture extension: buildSwaps now defaults amountIn/amountOut to '1.0' for non-zero indices (index 0 still uses latestSwap* for back-compat). New per-row override knobs: swapAmountIns, swapAmountOuts arrays.

48 invariants now: 12 api-internal + 27 indexer + 9 chain. 160/160 smoke tests pass (4 new).

…e (49th invariant)

First body-shape probe on /health. STRENGTHENS the existing apiHealth (status-code-only) into a body validation. Production /health (src/index.js line 54) emits { status: 'ok', timestamp: <ISO 8601> }; both fields matter to downstream ops.

Why both invariants exist:
- apiHealth (status-code-only): catches "endpoint dead" outright
- THIS one (body shape): catches refactors that keep the endpoint serving 200 but change its body shape, silently breaking downstream consumers

Bug shapes caught (NOT caught by apiHealth):
- Refactor returns just the string 'ok' (not JSON body)
- status field renamed to 'state'
- status value changed ('healthy' instead of 'ok'); LB string-match health checks silently fail
- timestamp dropped; ops dashboards parsing 'last-fresh' age silently break
- timestamp emitted as Unix epoch number instead of ISO 8601 string; every ISO parser breaks
- timestamp is a string but malformed

ISO 8601 validation strategy: Date.parse() rather than a regex. It is robust enough to accept the canonical new Date().toISOString() format and reject typical malformed inputs.

Fixture /health handler now defaults to production shape (was { ok: true }, now { status: 'ok', timestamp: <ISO> }). New knobs: healthStatus, healthTimestamp, healthBody (full-body override).

49 invariants now: 13 api-internal + 27 indexer + 9 chain. 164/164 smoke tests pass (4 new).
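A sketch of the body-shape check under the Date.parse strategy described above. The function name and the problem strings are hypothetical; the expected shape { status: 'ok', timestamp: <ISO 8601 string> } is taken from the description.

```javascript
// Hypothetical validator for the /health response body.
function healthBodyProblems(body) {
  if (body === null || typeof body !== "object") return ["body is not a JSON object"];
  const problems = [];
  if (body.status !== "ok") problems.push("status must be 'ok'");
  // Require a string first so a Unix epoch number is rejected outright.
  if (typeof body.timestamp !== "string" || Number.isNaN(Date.parse(body.timestamp))) {
    problems.push("timestamp must be a parseable date string");
  }
  return problems;
}

console.log(healthBodyProblems({ status: "ok", timestamp: new Date().toISOString() })); // []
console.log(healthBodyProblems({ ok: true }).length);                                   // 2
```

Date.parse is lenient about formats other than ISO 8601, so this sketch inherits the same "typical malformed inputs" scope noted above rather than strict conformance.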

…bilityPresent (50th invariant)

🎯 50-invariant milestone. Tenth chain-layer invariant; third chain-CAPABILITY probe. COMPLETES the minimal capability TRIO that scenarios depend on:
1. impersonate → call function as arbitrary account
2. snapshot/revert → roll back state between tests
3. TIME-WARP → simulate "wait N seconds/days" (this slice)

Without time-warp, ANY scenario involving a time-gated state transition (resolution after deadline, TWAP window calculation, vote-weight decay) cannot run at all: wall-clock waits would make CI runs hours-long.

evm_setNextBlockTimestamp lineage: ganache-original method, supported by anvil + hardhat + ganache. Same support profile as evm_snapshot; wrong-fork clients (geth/erigon/reth) lack it.

Bug shapes caught (NOT caught by impersonate / snapshot probes):
- Anvil flag --no-storage-caching disabling time-warp specifically (snapshot can work while timestamp manipulation is broken)
- RPC method-allowlisting blocking evm_setNextBlockTimestamp while allowing evm_snapshot/revert
- Anvil version regression with new signature dropping legacy alias

Side effect: probe sets next-block timestamp to now+86400. No block mined in the probe; effect only manifests if a scenario subsequently mines, which can override.

Includes a "trio milestone test" that explicitly verifies all three capability probes pass on default fixture, documenting the trio as a coherent set.

50 invariants now: 13 api-internal + 27 indexer + 10 chain. 168/168 smoke tests pass (4 new). 1 fixture knob added (timeWarpSupported), 1 RPC dispatch case added (evm_setNextBlockTimestamp).

…e (51st invariant) Second body-shape probe in the catalog. Sister to apiHealthBodyShape (just shipped) — both extend a status-code-only invariant with body-shape validation. Together they cover the two main observability endpoints (/health + /warmer). Production /warmer (src/utils/warmer.js getWarmerStatus()) emits: { active, maxEntries, refreshIntervalSec, retentionDays, entries[] } All four numeric fields must be finite numbers. `active` may be 0 (warmer might have no entries yet); the three config fields must be > 0 (0 means "disabled" — a config regression). `entries` must be an array. Bug shapes caught (NOT caught by apiWarmer): - Refactor renames any of the four numeric fields (e.g., active → activeCount) — silent until ops gauges break - Numeric field emitted as string ('5' instead of 5) — consumers using strict typeof checks break - entries field changed to non-array (e.g., object keyed by id) — consumers iterating with .map() crash with "is not a function" - Body wrapped in a `data` field by middleware refactor - Config sentinel hit (refreshIntervalSec=0 = warmer disabled — silent regression that breaks the entire warming subsystem) Fixture /warmer handler now defaults to production shape (was { status: 'warm', queues: 0 }). New knobs: warmerActive, warmerMaxEntries, warmerRefreshIntervalSec, warmerRetentionDays, warmerEntries, warmerBody (full-body override). 51 invariants now: 14 api-internal + 27 indexer + 10 chain. 172/172 smoke tests pass (4 new).

…emaHasRequiredTypes (52nd invariant) First GraphQL INTROSPECTION probe in the catalog — qualitatively new dimension. All previous indexer probes query DATA (pools, swaps, candles); this queries the SCHEMA (__schema { types { name } }) to verify the entity types themselves still exist. The bug class this catches: schema regeneration renames a type (Candle → OHLCBar) or drops one entirely. Data probes hitting the renamed/dropped type return GraphQL errors like "Cannot query field 'candles'" — surfacing as misleading "indexer empty" diagnostics. This invariant catches the rename DIRECTLY with a clear "schema is missing required type(s): Candle" message, making triage take seconds instead of minutes. Bug shapes caught (NOT caught by data probes): - Schema regeneration renamed Pool → LiquidityPool / Candle → OHLCBar - A required type was DROPPED entirely from the schema - Schema introspection itself was disabled (some production GraphQL servers disable it for security) Required types asserted: Pool, Swap, Candle (the three entities the harness actually queries). Doesn't hard-pin every type — that would over-couple to the schema; pins ONLY the harness's actual dependencies. Fixture extension: candles-direct response now includes __schema: { types: [...] } by default. Knob candlesSchemaTypes lets tests override; setting it to null omits __schema entirely (simulates introspection-disabled servers). Indexer-side coverage now spans THREE qualitatively distinct dimensions: connectivity, data, and SCHEMA. Together they distinguish "indexer down" vs "indexer empty" vs "indexer schema regression" — three failure modes that previously all collapsed into "indexer failing somehow". 52 invariants now: 14 api-internal + 28 indexer + 10 chain. 176/176 smoke tests pass (4 new).
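The introspection probe boils down to comparing the type names returned by __schema { types { name } } against a short required list, while treating an absent __schema as its own failure mode. A sketch, with a hypothetical function name and return shape:

```javascript
// Hypothetical check over a GraphQL introspection response body.
function checkRequiredTypes(response, required) {
  const types = response && response.data && response.data.__schema && response.data.__schema.types;
  if (!Array.isArray(types)) {
    // Distinguishes "introspection disabled" from "type missing".
    return { introspectionAvailable: false, missing: [...required] };
  }
  const names = new Set(types.map((t) => t.name));
  return { introspectionAvailable: true, missing: required.filter((name) => !names.has(name)) };
}

const renamed = { data: { __schema: { types: [{ name: "Pool" }, { name: "Swap" }, { name: "OHLCBar" }] } } };
console.log(checkRequiredTypes(renamed, ["Pool", "Swap", "Candle"]));
// { introspectionAvailable: true, missing: [ 'Candle' ] }
```

Keeping the required list to the three types the harness queries mirrors the stated goal of not over-coupling to the full schema.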

…hemaHasRequiredTypes (53rd invariant) Second GraphQL INTROSPECTION probe — sister to candlesIndexerSchemaHasRequiredTypes (just shipped), on the registry indexer. Symmetrically completes schema-validation coverage across BOTH indexers. Why a separate registry probe (not one combined invariant): - Registry and candles are SEPARATE Checkpoint deployments — they can be regenerated/migrated independently - Failure diagnostics stay precise: which indexer's schema regressed, not "one of them did" - Different required type names (registry = ProposalEntity/ Organization/Aggregator; candles = Pool/Swap/Candle) Required types: ProposalEntity, Organization, Aggregator. The three load-bearing registry entities — each referenced by other invariants (FK probes, aggregator pinning probe, registry adapter probes). Bug shapes caught (NOT caught by data probes): - Schema regen renamed ProposalEntity → Proposal (data probes return "Cannot query field 'proposalEntities'" — looks like indexer-empty) - Aggregator dropped entirely → aggregator-pinning probes silently fail with misleading errors - Introspection disabled on registry side (independent of candles side; sister probe still passes — demonstrates per-indexer diagnostic precision) Fixture extension: registry-direct response now includes __schema: { types: [...] } by default. Knob registrySchemaTypes lets tests override; null omits __schema entirely. Both indexers now have FULL coverage across THREE qualitative dimensions: connectivity, data, SCHEMA. Failure diagnostics triage to ONE of three modes (down / empty / schema-regressed) per indexer. 53 invariants now: 14 api-internal + 29 indexer + 10 chain. 180/180 smoke tests pass (4 new).

…owsNonNegative (54th invariant)

Iterate-all-rows extension on the candle side. Sister to swapAmountsAllRowsPositive; symmetrically completes the iterate-all-rows pattern across the two main accumulator-bearing entities (swap amounts + candle volumes).

Why both candleVolumesNonNegative AND this exist:
- candleVolumesNonNegative (LATEST only): cheap probe; catches aggregator bugs uniform across all candles
- THIS one (UP-TO-50 rows): catches bugs affecting SUBSETS of candles without affecting the latest

Bug shapes caught (NOT caught by latest-only):
- Indexer reorg re-processed historical periods; latest fine, old candles corrupted
- Per-period decoder bug (aggregator reads pool token-decimals from CURRENT state instead of period snapshot, corrupting historical candles from before a decimals change)
- Partial-rewrite bug: fix re-emitted only candles from a specific period range
- Pool-specific aggregator bug: only candles for one pool affected; latest happens to be a different pool

Fixture extension: buildCandles now defaults volumeToken0/1 to '1.0' for non-zero indices (index 0 still uses latestCandleVolume* for back-compat). New per-row override knobs: candleVolumeToken0s, candleVolumeToken1s arrays.

The iterate-all-rows pattern is now SYMMETRIC: swap amounts AND candle volumes both have latest-only + iterate-all-rows coverage. Each pattern catches a distinct bug class (uniform-aggregator vs subset-corruption).

54 invariants now: 14 api-internal + 30 indexer + 10 chain. 184/184 smoke tests pass (4 new).

…Consistent (55th invariant)

Third iterate-all-rows extension. COMPLETES the iterate-all-rows TRIAD on the indexer's main accumulator entities:
1. swapAmountsAllRowsPositive (8 slices ago)
2. candleVolumesAllRowsNonNegative (last slice)
3. candleOHLCAllRowsConsistent (this slice)

Each accumulator entity now has BOTH latest-only + all-rows coverage:

| Entity | Latest-only | All-rows |
|--------|-------------|----------|
| swap.amount{In,Out} | swapAmountsPositive | swapAmountsAllRowsPositive |
| candle.volumeToken{0,1} | candleVolumesNonNegative | candleVolumesAllRowsNonNegative |
| candle.{open,high,low,close} | candleOHLCOrdering | candleOHLCAllRowsConsistent (NEW) |

Each pair catches uniform-aggregator bugs (latest) AND subset-corruption bugs (all-rows).

Bug shapes caught (NOT caught by latest-only):
- Per-period min/max accumulator bug: historical candles initialized differently (running-min reset to 0 instead of +Infinity for periods where the first swap > 0)
- Indexer reorg corrupted historical OHLC fields
- Pool-specific aggregator bug: only candles for one pool affected
- Period-boundary off-by-one: a swap counted in the wrong period had a price outside the window's bounds

Fixture extension: buildCandles now defaults OHLC to consistent values on every row (open=close=0.5, high=0.6, low=0.4) for non-zero indices. Index 0 still uses latestCandle* for back-compat. New per-row override knobs: candleOpens, candleHighs, candleLows, candleCloses arrays.

55 invariants now: 14 api-internal + 31 indexer + 10 chain. 188/188 smoke tests pass (4 new).
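An all-rows OHLC check might be sketched as follows. The exact consistency rule used by candleOHLCAllRowsConsistent is not spelled out above, so the rule here (low ≤ min(open, close) and max(open, close) ≤ high, all values finite) is an assumption, as is the function name.

```javascript
// Hypothetical all-rows check: returns the indices of candles whose OHLC values are inconsistent.
function inconsistentOhlcRows(candles) {
  const bad = [];
  candles.forEach((candle, index) => {
    const [open, high, low, close] = ["open", "high", "low", "close"].map((key) => parseFloat(candle[key]));
    const finite = [open, high, low, close].every((v) => Number.isFinite(v));
    // Assumed rule: the body of the candle must sit inside [low, high].
    const ordered = low <= Math.min(open, close) && Math.max(open, close) <= high;
    if (!finite || !ordered) bad.push(index);
  });
  return bad;
}

const rows = [
  { open: "0.5", high: "0.6", low: "0.4", close: "0.5" },  // fixture default, consistent
  { open: "0.5", high: "0.45", low: "0.4", close: "0.5" }, // high below open/close
];
console.log(inconsistentOhlcRows(rows)); // [ 1 ]
```

Returning indices rather than a boolean lets a failure message name the corrupted rows, which is what separates subset corruption from a uniform bug.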

…er ergonomics script

Tooling slice (no new invariant). At 55 invariants the dry-run flat catalog is hard to scan. Adds `npm run scenarios:by-layer` that prints invariants grouped by layer, with both a summary table (layer + count + bar-chart) AND per-layer detail blocks.

What it answers at a glance:
- "What does the chain layer cover?" → orchestrator↔chain block lists all 10 chain probes
- "Which probes cross to the candles indexer?" → api↔candles block lists 4 names
- "Where's the catalog growing fastest?" → bar-chart shows orchestrator↔candles is densest at 21

Authoritative layer breakdown surfaced (corrects an earlier inconsistency in status-line bucketing):

    api                      10
    api↔candles               4
    api↔registry              2
    orchestrator↔candles     21
    orchestrator↔chain       10
    orchestrator↔registry     8
                           ----
                             55

Implementation: 35-line scripts/scenarios-by-layer.mjs that imports INVARIANTS, groups by `layer` field, prints text. No flags, no colors, deliberately scriptable (pipe into grep/awk for filtering). Same import style as the existing dry-run output but reorganized.

Smoke test: 1 new (asserts header line, summary table format, per-layer detail sections, sanity-check that the chain-CAPABILITY trio names appear under the chain-layer block).

189/189 smoke tests pass (was 188).
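The grouping step is simple enough to sketch in a few lines. This is not the contents of scripts/scenarios-by-layer.mjs: SAMPLE_INVARIANTS is a hypothetical stand-in for the orchestrator's INVARIANTS export (the layer assigned to each sample name is a guess), and the output format is only an approximation of the summary table.

```javascript
// Hypothetical stand-in for the real INVARIANTS array; layers here are illustrative.
const SAMPLE_INVARIANTS = [
  { name: "apiHealth", layer: "api" },
  { name: "anvilChainId", layer: "orchestrator↔chain" },
  { name: "anvilClientVersionMentionsAnvil", layer: "orchestrator↔chain" },
];

// Group invariant names by their `layer` field, preserving first-seen order.
function groupByLayer(invariants) {
  const groups = new Map();
  for (const inv of invariants) {
    if (!groups.has(inv.layer)) groups.set(inv.layer, []);
    groups.get(inv.layer).push(inv.name);
  }
  return groups;
}

// Render "layer  count  bar" lines, plain text so the output pipes cleanly into grep/awk.
function renderSummary(groups) {
  const width = Math.max(0, ...[...groups.keys()].map((layer) => layer.length));
  return [...groups.entries()]
    .map(([layer, names]) => `${layer.padEnd(width)}  ${String(names.length).padStart(3)}  ${"#".repeat(names.length)}`)
    .join("\n");
}

console.log(renderSummary(groupByLayer(SAMPLE_INVARIANTS)));
```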

…lForwardsIntrospection (56th invariant)

First api-layer introspection-passthrough probe. Sister to registryIndexerSchemaHasRequiredTypes (DIRECT-side). Same __schema query, but routed through the API LAYER instead of direct.

The bug class: many production GraphQL proxies (Apollo Gateway, Hasura, etc.) ship with introspection disabled by default at the proxy layer for security, even when the upstream indexer supports it. If a deploy accidentally disables introspection at the proxy this way, harness scenarios that introspect through the api layer silently break, BUT the DIRECT-side sister still passes, making the actual cause hard to find without this distinct probe.

Diagnostic-precision pattern (api+direct cross-check):

    api ✓  direct ✓  → both layers fine
    api ✗  direct ✓  → API PROXY STRIPPED INTROSPECTION
    api ✓  direct ✗  → indexer schema regressed (api correctly forwarded the broken schema)
    api ✗  direct ✗  → indexer is root cause

Each combination has a distinct error message so engineers can read the combined signal without guessing which layer broke.

Fixture extension: api /registry/graphql handler now includes __schema in its passthrough by default (mirroring direct). New knob apiRegistryStripsIntrospection (default false) simulates proxy-layer disablement.

The api↔registry layer (previously thinnest at 2 invariants) is now at 3.

56 invariants now: 10 api + 4 api↔candles + 3 api↔registry + 21 orchestrator↔candles + 8 orchestrator↔registry + 10 orchestrator↔chain. 193/193 smoke tests pass (4 new).
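The api+direct cross-check is a two-input decision table, which can be written down directly. The function name is hypothetical and the returned strings paraphrase the four outcomes listed above rather than quoting the harness's real error messages.

```javascript
// Hypothetical triage helper combining the api-side and direct-side introspection results.
function diagnoseIntrospection(apiOk, directOk) {
  if (apiOk && directOk) return "both layers fine";
  if (!apiOk && directOk) return "api proxy stripped introspection";
  if (apiOk && !directOk) return "indexer schema regressed (api forwarded the broken schema)";
  return "indexer is root cause";
}

console.log(diagnoseIntrospection(false, true)); // api proxy stripped introspection
```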

…ForwardsIntrospection (57th invariant)

Sister to apiRegistryGraphqlForwardsIntrospection (just shipped) on the candles side. COMPLETES the introspection-coverage MATRIX:

                  | Direct-side | API-side |
    --------------+-------------+----------+
    registry      |      ✓      |    ✓     |
    candles       |      ✓      |    ✓     |  ← this slice

All four probes are now in the catalog. For ANY introspection failure, the diagnostic combines layer (api/direct) × indexer (registry/candles) into a precise root-cause statement.

Bug class beyond the registry sister: per-route proxy config drift. The candles route can be misconfigured INDEPENDENTLY of the registry route (separate proxy configs are common in production GraphQL gateways). Pairing the two api-layer probes catches that drift:

    apiRegistry ✓  apiCandles ✓  → proxy fine on both routes
    apiRegistry ✗  apiCandles ✓  → registry route stripped only
    apiRegistry ✓  apiCandles ✗  → candles route stripped only (drift)
    apiRegistry ✗  apiCandles ✗  → proxy-wide lockdown

Fixture extension: api /candles/graphql handler now includes __schema in its passthrough by default (mirroring direct + registry). New knob apiCandlesStripsIntrospection (separate from the registry knob so per-route drift is testable).

57 invariants now: 10 api + 5 api↔candles + 3 api↔registry + 21 orchestrator↔candles + 8 orchestrator↔registry + 10 orchestrator↔chain. 197/197 smoke tests pass (4 new).

Sister to interface-side scenarios-catalog. Ships a committed Markdown index of the orchestrator's 57 invariants (browsable on GitHub without running anything) plus drift detection against doc rot.

What ships:
- scripts/invariants-catalog.mjs (~85 lines): imports INVARIANTS, validates {name, description, layer}, groups by layer, emits orchestrator/INVARIANTS.md
- orchestrator/INVARIANTS.md (~110 lines, committed): title + per-layer summary + per-layer detail tables
- npm script invariants:catalog wired
- tests/smoke-invariants-catalog.test.mjs: drift smoke test mirroring interface-side smoke-scenarios-catalog (snapshot → regen → byte-identical assertion → restore in finally; "added an invariant but forgot to regen" becomes a CI failure with a fix-command pointer)

Why now: with 57 invariants across 6 layers (api, api↔candles, api↔registry, orchestrator↔candles, orchestrator↔chain, orchestrator↔registry), coverage-by-script is insufficient. Reviewers can now read the catalog on GitHub without cloning. Future tooling (CI dashboards, coverage gap reports) gets a readable machine-friendly source.

Validation: 1/1 new smoke test passes in isolation. Full suite: pre-existing daemon-dependent flake on Phase 2 slice 4 port-leak test, no regressions caused by this slice.

Two new root-level aliases so the catalog scripts shipped in recent slices are invocable from the repo root, not just from auto-qa/harness/:
- auto-qa:e2e:scenarios:by-layer (slice 4d-by-layer-script)
- auto-qa:e2e:invariants:catalog (slice 4d-invariants-catalog)

Pure-additive package.json change. Each alias verified to resolve from the repo root and produce expected output.

Pairs with an analogous interface-side commit wiring auto-qa:e2e:scenarios:by-route at that repo's root.

…aged CI

The staged api smoke workflow (3e) currently runs only smoke:scenarios + a dry-run catalog sanity check. It does NOT cover the new smoke-invariants-catalog.test.mjs shipped two iterations ago.

Extends auto-qa-harness-smoke.yml.staged with two new steps mirroring the interface-side scenarios:catalog drift pattern (slice 3a):
- Regenerate invariants catalog (npm run invariants:catalog)
- Verify invariants catalog is in sync (git diff --exit-code)

Without these steps, an invariant added without regenerating INVARIANTS.md would drift undetected by CI.

Validation: YAML re-parsed clean via js-yaml@4; drift assertion pre-verified locally (regen + git diff exits 0). Trigger remains workflow_dispatch.
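The workflow steps above shell out to npm and git. The same regenerate-then-compare idea can be sketched in-process, following the snapshot, regen, compare, restore-in-finally shape of the earlier smoke test. This is a sketch only: `isCatalogInSync` and the `regenerate` callback are hypothetical names, and the demo runs against a throwaway temp file rather than the real INVARIANTS.md.

```javascript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Returns true when regenerating leaves the committed file byte-identical.
// `regenerate` is a hypothetical callback that rewrites the file in place.
function isCatalogInSync(catalogPath, regenerate) {
  const committed = readFileSync(catalogPath); // snapshot the committed bytes
  try {
    regenerate(catalogPath);
    return readFileSync(catalogPath).equals(committed);
  } finally {
    writeFileSync(catalogPath, committed); // restore even if regenerate throws
  }
}

// Demo against a throwaway file so nothing in a real checkout is touched.
const dir = mkdtempSync(join(tmpdir(), "catalog-drift-"));
const catalogPath = join(dir, "INVARIANTS.md");
writeFileSync(catalogPath, "| api | 10 |\n");

// A generator that reproduces the committed content: in sync.
const upToDate = isCatalogInSync(catalogPath, (p) => writeFileSync(p, "| api | 10 |\n"));
// A generator whose output changed (an invariant was added): drift.
const stale = isCatalogInSync(catalogPath, (p) => writeFileSync(p, "| api | 11 |\n"));

console.log({ upToDate, stale });
rmSync(dir, { recursive: true, force: true });
```

Restoring in `finally` keeps the working tree clean after a local run, whereas the CI variant can leave the regenerated file in place because `git diff --exit-code` is the assertion.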

…link smoke

Phase 0 CHECKLIST item 41 ("Sister-link verified: fresh checkout of both repos in ~/code/futarchy-fi/") bundles doc + docker checks. This slice ships the doc-side half.

Adds tests/smoke-architecture-sync.test.mjs that:
- Resolves sister ARCHITECTURE.md at ../interface/auto-qa/harness/ARCHITECTURE.md (4 levels up from the test file)
- Skips cleanly via t.skip() if sister not present (CI runners, one-repo clones)
- Asserts byte-identical otherwise; fails loudly if the shared spec drifted between repos

Sister test on the interface side mirrors this in reverse (looks up futarchy-api-side ARCHITECTURE.md).

Baseline verified byte-identical; both tests pass 1/1 in isolation. CHECKLIST item 41 gains a sub-bullet recording the doc-side coverage; docker-side half remains unchecked (daemon-required).

…workflow STAGED

Cross-repo complement to the previous-iteration smoke test (tests/smoke-architecture-sync.test.mjs). Together they give complete drift coverage of the shared ARCHITECTURE.md spec:
- Smoke test: catches dev-with-sibling-clone case at npm test time. Skips if sister not present.
- Workflow: catches CI-with-one-repo-checked-out case. Curls sister via raw.githubusercontent.com (public), diffs against local. Fails loudly on byte mismatch.

Adds auto-qa/harness/ci/auto-qa-harness-architecture-sync.yml.staged pointing at the interface sister. Optional sister_branch input (default auto-qa; switch to main post-merge). Trigger: workflow_dispatch only.

Sister-side workflow on the interface repo mirrors this in reverse (looks up futarchy-api).

Validation:
- YAML re-parsed clean via js-yaml@4
- Sister raw URL returns HTTP/2 200 (publicly accessible)
- Simulated workflow locally: curl + diff → PASS

ci/README.md staged-table updated (2 rows now); promote order documented (smoke first, then this).
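The workflow itself uses curl and diff. As an illustration of the same fetch-then-compare check, here is a sketch in JavaScript with an injectable fetch so it runs without network access. The owner and repo values in the demo are assumptions (the commit message only names "the interface sister"), and `sisterRawUrl` and `matchesSister` are hypothetical helpers.

```javascript
// Build a raw.githubusercontent.com URL for a file on a given branch.
function sisterRawUrl({ owner, repo, branch, path }) {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/${path}`;
}

// Fetch the sister copy and compare it byte-for-byte with the local bytes.
// `fetchImpl` is injectable so the check can be exercised offline.
async function matchesSister(localBytes, url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`sister fetch failed: HTTP ${res.status} for ${url}`);
  }
  const sisterBytes = Buffer.from(await res.arrayBuffer());
  return sisterBytes.equals(localBytes);
}

// Demo with a stubbed fetch; owner and repo names are assumptions.
const url = sisterRawUrl({
  owner: "futarchy-fi",
  repo: "interface",
  branch: "auto-qa", // default sister_branch per the commit message
  path: "auto-qa/harness/ARCHITECTURE.md",
});
const stubFetch = async () => new Response("# Shared spec\n");

matchesSister(Buffer.from("# Shared spec\n"), url, stubFetch).then((same) => {
  console.log(same ? "PASS" : "DRIFT", url);
});
```

Treating a non-2xx response as an error rather than as drift keeps an unreachable or renamed sister branch from being misreported as a spec mismatch.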

… in staged CI

Two new daemon-free smoke files shipped earlier this session (smoke-invariants-catalog.test.mjs, smoke-architecture-sync.test.mjs) were not exercised by the api-side staged CI workflow, which runs only smoke:scenarios.

Adds explicit steps for each in auto-qa-harness-smoke.yml.staged:
- Run invariants-catalog smoke test (unit-level drift assertion, sister to the workflow-level git-diff check shipped in slice 3e-extend)
- Run architecture-sync smoke test (Phase 0 doc-side sister-link check; SKIPS cleanly in CI's single-repo checkout, since the cross-repo workflow handles actual sister drift)

Why explicit steps vs broadening to npm test: the api harness has 9 daemon-required smoke files (anvil + docker + indexers) that would fail in CI without that infra.

Validation: YAML re-validated via js-yaml@4; both tests pass 1/1 in isolation. Trigger remains workflow_dispatch.

krandder added 30 commits May 10, 2026 02:08

auto-qa: add README orientation page

4fa4cad

krandder added 30 commits May 10, 2026 17:42

krandder wants to merge 141 commits into main from auto-qa
