auto-qa: add forked-replay harness (57 invariants, 197 smoke tests, fixture-only CI)#10

Open
krandder wants to merge 141 commits into main from auto-qa

@krandder

Contributor

Summary

Adds the auto-qa forked-replay harness as a self-contained subdirectory. Built incrementally over many iterations; all changes confined to auto-qa/ except for an additive block of auto-qa:* scripts in root package.json.

  • 57 invariants in the orchestrator catalog (run npm run scenarios:by-layer for a layer-grouped view)
  • 197/197 smoke tests passing against an in-process node:http fixture (Node 22 native test runner)
  • One staged CI workflow awaiting promotion: auto-qa/harness/ci/auto-qa-harness-smoke.yml.staged (workflow_dispatch only for v1; ~1 min runtime; no docker, no real services)

What's NOT in this PR

  • No production code changes. Only package.json is touched outside auto-qa/, and the diff is a pure addition of auto-qa:* script aliases (zero modifications to existing scripts or deps).
  • No real-services validation. The 197 tests run against the fixture, not anvil/indexer/api. The 4a-verify / 4b-verify / 4c-verify slices in auto-qa/harness/CHECKLIST.md exist exactly to close that gap; they need a Docker daemon (deferred to post-merge maintainer work).
  • CI not active yet. The smoke workflow file ships as .staged because GitHub blocks OAuth Apps without workflow scope from writing .github/workflows/. After merge, copy the .staged file into .github/workflows/ (instructions in auto-qa/harness/ci/README.md).

Layer breakdown (per `scenarios:by-layer`)

| Layer                 | Count |
|-----------------------|-------|
| api                   | 10    |
| api↔candles           | 5     |
| api↔registry          | 3     |
| orchestrator↔candles  | 21    |
| orchestrator↔registry | 8     |
| orchestrator↔chain    | 10    |
| Total                 | 57    |

Coverage patterns shipped

  • Chain CAPABILITY trio — impersonate + snapshot + time-warp probes (the minimal scenario primitive set; without these, scenarios silently fail)
  • Iterate-all-rows triad — swap amounts + candle volumes + candle OHLC (latest-only + all-rows pairs catch uniform-aggregator vs subset-corruption bugs distinctly)
  • Body-shape probes — /health + /warmer (catches LB string-match breakage + ops-dashboard format regressions invisible to status-code-only checks)
  • Schema introspection MATRIX — DIRECT × API × {candles, registry} = 4 probes (any introspection failure pinpoints layer × indexer via a four-probe truth table)
  • Header probes — X-Cache, X-Response-Time, X-Cache-TTL on /api/v2/.../chart

Test plan

  • CI: smoke tests pass (will need to run via `workflow_dispatch` after promoting the staged file)
  • Local: `cd auto-qa/harness && npm ci && npm run smoke:scenarios` → 197/197 pass
  • Local: `cd auto-qa/harness && npm run scenarios:by-layer` shows the 57-invariant layer table
  • Local: `cd auto-qa/harness && HARNESS_COMPOSE=1 HARNESS_DRY_RUN=1 node orchestrator/scenario-runner.mjs` lists every invariant with its layer + description
  • (optional) Daemon-required: `docker compose up -d` from `auto-qa/harness/` brings up the full stack (the `4e` acceptance gate); out of scope for this PR

Post-merge tasks (maintainer)

  1. Promote `auto-qa/harness/ci/auto-qa-harness-smoke.yml.staged` into `.github/workflows/auto-qa-harness-smoke.yml`
  2. Smoke-test it via the Actions UI (`workflow_dispatch`)
  3. After it's green, add a `pull_request: paths: ['auto-qa/harness/**']` trigger so harness-touching PRs gate on it
  4. Re-evaluate the daemon-required `*-verify` slices once Docker is available

🤖 Generated with Claude Code

krandder added 30 commits May 10, 2026 02:08
Initial setup for the /loop auto-QA initiative on this repo. Adds
auto-qa/PROGRESS.md as the working ledger:

  - Methodology + status snapshot
  - Full PR ledger #1-#9 (entire repo history): 7 bug-fix, 1 feature,
    1 infra. Each bug-fix has hypothesis + ideal test + tools needed.
  - Tooling backlog ranked by leverage:
      1. GraphQL passthrough contract test (4/9)
      2. Unified-chart snapshot test (2/9)
      3. Path-prefix dual-form test (1/9 — trivial first build)
      4. Field-semantics property walker (#9 + future)
  - Open question recorded: pick node --test vs vitest. Default to
    node --test for minimal surface.

No production code touched.
First test landed: auto-qa/tests/path-prefix.test.mjs covers PR #1
(the /charts path-prefix-strip middleware). Two cases, both green
against the live api.futarchy.fi:

  ✔ GET /charts/<path> ≡ GET /<path> (same JSON envelope)
  ✔ GET /charts/health ≡ GET /health (status 200 from both)

Test runner: node --test (built-in, zero deps). Glob
'auto-qa/tests/**/*.test.mjs'. Run with:

  npm run auto-qa:test

The fixture proposal is GIP-150 v2 (0x1a0f209f…) over a pinned
historical time window for reproducibility. Tests skip cleanly when
api.futarchy.fi is unreachable so the suite stays non-flaky.

PROGRESS.md updated: status snapshot bumped (1 test landed-passing,
runner resolved), PR #1 row marked landed-passing, open question on
test framework removed.

No production code touched.
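
The equivalence those two cases pin can be illustrated with a minimal sketch of a prefix-strip step. `stripChartsPrefix` is a hypothetical name chosen for illustration; the real middleware is not shown in this PR and may be written differently.

```javascript
// Hypothetical helper (not the repo's middleware): maps "/charts/<path>"
// to "/<path>" and leaves every other path untouched.
function stripChartsPrefix(path) {
  if (path === '/charts') return '/';
  return path.startsWith('/charts/') ? path.slice('/charts'.length) : path;
}
```

Under this sketch, `/charts/health` and `/health` resolve to the same route, which is the property the test checks over HTTP.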
Adds auto-qa/tests/passthrough-contract.test.mjs — 7 cases covering
the GraphQL passthrough behavior fixed in 4 of this repo's 9 PRs:

  PR #4: scalar pool: "0x…" filter on candles (chain prefix translated)
  PR #7: scalar proposal: "0x…" filter (chain prefix translated)
  PR #8: pool_in / proposal_in / id_in array filters
  PR #9: periodStartUnix preserved as snapped boundary
         (asserts ts % 3600 === 0 for every returned candle)
  Cross: response IDs always come back without chain prefix

All 9 auto-qa tests now green (2 path-prefix + 7 passthrough-contract).

Strategy used: each test makes a real HTTP call to api.futarchy.fi
with a stable fixture (GIP-150 v2 = 0x1a0f209f…), checks both that
the call succeeds and that the documented invariant holds. This is
the highest-leverage suite in the auto-qa backlog (4/9 bugs covered
by a single file).

Tests skip cleanly if the live API is unreachable, so the auto-qa
suite stays non-flaky.

PROGRESS.md updated: status snapshot now 9 tests landed-passing,
PRs #4/#7/#8/#9 marked landed-passing in the ledger.

No production code touched.
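
Two of the invariants above are pure checks on response data and can be sketched without any network call. The helper names are hypothetical, and the candle objects are assumed to carry `periodStartUnix` as a number or numeric string.

```javascript
const HOUR_SEC = 3600;

// Returns the candles whose periodStartUnix is NOT on an hour boundary
// (the PR #9 invariant: ts % 3600 === 0 for every returned candle).
function unsnappedCandles(candles) {
  return candles.filter((c) => Number(c.periodStartUnix) % HOUR_SEC !== 0);
}

// True when an ID still carries a leading "<chainId>-" prefix, which the
// cross-cutting case says must never appear in responses.
function hasChainPrefix(id) {
  return /^\d+-0x/.test(id);
}
```

A test would assert that `unsnappedCandles(rows)` is empty and that no returned ID satisfies `hasChainPrefix`.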
…ignment)

Adds auto-qa/tests/unified-chart.test.mjs — 4 cases against
GET /api/v2/proposals/:id/chart for the GIP-150 v2 fixture:

  PR #5 — both conditional_yes.price_usd and conditional_no.price_usd
          are positive numbers (covers the CONDITIONAL > EXPECTED_VALUE
          > PREDICTION fallback chain). Also asserts pool_id is a plain
          address (not chain-prefixed).
  PR #6 — company_tokens.base.tokenSymbol is "GNO" and explicitly NOT
          "PNK" (the legacy hardcoded fallback). currency.tokenSymbol or
          stableSymbol is set.
  Bonus — candles.yes / candles.no are non-empty arrays for the pinned
          window, every periodStartUnix is snapped to a 3600s boundary,
          every close parses to a positive number.
  Bonus — volume reported in human units (< 1e15), not raw wei. If
          unit normalization regresses, this fires.

All 13 auto-qa tests now green:
  - 2 path-prefix.test.mjs       (PR #1)
  - 7 passthrough-contract.test.mjs (PRs #4/#7/#8/#9)
  - 4 unified-chart.test.mjs        (PRs #5/#6)

Coverage: 7 of 9 PRs in the repo's history, all of which are bug-fixes.
Remaining: PR #2 (rpc-proxy infra, would need mock upstream) and
PR #3 (passthrough scaffold feature, implicitly covered by all the
filter tests).

No production code touched.
Adds auto-qa/tests/multi-proposal-smoke.test.mjs — iterates over 3
diverse proposal fixtures and asserts the chart endpoint returns a
valid contract envelope for each:

  - GIP-150 v2 (GNO/sDAI, fully indexed)
  - TSLA Mega Package (TSLAon/USDS)
  - Circle native USDC on Gnosis (USDC/sDAI)

Asserts shape only (HTTP 200 + envelope keys present + symbols are
non-empty strings + never the legacy "PNK" fallback). Data quality
(price > 0, candles non-empty) is the unified-chart.test.mjs job for
the canonical fixture.

All 16 api auto-qa tests now green:
  - 2 path-prefix
  - 7 passthrough-contract
  - 4 unified-chart
  - 3 multi-proposal-smoke  ← NEW

Surfaced (and documented in PROGRESS.md, per /loop directive — NOT
fixed): TSLA Mega Package and CIP-82 both return zero prices and
fall through to the "TOKEN" default base symbol. Likely missing
CONDITIONAL pools — worth investigating in a real fix-pass.
Adds auto-qa/tests/spot-candles.test.mjs — 3 cases pinning the
contract of GET /api/v1/spot-candles, the third major endpoint
on this service that wasn't previously covered:

  1. Missing `ticker` query param → 400 with `error` field.
  2. Unknown ticker → 200 with `{ spotCandles: [] }` envelope.
  3. When any candidate ticker returns data, every candle satisfies
     the documented `{ periodStartUnix: <unix-ts>, close: <number> }`
     shape.

Test #3 is intentionally tolerant: it tries a few candidate tickers
and skips (not fails) if none return data. The underlying source
(futarchy-spot or GeckoTerminal) varies and pinning a specific
ticker fixture would rot.

API smoke-coverage now spans all three surfaces:
  /api/v2/proposals/:id/chart  ← unified-chart.test.mjs + multi-proposal-smoke
  /candles/graphql             ← passthrough-contract.test.mjs (PRs #4/#7/#8/#9)
  /api/v1/spot-candles         ← NEW

Test counts: 19 (18 passing, 1 skipped — the shape check, by design).
No production code touched.
Adds auto-qa/tests/indexer-freshness.test.mjs — 3 cases that compare
each Checkpoint indexer's head block to the live Gnosis chain tip
and assert the lag stays under a threshold:

  candles (gnosis): < 5000 blocks  (~7h @ 5s/block)
  registry:         < 15000 blocks (~21h — registry runs further
                                    behind in normal ops)

Plus a sanity case that both heads are positive numbers > 30M (real
Gnosis blocks).

Snapshot at iteration creation:
  - candles gnosis:   870 blocks behind  (healthy)
  - registry:        7916 blocks behind  (degraded but within threshold)

This is a real ops invariant — earlier in this session we caught the
registry indexer drifting to ~5000 blocks behind via the status page.
Now there's a test for that. Skips gracefully if API or Gnosis RPC
unreachable.

All 22 api auto-qa tests now pass (21 pass + 1 skipped by design).
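
The threshold arithmetic above can be sketched as a small pure function. `indexerLag` is a hypothetical helper; the 5s block time is the figure used in the commit message.

```javascript
const SECONDS_PER_BLOCK = 5; // Gnosis block time assumed by the thresholds above

// Computes how far an indexer head trails the chain tip and whether that
// lag is inside the allowed number of blocks.
function indexerLag(chainTip, indexerHead, maxLagBlocks) {
  const lagBlocks = chainTip - indexerHead;
  return {
    lagBlocks,
    lagHours: (lagBlocks * SECONDS_PER_BLOCK) / 3600,
    withinThreshold: lagBlocks < maxLagBlocks,
  };
}
```

With the snapshot values, a registry lag of 7916 blocks is about 11 hours and still passes the 15000-block ceiling.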
Adds auto-qa/tests/registry-org-shape.test.mjs — 4 cases that validate
the Organization entity shape returned by the registry indexer:

  ✔ at least one org returned (catastrophic-empty guard — catches
    full table wipe / resync failure that would empty the Companies
    page upstream of the frontend)
  ✔ every org has {id, name, owner, metadata}
  ✔ every non-null metadata is parseable JSON
  ✔ at least one org metadata uses archived/visibility flags
    (PR #61 filter coverage diagnostic — emits counts so we know the
    filter is being exercised by real data)

Cross-cutting catch for the bug family that landed as `interface`
PR #61 (Companies page rendering empty after Checkpoint migration).
If the registry indexer ever returns zero orgs again or the
metadata field changes shape, this test fires before users see
the symptom.

Diagnostic at iteration time: 3/7 orgs use archived flag — the
filter IS exercised by real data.

All 26 api auto-qa tests now pass (25 pass + 1 skipped by design).
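
The per-org checks can be sketched as a function that returns a list of problems instead of throwing. `orgShapeProblems` is a hypothetical name, and `metadata` is assumed to arrive as a JSON string or null, as the bullet list suggests.

```javascript
const REQUIRED_ORG_KEYS = ['id', 'name', 'owner', 'metadata'];

// Returns human-readable problems for one Organization row; an empty
// array means the row satisfies the shape described above.
function orgShapeProblems(org) {
  const problems = [];
  for (const key of REQUIRED_ORG_KEYS) {
    if (!(key in org)) problems.push(`missing ${key}`);
  }
  if (org.metadata != null) {
    try {
      JSON.parse(org.metadata);
    } catch {
      problems.push('metadata is not valid JSON');
    }
  }
  return problems;
}
```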
Adds auto-qa/tests/legacy-v1-prices.test.mjs — 6 cases covering an
API surface that wasn't tested before:

  GET /api/v1/market-events/proposals/:id/prices
  (the predecessor to /api/v2/.../chart, still used by some clients)

Cases:
  ✔ HTTP 200 + documented envelope keys
  ✔ conditional_yes/no expose price_usd + pool_id (positive number,
    plain address)
  ✔ company_tokens.base.tokenSymbol resolves (not legacy "PNK" fallback)
  ✔ v1 vs v2 cross-check: same base symbol + same YES pool_id
    (catches the v1 and v2 paths drifting apart)
  ✔ response time < 5s for v1 (perf bound — catches warmer regression)
  ✔ response time < 5s for v2 (perf bound)

The cross-check is the most leveraged test — both endpoints serve
the same underlying market, so any logic-divergence between the
v1 and v2 codepaths trips this test before users see inconsistencies.

All 32 api auto-qa tests now pass (31 pass + 1 skipped by design).
Adds auto-qa/tests/operational-endpoints.test.mjs — 3 cases pinning
the contract of the two operational endpoints not previously covered:

  ✔ /health returns 200 with {status, timestamp} and timestamp is fresh
    (catches edge-cache pinning that would freeze liveness checks)
  ✔ /warmer returns {active, entries[]} with consistent counts
  ✔ /health timestamp advances between consecutive calls (1.5s apart)
    (catches edge cache regression on the health endpoint)

These are the surfaces status.futarchy.fi and any uptime monitor
depend on — if /health goes stale or /warmer crashes silently, this
test surfaces it before users notice a stale dashboard.

All 35 api auto-qa tests now pass (34 + 1 skipped by design).
Pins the GraphQL passthrough surface itself, independent of any
user-defined schema. Catches failures that the existing schema-shape
tests would only report with a more confusing error message:

  - Cloud Run revision shipped without the route mounted
  - Upstream Checkpoint indexer entirely unreachable
  - HTTPS termination broken
  - Reverse-proxy stripping the request body
  - Error envelope shape changes that break clients branching on
    `response.errors[0].message`

Coverage:
  - { __typename } returns "Query" on both endpoints
  - { __schema { queryType { name } } } returns a non-empty type name
  - Malformed query yields { errors: [{ message: <string> }] } body
  - GET is rejected (POST-only surface, must NOT return 200)
  - 12 parallel introspections complete cleanly (no shared mutex)

Surfaced inconsistency (pinned, NOT fixed per directive):
  /candles/graphql returns HTTP 502 on parse errors
  /registry/graphql returns HTTP 400 on the same input
  502 misclassifies a client error as server failure. Pinned in
  PARSE_ERROR_STATUS so a deliberate unification surfaces as a
  test update.

PR coverage: 7/9 -> 8/9 (only #2 RPC infra remains).
Tests: 35 -> 46 (api), 103 -> 114 cross-repo. All green.
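
The error-envelope check can be sketched as a predicate over a parsed response body. `isErrorEnvelope` is a hypothetical helper; the status map only restates the two values reported above.

```javascript
// Status codes observed for parse errors, per the commit message.
const PARSE_ERROR_STATUS = {
  '/candles/graphql': 502,
  '/registry/graphql': 400,
};

// True when the body has the { errors: [{ message: <string> }] } shape
// that clients branch on via response.errors[0].message.
function isErrorEnvelope(body) {
  return (
    body !== null &&
    typeof body === 'object' &&
    Array.isArray(body.errors) &&
    body.errors.length > 0 &&
    body.errors.every((e) => e !== null && typeof e === 'object' && typeof e.message === 'string')
  );
}
```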
Pins the cross-origin contract every browser-side caller depends on:
the frontend at futarchy.fi, staging frontends, Apollo Client, and the
Snapshot widget at snapshot.box. Not tied to any single PR — defensive
against a class of regressions:

  - cors() middleware accidentally dropped from a route
  - Stricter origin allowlist excludes futarchy.fi
  - Apollo-Require-Preflight no longer in allow-headers
  - X-Cache / X-Response-Time stops being exposed (silently zeros
    the frontend's cache-hit instrumentation)

Coverage:
  - 5 endpoints × 3 representative origins (futarchy.fi, staging,
    snapshot.box) preflight matrix
  - POST + REST GET responses also carry CORS headers (not just
    preflight)
  - Allow-Headers includes Content-Type AND Apollo-Require-Preflight
  - Allow-Methods includes POST for both passthroughs
  - Expose-Headers includes X-Cache and X-Response-Time
  - Pinned-policy ratchet: today's policy is wildcard origin; the test
    fires loudly if we tighten it, so that REPRESENTATIVE_ORIGINS gets updated

Tests: 46 -> 67 (api), 125 -> 146 cross-repo. All green.
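
The header assertions above can be sketched as one function over a response's headers. This assumes the headers have already been collected into a plain object with lowercase keys; `corsProblems` and `headerList` are hypothetical names.

```javascript
// Splits a comma-separated header value into lowercase tokens.
function headerList(value) {
  return (value || '')
    .split(',')
    .map((s) => s.trim().toLowerCase())
    .filter(Boolean);
}

// Returns a list of CORS contract violations; empty means compliant.
function corsProblems(headers) {
  const problems = [];
  const allow = headerList(headers['access-control-allow-headers']);
  const expose = headerList(headers['access-control-expose-headers']);
  for (const h of ['content-type', 'apollo-require-preflight']) {
    if (!allow.includes(h)) problems.push(`allow-headers lacks ${h}`);
  }
  for (const h of ['x-cache', 'x-response-time']) {
    if (!expose.includes(h)) problems.push(`expose-headers lacks ${h}`);
  }
  if (headers['access-control-allow-origin'] !== '*') {
    problems.push('origin policy is no longer wildcard');
  }
  return problems;
}
```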
Pins the X-Cache observability instrumentation on the chart endpoint
(/api/v2/proposals/:id/chart) — the only hot read path with a cache.
Catches a class of regressions that otherwise only surface as latency
slowly rising in production:

  - Cache layer silently disabled (X-Cache header missing → frontend
    cache-hit dashboard goes blind, no obvious user impact until p99
    latency rises)
  - X-Cache returns garbage instead of literal HIT or MISS
  - X-Cache-TTL drifts to 0 (every call cold) or unbounded (stale data)
  - X-Response-Time format breaks (frontend dashboard math goes NaN)
  - Cache key includes a non-deterministic component (back-to-back
    requests both MISS, throughput collapses)
  - HIT path silently degraded (HIT requests no longer < 100ms)

Today's measurements: TTL=13s, HIT=0ms, MISS≈30-160ms. Test loosens
those bounds to defensible ceilings (TTL <= 24h, HIT <= 100ms).

Tests: 67 → 73 (api), 152 → 158 cross-repo. All green.
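
A minimal sketch of the header-value checks, assuming lowercase header keys and that `X-Cache-TTL` carries a plain number of seconds (the message reports TTL=13s but does not show the wire format). `cacheHeaderProblems` is a hypothetical name.

```javascript
const MAX_TTL_SEC = 24 * 60 * 60; // the "TTL <= 24h" ceiling from above

// Returns violations of the X-Cache / X-Cache-TTL contract.
function cacheHeaderProblems(headers) {
  const problems = [];
  if (!['HIT', 'MISS'].includes(headers['x-cache'])) {
    problems.push('X-Cache must be the literal HIT or MISS');
  }
  const ttl = Number(headers['x-cache-ttl']);
  if (!Number.isFinite(ttl) || ttl <= 0 || ttl > MAX_TTL_SEC) {
    problems.push('X-Cache-TTL missing, zero, or above the ceiling');
  }
  return problems;
}
```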
Pins the boundary semantics of GET /api/v2/proposals/:id/chart around
its minTimestamp / maxTimestamp params. Catches a class of quiet bugs
where the endpoint returns data outside the requested window — the
chart silently shows wrong candles with no visible symptom unless
someone manually inspects timestamps.

Coverage:
  Degenerate-window graceful handling (must NOT 5xx):
    - inverted window (max < min) → 200 + empty candles
    - far-future window → 200 + empty candles
    - far-past window → 200 + empty candles
    - missing both timestamps → 200 + default window applied
    - negative timestamps → 200 (defensive)

  Window-respect invariants on a known-good window:
    - every returned candle satisfies min <= periodStartUnix <= max
    - candles strictly ascending by periodStartUnix in each series
    - shape contract: {periodStartUnix, close} both parse as numbers
    - 1-second window between known candles returns at most 1 per series
      (period-snapping invariant from api PR #9)

Defensive against:
  - Window predicate flipped (>= ↔ <=) in the indexer query
  - Sort order inverted in a refactor
  - Default-window logic returning unbounded data on missing params
  - Inverted/future/past windows crashing the Checkpoint passthrough

Tests: 73 → 82 (api), 174 → 183 cross-repo. All green.
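
The window-respect invariants can be sketched as a single pass over one candle series. `windowProblems` is a hypothetical helper operating on already-fetched data.

```javascript
// Checks one series against the requested [min, max] window:
// numeric fields, inclusive bounds, and strictly ascending timestamps.
function windowProblems(candles, min, max) {
  const problems = [];
  let prev = -Infinity;
  for (const c of candles) {
    const ts = Number(c.periodStartUnix);
    if (!Number.isFinite(ts) || !Number.isFinite(Number(c.close))) {
      problems.push('candle has a non-numeric field');
    }
    if (ts < min || ts > max) problems.push(`outside window: ${ts}`);
    if (ts <= prev) problems.push(`not strictly ascending at ${ts}`);
    prev = ts;
  }
  return problems;
}
```

A flipped window predicate or an inverted sort in the indexer query would each show up as a non-empty result.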
Pins POST /subgraphs/name/algebra-proposal-candles-v1 — a backward-
compat shim that proxies to the same upstream as /candles/graphql but
also injects spotCandles: [] into the response. Older clients
(snapshot-labs/sx-monorepo, pre-Cloud-Run integrations) still hit this
URL pattern; removing it would silently 404 them — same class of bug
as PR #1 (/charts prefix lost).

Coverage:
  - POST { __typename } returns 200 + Query
  - GET is rejected (POST-only surface)
  - spotCandles injection invariant on the legacy route
  - Negative confirmation: modern /candles/graphql does NOT inject
    spotCandles (the two routes have distinct contracts; if both
    start or both stop injecting, they've drifted)
  - Cross-route parity: real candles(...) query returns same shape
    + same row count from both routes
  - Malformed query yields the standard errors[] envelope

API surface coverage: 3/4 → 4/4.
Tests: 82 → 88 (api), 188 → 194 cross-repo. All green.
Pins the type contract of /api/v2/proposals/:id/chart's market block.
The frontend branches on heterogeneous types — price_usd as number
(JSON-native) vs volume as string (preserves 18-decimal precision
from the indexer) — and any "normalization" refactor that homogenizes
either type breaks parsing.

Coverage:
  Type heterogeneity (the subtle invariant):
    - market.{conditional_yes,conditional_no}.price_usd → number, finite, > 0
    - market.volume.{cy,cn}.volume / volume_usd → string, parses positive
    - volume.{cy,cn}.status === "ok" for healthy fixture

  Address shape:
    - event_id == requested proposal id (exact lowercase match)
    - trading_address == event_id (single-trade-address invariant)
    - all pool_ids match /^0x[a-f0-9]{40}$/ (chain prefix stripped)
    - YES pool_id ≠ NO pool_id (catches pool-resolution collapse)

  Timeline + chain:
    - timeline.start, end are integer unix ts in 2020-2050 range
    - timeline.start <= timeline.end
    - timeline.chain_id === 100 (Gnosis pin)

  Tokens (sharper than existing PR #6 test):
    - company_tokens.base.tokenSymbol non-empty AND not "TOKEN"
      fallback (catches pool-resolution priority chain regression
      on the canonical healthy fixture)

Tests: 88 → 99 (api), 213 → 224 cross-repo. All green.
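
The type-heterogeneity contract reduces to three leaf-level predicates, sketched below. The names are hypothetical; the regex is the one quoted in the address-shape bullet.

```javascript
const PLAIN_ADDRESS_RE = /^0x[a-f0-9]{40}$/;

// price_usd must be a JSON number: finite and positive.
function isPositiveNumber(v) {
  return typeof v === 'number' && Number.isFinite(v) && v > 0;
}

// volume / volume_usd must stay strings that parse to a positive number.
function isPositiveNumericString(v) {
  return typeof v === 'string' && Number.isFinite(Number(v)) && Number(v) > 0;
}

// pool_id must be a lowercase address with the chain prefix stripped.
function isPlainAddress(v) {
  return typeof v === 'string' && PLAIN_ADDRESS_RE.test(v);
}
```

Keeping the number and string predicates separate is what lets a "normalization" refactor that homogenizes either type fail loudly.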
Pins how /api/v2/proposals/:id/chart treats four input classes:
canonical lowercase, uppercase (clients sometimes send checksummed),
zero address, and garbage/path-traversal/oversized strings.

Today's behavior is permissive (every input → 200, with empty/fallback
data for non-existent proposals). The test pins that permissiveness
so a future input-validation patch surfaces as a deliberate API
change requiring client coordination.

Coverage:
  Case-insensitive lookup (the most important invariant):
    - uppercase request returns same data as lowercase (same pool_id,
      same token symbol — catches case-sensitive lookup leak)
    - event_id normalized to lowercase in response

  Zero-address graceful degradation:
    - 200 status, prices=0, "TOKEN" fallback symbol

  Garbage-input safety (must NOT 5xx):
    - non-hex string ("0xnotahexvalue") → 2xx
    - too-short string ("shortaddr") → 2xx
    - path-traversal payload ("../etc/passwd") → 2xx + JSON body
      (defensive: must NOT pass through to upstream as a substring)
    - very long id (502 chars) → < 500 status

Tests: 99 → 107 (api), 244 → 252 cross-repo. All green.
Pins the three pure helpers in src/adapters/candles-adapter.js that
underpin the entire Checkpoint passthrough translation:

  stripChainPrefix(id)             "100-0xabc" → "0xabc"
  addChainPrefix(id, chainId=100)  "0xabc"     → "100-0xabc"
  stripPrefixesAndNormalize(value) recursive walker over response objects
  CHAIN_PREFIXED_RE                /^\d+-0x[a-fA-F0-9]{40}$/

Every PR #4/#7/#8/#9 fix relied on these being correct. A regression
in any one of them returns wrong data for every passthrough query (or
worse: 200 with no data, which looks normal until users complain).

Coverage:
  stripChainPrefix:
    - "100-<addr>" → bare addr; works for chains 1, 137, etc
    - bare addr → unchanged (idempotent)
    - null/undefined/"" passed through
    - composite IDs ("1-<addr>-3600-<ts>") strip only leading segment

  addChainPrefix:
    - bare addr → "100-<addr>" (default chain)
    - custom chainId
    - already-prefixed input NOT double-prefixed (critical idempotency)
    - null/undefined/"" passed through
    - round-trip: stripChainPrefix(addChainPrefix(addr)) === addr

  CHAIN_PREFIXED_RE pattern:
    - matches valid forms (mixed-case hex)
    - rejects bare addrs, composite IDs, partial matches, non-hex,
      wrong-length, leading/trailing extras

  stripPrefixesAndNormalize walker:
    - top-level object fields
    - nested objects (recursion)
    - arrays (preserves order + length)
    - leaves non-matching strings intact (composite IDs, URLs,
      numeric strings)
    - handles primitives + nullish at any depth
    - idempotent (run twice = same output)

Tests: 107 → 128 (api), 262 → 283 cross-repo. All green.
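
The behaviors listed above can be reproduced with a short sketch. This is a reconstruction from the commit message, not the code in src/adapters/candles-adapter.js: `CHAIN_PREFIXED_RE` is quoted from the message, while `LEADING_CHAIN_RE` and the function bodies are assumptions. Any normalization the real walker applies beyond prefix stripping is not described, so the sketch does none.

```javascript
const CHAIN_PREFIXED_RE = /^\d+-0x[a-fA-F0-9]{40}$/; // quoted from the message
const LEADING_CHAIN_RE = /^\d+-(?=0x)/; // assumption: matches only the leading segment

function stripChainPrefix(id) {
  if (typeof id !== 'string' || id === '') return id; // null/undefined/"" pass through
  return id.replace(LEADING_CHAIN_RE, '');
}

function addChainPrefix(id, chainId = 100) {
  if (typeof id !== 'string' || id === '') return id;
  // Already-prefixed input is returned unchanged (no double prefix).
  return LEADING_CHAIN_RE.test(id) ? id : `${chainId}-${id}`;
}

// Recursive walker: strips the prefix only from strings that are exactly
// "<chainId>-<address>"; composite IDs, URLs, and numbers are left alone.
function stripPrefixesAndNormalize(value) {
  if (Array.isArray(value)) return value.map(stripPrefixesAndNormalize);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, stripPrefixesAndNormalize(v)])
    );
  }
  if (typeof value === 'string' && CHAIN_PREFIXED_RE.test(value)) {
    return stripChainPrefix(value);
  }
  return value;
}
```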
Pins src/utils/token-from-pool.js — the pool-name → company/currency
symbol resolver that PR #6 fixed. The PR-#6 unified-chart test pins
the end-to-end "no PNK leak" property; this test pins the function
directly so a regression to the priority chain or pattern matching
surfaces with a clear message instead of a downstream "TOKEN" fallback
that could be mistaken for an indexer issue.

Coverage:
  Empty / invalid:
    - empty array, non-array, no-recognized-type all → both null

  Each pool type, happy path:
    - CONDITIONAL "YES_GNO / YES_sDAI" → company=GNO, currency=sDAI
    - NO_ prefix on either side accepted
    - EXPECTED_VALUE "YES_GNO / sDAI" → company=GNO, currency=sDAI
    - PREDICTION degenerate symmetry "YES_sDAI / sDAI"
      → company=null, currency=sDAI

  Priority chain (heart of PR #6's fix):
    - CONDITIONAL beats EXPECTED_VALUE even when EV is first in array
    - EXPECTED_VALUE beats PREDICTION when CONDITIONAL absent
    - falls all the way through to PREDICTION when no others present

  Defensive:
    - pools with no name field → skipped, fallback to next
    - null/undefined entries in array → skipped (no throw)
    - unrecognized name format → both null
    - whitespace tolerance around the slash separator
    - symbol \w class permits digits + underscores

  Anti-PNK regression check (PR #6's whole point):
    - none of the empty/unknown/malformed paths return "PNK"

Tests: 128 → 144 (api), 294 → 310 cross-repo. All green.
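
The priority chain can be sketched as follows. This is an illustrative reconstruction, not the code in src/utils/token-from-pool.js: the function name, the `type` field, the return shape, and the regex are all assumptions made to match the cases listed above.

```javascript
const PRIORITY = ['CONDITIONAL', 'EXPECTED_VALUE', 'PREDICTION'];
// Optional YES_/NO_ prefix on either side, whitespace tolerated around "/".
const NAME_RE = /^\s*(?:YES_|NO_)?(\w+)\s*\/\s*(?:YES_|NO_)?(\w+)\s*$/;

function tokensFromPools(pools) {
  const none = { company: null, currency: null };
  if (!Array.isArray(pools)) return none;
  for (const type of PRIORITY) {
    for (const pool of pools) {
      // Null entries, other types, and pools without a name are skipped.
      if (!pool || pool.type !== type || typeof pool.name !== 'string') continue;
      const match = NAME_RE.exec(pool.name);
      if (!match) continue;
      const [, base, quote] = match;
      // PREDICTION pools pair a currency with itself, so no company symbol.
      return type === 'PREDICTION'
        ? { company: null, currency: quote }
        : { company: base, currency: quote };
    }
  }
  return none; // never a hardcoded symbol such as "PNK"
}
```

Iterating over types in the outer loop is what makes CONDITIONAL win even when an EXPECTED_VALUE pool appears first in the array.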
Pins src/utils/cache.js — the in-memory TTL cache backing the
response/registry/candles/spot caches. Subtle behaviors that
regressions can break silently:

  - TTL expiry: get() must return undefined AND delete the entry
    (not just lazy-skip)
  - Hit/miss counters increment exactly once per get(); expired-
    entry get counts as MISS
  - clear() resets both store AND counters (not just store)
  - set() resets the TTL clock for the key

Coverage:
  get/set basics:
    - set then get returns value
    - get on missing key returns undefined
    - set overwrites

  Counter accuracy:
    - hit increments hits, miss increments misses
    - interleaved counts independent

  TTL expiry (with 30-50ms TTLs for fast tests):
    - entry expires after ttlMs
    - expired-entry get is a MISS
    - expired-entry get DELETES from store
    - set() resets the TTL clock

  stats() formatting:
    - 0% with no calls
    - integer percent rounding
    - entry count from store.size

  clear():
    - empties store + resets both counters

  cache-config defaults pinned via source-file regex:
    - RESPONSE_TTL_SEC default = 13s
    - REGISTRY_TTL_SEC default = 300s
    - WARMER_INTERVAL_SEC = max(RESPONSE_TTL - 3, 5) formula

Tests: 144 → 161 (api), 357 → 374 cross-repo. All green.
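
A minimal sketch of a TTL cache with the behaviors listed above. It is not the code in src/utils/cache.js: the class name, the injectable `now` clock (added so expiry can be tested without sleeping), and the `stats()` return shape are assumptions.

```javascript
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.store = new Map();
    this.hits = 0;
    this.misses = 0;
  }

  set(key, value) {
    // Overwrites any previous entry and restarts its TTL clock.
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.store.get(key);
    if (entry && this.now() < entry.expiresAt) {
      this.hits += 1;
      return entry.value;
    }
    if (entry) this.store.delete(key); // expired: delete, not just skip
    this.misses += 1;
    return undefined;
  }

  clear() {
    this.store.clear();
    this.hits = 0;
    this.misses = 0;
  }

  stats() {
    const total = this.hits + this.misses;
    const pct = total === 0 ? 0 : Math.round((this.hits / total) * 100);
    return { entries: this.store.size, hits: this.hits, misses: this.misses, hitRate: `${pct}%` };
  }
}

// The warmer interval formula quoted above.
function warmerIntervalSec(responseTtlSec) {
  return Math.max(responseTtlSec - 3, 5);
}
```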
Pins src/config/endpoints.js — the single switch routing the api
between Graph Node (legacy AWS, dead) and Checkpoint (post-AWS-GCP
target). A regression that flips the default mode back to graph_node,
removes the BROKEN_ footgun prefix, or drifts localhost ports breaks
adapter calls silently.

Coverage:
  MODE handling:
    - default is "checkpoint" (post-AWS-migration target)
    - lowercased so case-insensitive env vars work
    - allowlist is exactly graph_node + checkpoint
    - unknown MODE warns AND falls through to GRAPH_NODE (the
      pre-existing "if not exactly checkpoint then GRAPH_NODE" logic
      — pinned so any "fix" is deliberate)

  GRAPH_NODE footgun deterrent:
    - registry + candles URLs both prefixed with
      BROKEN_GRAPH_NODE_DO_NOT_USE://  (post-AWS-migration intentional
      breakage so accidental routing fails fast with DNS error)
    - legacy AWS CloudFront host pinned in the URL

  CHECKPOINT defaults:
    - registry localhost port = 3003 (Registry checkpoint per comment)
    - candles localhost port is 3001 (prod) or 3004 (staging) per comment
    - both read process.env.{REGISTRY_URL,CANDLES_URL} with fallback

  Exports:
    - ENDPOINTS, IS_CHECKPOINT, MODE all exported
    - IS_CHECKPOINT defined as MODE === "checkpoint"

Tests: 161 → 172 (api), 385 → 396 cross-repo. All green.
Pins the LRU eviction + re-registration logic in src/utils/warmer.js's
registerForWarming function. The /warmer endpoint shape is covered by
operational-endpoints.test.mjs but the eviction policy itself is not
tested. A refactor that silently changes the eviction order or breaks
re-registration would cause:

  - Re-registration treated as new entry → list bloat + churn
  - LRU broken → unbounded growth past WARMER_MAX_ENTRIES
  - registeredAt updated on re-register → retention windows reset

Coverage:
  Initial registration:
    - first call adds entry with params, lastSeen, registeredAt
    - lastSeen === registeredAt on first registration

  Re-registration of existing key:
    - updates lastSeen but NOT size
    - registeredAt is preserved (used for RETENTION_DAYS check)
    - params are NOT updated (intentional: stale call shouldn't
      overwrite params from initial registration)

  LRU eviction at maxEntries:
    - next registration after capacity evicts oldest by lastSeen
    - re-registering an old entry protects it from eviction
    - works at degenerate maxEntries=1
    - 20 rapid registrations to a max-5 warmer → exactly 5 entries

  Config defaults pinned via cache-config.js source regex:
    - WARMER_MAX_ENTRIES = 50
    - WARMER_RETENTION_DAYS = 7
    - ENABLE_WARMER defaults to "true" (enabled)

Tests: 172 → 183 (api), 412 → 423 cross-repo. All green.
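
The registration and eviction rules can be sketched with a `Map`. This is a reconstruction from the list above, not the code in src/utils/warmer.js: the `createWarmer` factory, the `(key, params)` signature, and the injectable clock are assumptions.

```javascript
function createWarmer(maxEntries, now = Date.now) {
  const entries = new Map();

  function registerForWarming(key, params) {
    const ts = now();
    const existing = entries.get(key);
    if (existing) {
      // Re-registration: only lastSeen moves; params and registeredAt stay.
      existing.lastSeen = ts;
      return;
    }
    if (entries.size >= maxEntries) {
      // Evict the entry with the oldest lastSeen.
      let oldestKey = null;
      let oldestSeen = Infinity;
      for (const [k, e] of entries) {
        if (e.lastSeen < oldestSeen) {
          oldestSeen = e.lastSeen;
          oldestKey = k;
        }
      }
      entries.delete(oldestKey);
    }
    entries.set(key, { params, lastSeen: ts, registeredAt: ts });
  }

  return { entries, registerForWarming };
}
```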
Pins src/services/rate-provider.js — the ERC-4626 rate fetcher used
to convert sDAI rates into base prices throughout the api. Critical
constants where a typo silently returns wrong data:

  GET_RATE_SELECTOR  keccak256("getRate()")[:4] = 0x679aefce
                     Drift here → eth_call falls into the catch →
                     returns 1 (no-conversion fallback) silently
  CHAIN_CONFIG[100].defaultRateProvider
                     The canonical sDAI rate provider on Gnosis
                     (0x89C8...EceD). Typo → every sDAI conversion
                     uses 1.0 instead of the real rate
  18-decimal scaling Number(rateBigInt) / 1e18. Wrong divisor (1e6
                     for USDC) scales every rate by 1e12; TVL
                     dashboards explode

Coverage:
  GET_RATE_SELECTOR pinned exact value + still referenced in eth_call payload
  CHAIN_CONFIG[1] is Ethereum + null defaultRateProvider (pinned;
    adding an Ethereum default surfaces as deliberate change)
  CHAIN_CONFIG[100] is Gnosis + canonical sDAI address
  CACHE_DURATION = 5 * 60 * 1000 (5 min — sweet spot vs RPC load)
  All four error paths return 1 (the no-conversion fallback):
    - unknown chain
    - missing providerAddress
    - RPC error in result
    - thrown exception in catch
  18-decimal scaling literal pinned

Tests: 183 → 195 (api), 434 → 446 cross-repo. All green.
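
The scaling and fallback rules can be sketched as a decoder for the hex word an `eth_call` returns. `decodeRate` and `NO_CONVERSION` are hypothetical names; the selector value and the `Number(rateBigInt) / 1e18` expression come from the message above.

```javascript
const GET_RATE_SELECTOR = '0x679aefce'; // first 4 bytes of keccak256("getRate()")
const NO_CONVERSION = 1; // fallback used on every error path

// Converts an 18-decimal fixed-point hex result into a JS number,
// falling back to 1 when the result is missing or unparsable.
function decodeRate(resultHex) {
  try {
    if (typeof resultHex !== 'string' || resultHex === '0x') return NO_CONVERSION;
    return Number(BigInt(resultHex)) / 1e18;
  } catch {
    return NO_CONVERSION;
  }
}
```

Dividing by 1e6 instead of 1e18 would inflate the same input by a factor of 1e12, which is the failure mode the pinned literal guards against.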
Pins src/services/spot-source.js — the toggle that routes spot price
fetches between the futarchy-spot service and CoinGecko/GeckoTerminal.

Coverage:
  Toggle:
    - USE_FUTARCHY_SPOT default is empty-string (== falsy → use Gecko)
    - .toLowerCase() applied so "TRUE"/"True" both work
    - FUTARCHY_SPOT_URL default is http://localhost:3032

  URL construction (futarchy-spot endpoint shape):
    - calls /api/v1/candles?ticker=...&minTimestamp=...&maxTimestamp=...
    - encodeURIComponent on ticker (ticker contains "+", "!", "/")

  Reliability:
    - 10s AbortSignal timeout
    - non-OK response falls back to fetchFromGecko
    - try/catch wraps the entire fetch with same fallback

  Default-window math:
    - minTimestamp = maxTs - (limit * 3600) [hours back, NOT days/min]
    - Math.max(0, minTimestamp) clamps to no-negative-unix
    - default limit = 500

  Surfaced bug (NOT fixed per directive — pinned for ratchet):
    src/services/spot-price.js has a hardcoded CoinGecko API key as
    `process.env.COINGECKO_API_KEY || '<KEY>'` fallback. Leaked key
    in source. Test pins existence (not value) so a removal surfaces
    as deliberate fix and any new addition surfaces too.

  Plus pinned DEFAULT_CONFIG ticker = 'PNK/WETH+!sDAI/WETH-hour-500-xdai'

Tests: 195 → 208 (api), 458 → 471 cross-repo. All green.
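
The URL construction and default-window math can be sketched together. `buildSpotCandlesUrl` is a hypothetical helper; the path, query parameter names, and the `limit * 3600` formula are taken from the list above.

```javascript
// Builds the futarchy-spot candles URL for a ticker, defaulting the
// window to `limit` hours back from maxTimestamp (clamped at 0).
function buildSpotCandlesUrl(baseUrl, ticker, { maxTimestamp, limit = 500 }) {
  const minTimestamp = Math.max(0, maxTimestamp - limit * 3600);
  return (
    `${baseUrl}/api/v1/candles?ticker=${encodeURIComponent(ticker)}` +
    `&minTimestamp=${minTimestamp}&maxTimestamp=${maxTimestamp}`
  );
}
```

`encodeURIComponent` escapes `/` and `+` in the ticker but leaves `!` as-is, so the inversion marker survives in the query string unchanged.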
Pins src/services/spot-price.js's parseConfig — the parser that
decodes ticker config strings used throughout the spot-price chain.
Four formats supported:

  1. composite::POOL1+POOL2::RATE-interval-limit-network
  2. BASE/QUOTE+!OTHER/QUOTE-...                    (multi-hop, ! inverts)
  3. 0xPOOL[::RATE]-interval-limit-network          (direct address)
  4. BASE[::RATE]/QUOTE-interval-limit-network      (base/quote ticker)

Plus trailing -invert flag and URL auto-decoding.

Bug class this catches: a refactor that breaks the format
disambiguation order silently routes "PNK/WETH" through the wrong
branch, returning bad data with no obvious symptom.

Coverage:
  Falsy input → null

  Format 1 (composite):
    - two pools + rate provider
    - ! invert prefix on a hop
    - missing rate provider

  Format 2 (multi-hop):
    - "A/B+C/D" parses two hops with base/quote split
    - "!" prefix inverts hop, NOT included in base symbol

  Format 3 (pool address):
    - bare 0x pool, no rate
    - "0xPOOL::0xRATE" extracts both
    - case-insensitive 0x prefix detection (0X works)

  Format 4 (base/quote):
    - simple "GNO/sDAI"
    - "BASE::RATE/QUOTE" extracts rate from base side

  -invert flag:
    - case-insensitive (INVERT, Invert, invert)
    - stripped from parts before indexing
    - default invert=false

  Defaults:
    - interval="hour", limit=500, network="xdai"
    - partial parts use defaults for missing slots

  URL decoding:
    - auto-decodes when % present
    - skips decode when no % (perf shortcut pinned)

  Format disambiguation order:
    - composite:: takes priority over + and 0x checks
    - multi-hop + takes priority over pool-address branch
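
A rough sketch of the disambiguation order described above. `classifyTickerFormat` is a hypothetical stand-in, not the real parseConfig, which also extracts pools, rates, interval, limit and network; only the branch order and the %-gated decode are taken from the pins.

```javascript
// Hypothetical classifier illustrating the pinned branch ORDER only.
function classifyTickerFormat(raw) {
  if (!raw) return null; // falsy input → null
  // Decode only when a % is present (the pinned perf shortcut).
  const ticker = raw.includes('%') ? decodeURIComponent(raw) : raw;
  if (ticker.startsWith('composite::')) return 'composite'; // checked first
  if (ticker.includes('+')) return 'multi-hop';             // before the 0x check
  if (/^0x/i.test(ticker)) return 'pool-address';           // 0X also matches
  return 'base-quote';
}
```

A composite ticker contains both "+" and "0x", which is why reordering the checks would silently misroute it.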

Tests: 208 → 228 (api), 492 → 512 cross-repo. All green.

Pins src/adapters/registry-adapter.js, the on-chain registry lookup
that powers resolveProposalId (normalizes arbitrary proposal IDs to
canonical addresses) and lookupOrgMetadata.

Coverage:
  normalizeProposalResult (pure shape-normalizer):
    - proposalId + proposalAddress are lowercased
    - originalProposalId preserves the input case (so the checksummed
      form can be displayed back to the user)
    - empty/null proposal yields all-undefined-or-null shape (no throw)
    - organization fields extracted (id + name)
    - 6 parseInt config fields (closeTimestamp, startCandleUnix,
      twapStartTimestamp, twapDurationHours, chain, pricePrecision):
      - valid string → integer
      - missing → null (NOT 0, NOT NaN)
      - empty string → null (the truthy check; without it parseInt("")
        yields NaN, which propagates downstream)
    - 4 string fields with || null fallback (coingeckoTicker etc.)

  Pinned canonical addresses (all four constants):
    - AGGREGATOR_ADDRESS = 0xc5eb43...4fc1 (case-insensitive match
      with futarchy-fi/interface DEFAULT_AGGREGATOR — cross-pinned)
    - SNAPSHOT_LINK_REGISTRY = 0xa6Bc28...0823
    - FACTORY_ADDRESS = 0xa6cB18...0a345
    - GNOSIS_RPC default = https://rpc.gnosischain.com
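
The null-not-NaN rule for the six parseInt config fields of normalizeProposalResult can be sketched as follows. The helper name `parseIntOrNull` is an assumption; the adapter may inline the check instead.

```javascript
// Hypothetical helper: a truthy check in front of parseInt, as pinned.
function parseIntOrNull(value) {
  // Missing and empty-string inputs short-circuit to null before parseInt runs.
  return value ? parseInt(value, 10) : null;
}
```

Without the guard, parseInt("") evaluates to NaN, which is the regression the pin is meant to catch.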

Tests: 228 → 248 (api), 530 → 550 cross-repo. All green.

Pins src/services/spot-price.js's combineHopCandles + NETWORK_MAP +
GECKO endpoint selection logic.

combineHopCandles is the core multi-hop price multiplier — given
candles for each hop in a multi-hop ticker (e.g. PNK/WETH × WETH/sDAI),
produces a single composite series by collecting all unique timestamps,
forward-filling missing prices per hop, and multiplying once ALL hops
have at least one known price. A regression here silently corrupts
every multi-hop spot price.
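
A sketch of the algorithm as described in this paragraph, not a copy of the production function. The candle shape `{ time, price }` is an assumption made for illustration; the real candles may carry other fields.

```javascript
// Sketch under assumptions: each hop is an array of { time, price }.
function combineHopCandles(hops) {
  if (hops.length === 0) return [];
  if (hops.length === 1) return hops[0]; // identity: same array, not a copy

  // Collect all unique timestamps across hops, ascending.
  const times = [...new Set(hops.flatMap((hop) => hop.map((c) => c.time)))]
    .sort((a, b) => a - b);
  const byTime = hops.map((hop) => new Map(hop.map((c) => [c.time, c.price])));
  const last = new Array(hops.length).fill(null); // forward-fill state per hop

  const combined = [];
  for (const time of times) {
    byTime.forEach((prices, i) => {
      if (prices.has(time)) last[i] = prices.get(time);
    });
    // Warmup: skip until every hop has at least one known price.
    if (last.every((p) => p !== null)) {
      combined.push({ time, price: last.reduce((acc, p) => acc * p, 1) });
    }
  }
  return combined;
}
```

For example, with hop A priced at times 1 and 2 and hop B priced at times 2 and 3, time 1 is skipped as warmup and time 3 reuses A's price from time 2.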

Coverage:
  combineHopCandles:
    - empty array → empty
    - single-hop → identity (returns SAME array, not copy)
    - two hops same timestamps multiply per-timestamp
    - three hops multiply all together
    - missing timestamp on one hop forward-fills from previous
    - skips timestamps before ALL hops are initialized (warmup)
    - output sorted by time ascending
    - float precision preserved through multiplication

  NETWORK_MAP:
    - xdai alias (chainId 100, gecko "xdai")
    - gnosis alias (synonym for xdai — both must route to chain 100)
    - eth alias (chainId 1)
    - base alias (chainId 8453)
    - all RPC URLs are HTTPS

  GECKO endpoint selection (key-conditional URL + headers):
    - GECKO_API switches to pro-api.coingecko.com when key set
    - Falls back to api.geckoterminal.com (public)
    - GECKO_HEADERS adds 'x-cg-pro-api-key' when key set (pro-api requires it)
    - Public-headers branch is just {accept} — defensive against leaking
      pro key to public terminal endpoint
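
The key-conditional selection could look roughly like this. `geckoEndpoint` is a hypothetical wrapper (the source exposes GECKO_API and GECKO_HEADERS as separate values), the URL path suffixes are omitted because they are not stated here, and the `accept` value is an assumption.

```javascript
// Hypothetical wrapper; only the host names and header name come from the pins.
function geckoEndpoint(apiKey) {
  const headers = { accept: 'application/json' }; // accept value assumed
  if (apiKey) {
    headers['x-cg-pro-api-key'] = apiKey; // pro-api requires the key header
    return { base: 'https://pro-api.coingecko.com', headers };
  }
  // Public branch: the key header is never attached.
  return { base: 'https://api.geckoterminal.com', headers };
}
```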

Tests: 248 → 265 (api), 568 → 585 cross-repo. All green.

iteration 29 (api side). New: auto-qa/tests/algebra-client.test.mjs
(17 cases).

Pins src/services/algebra-client.js — the LEGACY Graph Node-shaped
client still imported by unified-chart.js + market-events.js as the
non-Checkpoint fallback path. Five concerns:

  1. ALGEBRA_ENDPOINT === ENDPOINTS.candles (env-driven, not hardcoded
     URL). Plus a defensive scan asserting NO http(s):// strings live
     outside comments.
  2. fetchPoolsForProposal uses GraphQL VARIABLE binding ($proposalId:
     String!) NOT inline interpolation — protects against query
     injection. Variable type is String! (not BigInt!) — Graph Node
     shape, not Checkpoint.
  3. getLatestPrice period hardcoded to "3600" (1-hour candles) on
     BOTH ternary branches. Drift would silently change chart
     sampling rate.
  4. getLatestPrice maxTimestamp param defaults to null (not undefined,
     not 0); when null, _lte filter is omitted; when set, included.
     A 0 default would query "everything <= 0" → zero rows.
  5. Default-zero behavior: returns 0 (not null/NaN) when no candle
     found — pinned because callers expect numeric. parseFloat (not
     parseInt) on candle.close.
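
Concerns 3 to 5 can be illustrated with two small helpers. Both function names and the `pool` / `period` filter keys are assumptions; `periodStartUnix`, the "3600" period, the null default and the numeric-zero fallback come from the pins.

```javascript
// Hypothetical helpers sketching the pinned getLatestPrice behavior.
function buildLatestPriceWhere(poolId, maxTimestamp = null) {
  const where = { pool: poolId, period: '3600' }; // 1-hour candles on every branch
  // Only add the upper bound when a timestamp was supplied.
  if (maxTimestamp !== null) where.periodStartUnix_lte = maxTimestamp;
  return where;
}

function latestClose(candles) {
  // parseFloat keeps the fractional part; no candle yields numeric 0.
  return candles.length > 0 ? parseFloat(candles[0].close) : 0;
}
```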

Plus pins for the orderBy/orderDirection invariant (must be
periodStartUnix DESC + first 1 to get LATEST, not earliest), the
Graph-Node-only nested selections (token0/token1/proposal), the
GraphQL error-throw guards on both functions, and the module
docstring's pointer to candles-adapter.js for mode-aware code.

Tests: 265 -> 282 (api). All 17 new cases passing.

iteration 30 (api side). New: auto-qa/tests/graphql-passthrough-factory.test.mjs
(20 cases).

First UNIT-level coverage for src/routes/graphql-passthrough.js — the
generic GraphQL passthrough factory wired into /registry/graphql and
/candles/graphql by src/index.js. Existing passthrough-smoke and
passthrough-contract tests exercise the live HTTP endpoint; this file
locks the factory's internal branching with mock req/res + a fetch
stub (no live network).

Branches pinned:
  - Factory shape — returns an async (req, res) handler.
  - 503 branch — getUpstreamUrl() returning null/undefined/"" emits
    `{ errors: [{ message: '[label] upstream URL not configured' }] }`
    AND short-circuits BEFORE calling fetch (prevents accidental
    fetch("") on env-driven misconfig).
  - Happy path — POST + Content-Type: application/json + AbortSignal,
    upstream status code forwarded (probed at 200/201/400/502),
    upstream content-type forwarded with 'application/json' fallback,
    body forwarded VERBATIM (no JSON.parse re-stringify).
  - req.body fallback — undefined → "{}", null → "{}" (the ?? operator
    in `req.body ?? {}`). Catches a regression where the body becomes
    the literal string "undefined".
  - Error branches — AbortError → 504 ("[label] upstream timeout
    after Nms"); other Error → 502 ("[label] upstream error: <msg>");
    error without .message → "unknown" (defensive default).

Plus source-text pins: DEFAULT_TIMEOUT_MS = 15_000 (15s),
AbortController + signal wiring, clearTimeout in `finally` (timer
leak guard under high traffic), and the [${label}] log prefix
invariant for ops triage.
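
A condensed sketch of the branches pinned above, assuming Express-style `req`/`res` objects. The factory name, option names and the injectable `fetchImpl` are hypothetical; the status codes, messages and 15s default come from the pins.

```javascript
// Sketch only: not the actual src/routes/graphql-passthrough.js.
const DEFAULT_TIMEOUT_MS = 15_000;

function createGraphqlPassthrough({ label, getUpstreamUrl, timeoutMs = DEFAULT_TIMEOUT_MS, fetchImpl = fetch }) {
  return async function handler(req, res) {
    const url = getUpstreamUrl();
    if (!url) {
      // Short-circuit before fetch so a missing env var never becomes fetch("").
      return res.status(503).json({ errors: [{ message: `[${label}] upstream URL not configured` }] });
    }
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const upstream = await fetchImpl(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(req.body ?? {}), // undefined/null body → "{}"
        signal: controller.signal,
      });
      const text = await upstream.text(); // forwarded verbatim, never re-parsed
      res.status(upstream.status);
      res.set('Content-Type', upstream.headers.get('content-type') || 'application/json');
      return res.send(text);
    } catch (err) {
      if (err?.name === 'AbortError') {
        return res.status(504).json({ errors: [{ message: `[${label}] upstream timeout after ${timeoutMs}ms` }] });
      }
      return res.status(502).json({ errors: [{ message: `[${label}] upstream error: ${err?.message || 'unknown'}` }] });
    } finally {
      clearTimeout(timer); // timer-leak guard
    }
  };
}
```

Injecting `fetchImpl` is a design choice for the sketch: it lets a test assert that the 503 branch never reaches the network.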

Tests: 282 -> 302 (api). All 20 new cases passing.

krandder added 30 commits May 10, 2026 17:42

…BoundedByDirect

First true cross-layer count check for the unified-chart endpoint.
apiUnifiedChartShape only validated SHAPE; this asserts the inter-
layer relationship between api yes+no candles and direct indexer
total. Since api filters by proposal pools, api ⊆ direct, so api
total ≤ direct total. Catches api filter regression (returns ALL
instead of pool-filtered subset) and transform fabrication.

Cross-layer match family now spans 3 patterns: passthrough match
(apiCandlesMatchesDirect), multi-entity passthrough match
(apiRegistryMatchesDirect), and filtered subset (NEW).

Test fix: previously-passing apiUnifiedChartShape populated 3 candles
but default direct had 1; bumped its candlesCandlesCount to 3 to
keep it happy under the new invariant.

36 invariants total. 109/109 smoke tests pass (was 105). Bridges to
the documented full chartShape invariant — that future iteration
extends count to ID-by-ID pair-wise compare.
…dAbove

Magnitude-upper-bound for swap amounts. Closes the swap-side gap in
the magnitude-sanity family: candle side already had probabilityBounds
+ candlePricesNonNegative; swap side only had > 0 + range checks.

Asserts amountIn AND amountOut < 1e15 — catches raw uint256 leaks
(parseFloat returning 1e18 instead of decimal "1.0") and token-decimal
misalignment that scales values by a factor of 1e6.

Distinct from swapAmountsPositive which only checks sign; raw-int
leak passes that check (1e18 > 0) but fails this one.
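
A minimal sketch of the upper-bound check, assuming swaps expose `amountIn` / `amountOut` as decimal strings. The function and constant names are hypothetical.

```javascript
// Hypothetical check: both amounts must parse and stay below 1e15.
const MAX_SANE_AMOUNT = 1e15;

function swapAmountsBoundedAbove(swap) {
  const amounts = [parseFloat(swap.amountIn), parseFloat(swap.amountOut)];
  return amounts.every((a) => Number.isFinite(a) && a < MAX_SANE_AMOUNT);
}
```

A raw 18-decimal integer such as "1000000000000000000" is positive, so a sign-only check accepts it, but it fails this bound.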

37 invariants total. 113/113 smoke tests pass (was 109). Magnitude-
sanity family now SYMMETRIC across candle and swap sides — each has
lower-bound + upper-bound coverage.

First indexer-side enum validation. For all pools (first 50), asserts
type ∈ {CONDITIONAL, PREDICTION, EXPECTED_VALUE} (the set sourced from
unified-chart.js's findPoolByOutcome). Catches:
- Schema migration that adds a 4th type without updating consumers
- Indexer regression returning null type
- Typo'd type values like "PRDICTION"

Distinct from probabilityBounds which treats non-PREDICTION as vacuous
— so a typo'd type silently slips through every existing check while
the api adapter silently drops the pool. New pattern: iterate-all-rows
enum check (vs latest-row or count-only).

38 invariants total. 118/118 smoke tests pass (was 113). Pool-entity
coverage now spans existence + FK + per-pool field validation.
…hyProdAggregator

High-value PINNING check at the registry layer. Asserts the indexer
has the production futarchy aggregator (0xc5eb43d5…d4fc1, hardcoded
in 3 api source files: registry-adapter.js, unified-chart.js,
market-events.js — the api literally cannot function without this
aggregator's data).

Registry-side analog of anvilChainId: chain pin proves we forked
Gnosis; this pin proves the indexer was bootstrapped with the right
chain + start_block + contract config.

Distinct from registryHasAggregators (existence): a wrong-block
bootstrap might produce ghost aggregators, passing the existence
check but missing the prod one entirely. Test 4 verifies this gap
explicitly.

Fixture: new includeFutarchyProdAggregator knob (default true)
appends prod address. 2 existing tests updated to set knob=false
where they assert exact aggregator counts.

39 invariants total. 121/121 smoke tests pass (was 118). Hardcoded-
address pinning now symmetric across chain (anvilChainId) + registry
(this slice).
…sObservabilityHeaders

40-invariant milestone. First response-HEADER validation in the
catalog — every prior api invariant probed status code or body
shape. This asserts X-Cache ∈ {HIT, MISS} AND X-Response-Time
matches /^\d+ms$/.

The unified-chart handler emits these on every code path (cached
HIT + fresh MISS); ops dashboards consume them. A regression that
drops them is invisible to body checks. Test 3 verifies the gap:
drop X-Cache header → apiUnifiedChartShape STILL passes since body
shape unchanged; only this header probe catches it.

Catches: removal of cache layer instrumentation; addition of third
state ('STALE') without telling ops; format regressions emitting
'NaN ms' or raw integer.
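
The header probe might be sketched like this, assuming a WHATWG `Headers` object such as the one returned by `fetch`. The function name and the problem strings are hypothetical.

```javascript
// Hypothetical probe: returns a list of problems, empty when both headers are valid.
function observabilityHeaderProblems(headers) {
  const problems = [];
  const xCache = headers.get('x-cache');
  if (xCache !== 'HIT' && xCache !== 'MISS') {
    problems.push(`X-Cache must be HIT or MISS, got ${xCache}`);
  }
  const xResponseTime = headers.get('x-response-time');
  if (!/^\d+ms$/.test(xResponseTime ?? '')) {
    problems.push(`X-Response-Time must match /^\\d+ms$/, got ${xResponseTime}`);
  }
  return problems;
}
```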

Fixture: chart handler now emits headers unconditionally; new
unifiedChartXCache / unifiedChartXResponseTime knobs.

40 invariants total. 126/126 smoke tests pass (was 121).
…nMentionsAnvil

Chain-CLIENT identity pin. Distinct from anvilChainId (chain-NETWORK
pin). Calls web3_clientVersion; asserts response contains "anvil".
Together they pin both layers of "right environment" — chain ID for
the network, client version for the EVM impl.

Catches running against a Gnosis fork on geth/erigon where chain ID
matches but anvil_/evm_ extensions for impersonation, snapshots, and
time-warping would silently fail in scenario tests.

41 invariants total. 130/130 smoke tests pass.
…n api side

First staged CI workflow on the api side. Job runs the orchestrator's
130+ smoke-test invariant battery against an in-process node:http
fixture (no docker, no real services, ~1.5s test time + Node setup).
Trigger is workflow_dispatch only for this first version, matching
the conservative roll-out of slices 3a + 3c.

Also added auto-qa/harness/ci/README.md mirroring interface-side
pattern (staging dance explanation + currently-staged table +
promotion command).

Cheapest of the 4 currently-staged CI workflows to promote
(no docker, no Playwright, no GH Actions secrets); recommended
first promotion target for the maintainer.
…bsetOfDirect (42nd invariant)

First cross-layer per-row TIME-PAIR check for the unified-chart
endpoint. Strengthens chartCandleCountsBoundedByDirect (count
bound) into per-row time-membership: every candle time the api
surfaces must appear in the direct candles indexer's time set,
otherwise the api is fabricating data (or mixing another
proposal's periods).

Uses `time` not `id` because applyRateToCandles reshapes raw
indexer candles and doesn't expose IDs. Catches bug classes
the count bound MISSES: transform synthesizing period-start
timestamps, cache key mismatch returning wrong proposal's
candles, time-bucket off-by-one, SPOT bleeding into yes/no.
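
A sketch of the time-membership check, assuming both sides have already been normalized to objects with a `time` field (the direct indexer's raw field name may differ). The function name is hypothetical.

```javascript
// Hypothetical check: returns api candle times that the direct indexer never reported.
function apiTimesMissingFromDirect(apiCandles, directCandles) {
  const directTimes = new Set(directCandles.map((c) => Number(c.time)));
  return apiCandles
    .map((c) => Number(c.time))
    .filter((time) => !directTimes.has(time));
}
```

An empty result means every surfaced time is backed by the indexer; any entry points at fabricated or misattributed candles.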

42 invariants now: 11 api-internal + 26 indexer + 5 chain.
134/134 smoke tests pass (4 new + 2 existing tests aligned to
DESCENDING candleTimes so candleTimeMonotonic stays happy).
…ent (43rd invariant)

Sixth chain-layer invariant. Probes the FEE-MARKET state, which
can be independently broken from chain identity / block shape.
Asserts eth_gasPrice returns a 0x-prefixed positive hex value.

Three named failure modes (each with its own diagnostic):
  - null    → EIP-1559-only mode (legacy gas pricing disabled)
  - 0x0     → broken fee market (anvil --gas-price 0 misconfig)
  - non-hex → RPC-layer regression (BigInt parsing breaks)
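
The three failure modes can be told apart with a small classifier. This is a sketch; the function name and the returned labels are hypothetical, and `result` stands for the `result` field of the eth_gasPrice JSON-RPC response.

```javascript
// Hypothetical classifier for the eth_gasPrice result.
function classifyGasPrice(result) {
  if (result === null || result === undefined) return 'eip-1559-only';
  if (typeof result !== 'string' || !/^0x[0-9a-fA-F]+$/.test(result)) return 'non-hex';
  // BigInt parses 0x-prefixed strings without losing precision.
  return BigInt(result) === 0n ? 'zero-gas-price' : 'ok';
}
```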

Why this matters for scenarios: most futarchy flows submit
transactions (impersonateAccount + send) which need a working
gas price for estimation. Without this probe, a scenario reports
"transaction failed at step N" with no breadcrumb pointing to
the fee-market issue.

43 invariants now: 11 api-internal + 26 indexer + 6 chain.
139/139 smoke tests pass (5 new). 1 fixture knob added
(gasPriceHex), 1 RPC dispatch case added (eth_gasPrice).
…acheTtlPresent (44th invariant)

Second response-HEADER probe in the catalog. Sister to
apiUnifiedChartHasObservabilityHeaders (X-Cache +
X-Response-Time); this one covers X-Cache-TTL. Split into a
separate invariant for single-responsibility per probe — ops
dashboards filter on TTL independently of hit/miss.

Scope correction: the original X-Cache+X-Response-Time invariant's
comment said TTL was HIT-only. Inspection of unified-chart.js
shows it's set on BOTH paths (line 74 HIT + line 278 MISS), so
this asserts unconditionally rather than as a conditional check.
The old comment was updated in the same commit.

Format asserted: positive integer string, no unit suffix. Catches:
  - refactor dropping TTL from one path but not the other (sister
    probe STILL passes, demonstrating the value of the per-header split)
  - 'NaN' / '-1' from timing/env-var bugs
  - accidental unit suffix ('300s' is silently wrong: parseInt
    returns 300 by coincidence)
  - header dropped entirely
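
The format rule can be expressed as a strict string check rather than a parseInt round-trip. The function name is hypothetical.

```javascript
// Hypothetical validator: positive integer string, no sign, no unit suffix.
function isValidCacheTtl(value) {
  return typeof value === 'string' && /^[1-9]\d*$/.test(value);
}
```

A regex is used here because parseInt('300s') evaluates to 300 and would hide the unit-suffix regression.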

44 invariants now: 12 api-internal + 26 indexer + 6 chain.
144/144 smoke tests pass (5 new). 1 fixture knob added
(unifiedChartXCacheTtl).
…onMatchesChainId (45th invariant)

Seventh chain-layer invariant. Chain-RPC-CONSISTENCY check —
asserts net_version (decimal) and eth_chainId (hex) numerically
agree. Both methods should report the same chain ID by spec
(net_version is legacy; eth_chainId is the EIP-695 modern method).
Divergence silently breaks consumers that pick one or the other.

Orthogonal to anvilChainId: that asserts eth_chainId === 0x64 (the
EXPECTED Gnosis value); this asserts net_version === eth_chainId
(CONSISTENCY regardless of WHAT they equal). Demonstrated by the
bare-anvil-31337 test: both methods report 31337, this passes
(consistency intact), anvilChainId fails (wrong network).

Bug shapes caught (NOT caught by anvilChainId alone):
  - Fork rebase updates one method but not the other
  - Reverse-proxy misconfig routes them to different upstreams
  - Mock fixture hardcodes one but not the other
  - Anvil version regression where one method reads from a
    stale cached config and the other from live state
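
The consistency comparison itself is small: net_version is a decimal string and eth_chainId a 0x-prefixed hex string, so both are parsed before comparing. The function name is hypothetical.

```javascript
// Hypothetical comparison of the two RPC results.
function chainIdsAgree(netVersion, ethChainId) {
  const fromNetVersion = Number.parseInt(netVersion, 10);
  const fromChainId = Number.parseInt(ethChainId, 16); // radix 16 accepts the 0x prefix
  return Number.isSafeInteger(fromNetVersion) && fromNetVersion === fromChainId;
}
```

Both the Gnosis pair ('100', '0x64') and the bare-anvil pair ('31337', '0x7a69') agree, which is why this check is independent of the expected network.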

45 invariants now: 12 api-internal + 26 indexer + 7 chain.
149/149 smoke tests pass (5 new). 1 fixture knob added
(netVersion), 1 RPC dispatch case added (net_version).
…nCapabilityPresent (46th invariant)

Eighth chain-layer invariant; first to exercise an ANVIL-SPECIFIC
RPC method (anvil_impersonateAccount) rather than standard JSON-RPC.
Asserts the method is actually callable, not just that the client
*claims* to be anvil — distinct domain from
anvilClientVersionMentionsAnvil.

Several "hardhat-compatible" forks and patched-anvil builds exist
that emit "anvil" in web3_clientVersion but lack the impersonation
extension scenarios depend on. With anvilImpersonationSupported:
false, the capability probe FAILS while the identity probe STILL
passes — proving the two checks are orthogonal.

Why this matters for scenarios: every futarchy flow that mutates
state requires impersonating an account (proposer, trader,
resolver). Without this method, EVERY scenario silently fails
to produce state changes.

46 invariants now: 12 api-internal + 26 indexer + 8 chain.
152/152 smoke tests pass (4 new). 1 fixture knob added
(anvilImpersonationSupported: true | false | 'rpc-error'),
1 RPC dispatch case added (anvil_impersonateAccount).
…bilityPresent (47th invariant)

Ninth chain-layer invariant; second chain-CAPABILITY probe.
Sister to anvilImpersonationCapabilityPresent. Together they form
the MINIMAL CAPABILITY SET scenarios depend on:
  - impersonate → call function as arbitrary account
  - snapshot/revert → roll back state between tests

Distinct domain from impersonation: evm_snapshot is part of the
GANACHE LINEAGE (anvil + hardhat both support it; geth/erigon/reth
don't). Failure modes are complementary:
  - anvil_* missing → wrong dev client (hardhat instead of anvil)
  - evm_*    missing → real client (geth/erigon/reth)
  - both ok → minimal scenario capability satisfied

Also catches subsystem-broken case: method registered but returns
null/non-hex (calling evm_revert with that silently fails). The
non-hex check distinguishes "registered but broken" from "not
registered at all" — different diagnostic paths.
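
A sketch of how the evm_snapshot response could be sorted into those diagnostic paths. The function name and labels are hypothetical; -32601 is the standard JSON-RPC "Method not found" code, and treating it as "not registered" is an assumption about how the client reports missing methods.

```javascript
// Hypothetical classifier for a parsed JSON-RPC response to evm_snapshot.
function classifySnapshotResponse(response) {
  if (response.error) {
    return response.error.code === -32601 ? 'not-registered' : 'rpc-error';
  }
  const snapshotId = response.result;
  if (typeof snapshotId !== 'string' || !/^0x[0-9a-fA-F]+$/.test(snapshotId)) {
    return 'registered-but-broken'; // null / false / non-hex snapshot id
  }
  return 'ok';
}
```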

47 invariants now: 12 api-internal + 26 indexer + 9 chain.
156/156 smoke tests pass (4 new). 1 fixture knob added
(snapshotResult: '0x1' | false | null | 'rpc-error'),
1 RPC dispatch case added (evm_snapshot).
…sPositive (48th invariant)

First iterate-all-rows extension on the swap side. Strengthens
swapAmountsPositive (latest-only) into a per-row check across the
first 50 swaps. Mirrors the poolTypeIsValidEnum pattern (iterate-
all-rows enum check at the indexer layer).

Why both invariants exist:
  - swapAmountsPositive (LATEST only) — cheap probe; catches
    event-decoder bugs uniform across ALL swaps
  - THIS one (UP-TO-50 rows) — catches bugs that affect SUBSETS
    of swaps without affecting the latest

Bug shapes caught (NOT caught by latest-only):
  - Indexer reorg re-processed historical blocks; latest fine,
    old rows wrong
  - Block-context-dependent decoder bug — reads "decimals" from
    pool's CURRENT state instead of swap's block, corrupting
    historical swaps from before a decimals change
  - Partial-rewrite bug — fix re-emitted only swaps from a
    specific block range with the corrupted shape
  - Pool-specific decoder bug — only swaps for one pool affected;
    latest happens to be a different pool

Fixture extension: buildSwaps now defaults amountIn/amountOut to
'1.0' for non-zero indices (index 0 still uses latestSwap* for
back-compat). New per-row override knobs: swapAmountIns,
swapAmountOuts arrays.

48 invariants now: 12 api-internal + 27 indexer + 9 chain.
160/160 smoke tests pass (4 new).
…e (49th invariant)

First body-shape probe on /health. STRENGTHENS the existing
apiHealth (status-code-only) into a body validation. Production
/health (src/index.js line 54) emits { status: 'ok', timestamp:
<ISO 8601> } — both fields matter to downstream ops.

Why both invariants exist:
  - apiHealth (status-code-only) — catches "endpoint dead" outright
  - THIS one (body shape) — catches refactors that keep the
    endpoint serving 200 but change its body shape, silently
    breaking downstream consumers

Bug shapes caught (NOT caught by apiHealth):
  - Refactor returns just the string 'ok' (not JSON body)
  - status field renamed to 'state'
  - status value changed ('healthy' instead of 'ok') — LB string-
    match health checks silently fail
  - timestamp dropped — ops dashboards parsing 'last-fresh' age
    silently break
  - timestamp emitted as Unix epoch number instead of ISO 8601
    string — every ISO parser breaks
  - timestamp is a string but malformed

ISO 8601 validation strategy: Date.parse() rather than a regex —
robust enough to accept the canonical new Date().toISOString()
format and reject typical malformed inputs.
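
A minimal sketch of the body-shape validation using Date.parse as described. The function name and problem strings are hypothetical.

```javascript
// Hypothetical validator for the parsed /health body.
function healthBodyProblems(body) {
  if (body === null || typeof body !== 'object') return ['body is not a JSON object'];
  const problems = [];
  if (body.status !== 'ok') problems.push(`status must be 'ok', got ${body.status}`);
  // Require a string first: Date.parse would otherwise coerce numbers.
  if (typeof body.timestamp !== 'string' || Number.isNaN(Date.parse(body.timestamp))) {
    problems.push('timestamp must be a parseable ISO 8601 string');
  }
  return problems;
}
```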

Fixture /health handler now defaults to production shape (was
{ ok: true }, now { status: 'ok', timestamp: <ISO> }). New knobs:
healthStatus, healthTimestamp, healthBody (full-body override).

49 invariants now: 13 api-internal + 27 indexer + 9 chain.
164/164 smoke tests pass (4 new).
…bilityPresent (50th invariant) 🎯

50-invariant milestone. Tenth chain-layer invariant; third chain-
CAPABILITY probe. COMPLETES the minimal capability TRIO that
scenarios depend on:
  1. impersonate → call function as arbitrary account
  2. snapshot/revert → roll back state between tests
  3. TIME-WARP → simulate "wait N seconds/days" (this slice)

Without time-warp, ANY scenario involving a time-gated state
transition (resolution after deadline, TWAP window calculation,
vote-weight decay) cannot run at all — wall-clock waits would
make CI runs hours-long.

evm_setNextBlockTimestamp lineage: ganache-original method,
supported by anvil + hardhat + ganache. Same support profile as
evm_snapshot — wrong-fork clients (geth/erigon/reth) lack it.

Bug shapes caught (NOT caught by impersonate / snapshot probes):
  - Anvil flag --no-storage-caching disabling time-warp specifically
    (snapshot can work while timestamp manipulation is broken)
  - RPC method-allowlisting blocking evm_setNextBlockTimestamp
    while allowing evm_snapshot/revert
  - Anvil version regression with new signature dropping legacy
    alias

Side effect: probe sets next-block timestamp to now+86400. No
block is mined in the probe, so the effect only manifests if a
scenario subsequently mines a block, and the scenario can override
the timestamp before doing so.

Includes a "trio milestone test" that explicitly verifies all
three capability probes pass on default fixture, documenting
the trio as a coherent set.

50 invariants now: 13 api-internal + 27 indexer + 10 chain.
168/168 smoke tests pass (4 new). 1 fixture knob added
(timeWarpSupported), 1 RPC dispatch case added
(evm_setNextBlockTimestamp).
…e (51st invariant)

Second body-shape probe in the catalog. Sister to apiHealthBodyShape
(just shipped) — both extend a status-code-only invariant with
body-shape validation. Together they cover the two main
observability endpoints (/health + /warmer).

Production /warmer (src/utils/warmer.js getWarmerStatus()) emits:
  { active, maxEntries, refreshIntervalSec, retentionDays, entries[] }

All four numeric fields must be finite numbers. `active` may be 0
(warmer might have no entries yet); the three config fields must
be > 0 (0 means "disabled" — a config regression). `entries`
must be an array.

Bug shapes caught (NOT caught by apiWarmer):
  - Refactor renames any of the four numeric fields (e.g.,
    active → activeCount) — silent until ops gauges break
  - Numeric field emitted as string ('5' instead of 5) —
    consumers using strict typeof checks break
  - entries field changed to non-array (e.g., object keyed by id)
    — consumers iterating with .map() crash with "is not a function"
  - Body wrapped in a `data` field by middleware refactor
  - Config sentinel hit (refreshIntervalSec=0 = warmer disabled
    — silent regression that breaks the entire warming subsystem)

Fixture /warmer handler now defaults to production shape (was
{ status: 'warm', queues: 0 }). New knobs: warmerActive,
warmerMaxEntries, warmerRefreshIntervalSec, warmerRetentionDays,
warmerEntries, warmerBody (full-body override).

51 invariants now: 14 api-internal + 27 indexer + 10 chain.
172/172 smoke tests pass (4 new).
…emaHasRequiredTypes (52nd invariant)

First GraphQL INTROSPECTION probe in the catalog — qualitatively
new dimension. All previous indexer probes query DATA (pools,
swaps, candles); this queries the SCHEMA (__schema { types { name } })
to verify the entity types themselves still exist.

The bug class this catches: schema regeneration renames a type
(Candle → OHLCBar) or drops one entirely. Data probes hitting the
renamed/dropped type return GraphQL errors like "Cannot query
field 'candles'" — surfacing as misleading "indexer empty"
diagnostics. This invariant catches the rename DIRECTLY with a
clear "schema is missing required type(s): Candle" message,
making triage take seconds instead of minutes.

Bug shapes caught (NOT caught by data probes):
  - Schema regeneration renamed Pool → LiquidityPool / Candle
    → OHLCBar
  - A required type was DROPPED entirely from the schema
  - Schema introspection itself was disabled (some production
    GraphQL servers disable it for security)

Required types asserted: Pool, Swap, Candle (the three entities
the harness actually queries). Doesn't hard-pin every type —
that would over-couple to the schema; pins ONLY the harness's
actual dependencies.
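
The required-type check reduces to a set difference over the introspection result. This sketch assumes `data` is the `data` field of the GraphQL response to `{ __schema { types { name } } }`; the function name is hypothetical.

```javascript
// Hypothetical check: returns missing type names, or null when introspection is unavailable.
function missingRequiredTypes(data, requiredTypes) {
  const types = data?.__schema?.types;
  if (!Array.isArray(types)) return null; // introspection disabled or stripped
  const present = new Set(types.map((t) => t.name));
  return requiredTypes.filter((name) => !present.has(name));
}
```

Returning null separately from an empty array keeps "introspection disabled" distinct from "all required types present".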

Fixture extension: candles-direct response now includes
__schema: { types: [...] } by default. Knob candlesSchemaTypes
lets tests override; setting it to null omits __schema entirely
(simulates introspection-disabled servers).

Indexer-side coverage now spans THREE qualitatively distinct
dimensions: connectivity, data, and SCHEMA. Together they
distinguish "indexer down" vs "indexer empty" vs "indexer
schema regression" — three failure modes that previously all
collapsed into "indexer failing somehow".

52 invariants now: 14 api-internal + 28 indexer + 10 chain.
176/176 smoke tests pass (4 new).
…hemaHasRequiredTypes (53rd invariant)

Second GraphQL INTROSPECTION probe — sister to
candlesIndexerSchemaHasRequiredTypes (just shipped), on the
registry indexer. Symmetrically completes schema-validation
coverage across BOTH indexers.

Why a separate registry probe (not one combined invariant):
  - Registry and candles are SEPARATE Checkpoint deployments —
    they can be regenerated/migrated independently
  - Failure diagnostics stay precise: which indexer's schema
    regressed, not "one of them did"
  - Different required type names (registry = ProposalEntity/
    Organization/Aggregator; candles = Pool/Swap/Candle)

Required types: ProposalEntity, Organization, Aggregator. The
three load-bearing registry entities — each referenced by other
invariants (FK probes, aggregator pinning probe, registry
adapter probes).

Bug shapes caught (NOT caught by data probes):
  - Schema regen renamed ProposalEntity → Proposal (data probes
    return "Cannot query field 'proposalEntities'" — looks like
    indexer-empty)
  - Aggregator dropped entirely → aggregator-pinning probes
    silently fail with misleading errors
  - Introspection disabled on registry side (independent of
    candles side; sister probe still passes — demonstrates
    per-indexer diagnostic precision)

Fixture extension: registry-direct response now includes
__schema: { types: [...] } by default. Knob registrySchemaTypes
lets tests override; null omits __schema entirely.

Both indexers now have FULL coverage across THREE qualitative
dimensions: connectivity, data, SCHEMA. Failure diagnostics
triage to ONE of three modes (down / empty / schema-regressed)
per indexer.

53 invariants now: 14 api-internal + 29 indexer + 10 chain.
180/180 smoke tests pass (4 new).
…owsNonNegative (54th invariant)

Iterate-all-rows extension on the candle side. Sister to
swapAmountsAllRowsPositive — symmetrically completes the iterate-
all-rows pattern across the two main accumulator-bearing entities
(swap amounts + candle volumes).

Why both candleVolumesNonNegative AND this exist:
  - candleVolumesNonNegative (LATEST only) — cheap probe; catches
    aggregator bugs uniform across all candles
  - THIS one (UP-TO-50 rows) — catches bugs affecting SUBSETS of
    candles without affecting the latest

Bug shapes caught (NOT caught by latest-only):
  - Indexer reorg re-processed historical periods; latest fine,
    old candles corrupted
  - Per-period decoder bug (aggregator reads pool token-decimals
    from CURRENT state instead of period snapshot, corrupting
    historical candles from before a decimals change)
  - Partial-rewrite bug — fix re-emitted only candles from a
    specific period range
  - Pool-specific aggregator bug — only candles for one pool
    affected; latest happens to be a different pool

Fixture extension: buildCandles now defaults volumeToken0/1 to
'1.0' for non-zero indices (index 0 still uses latestCandleVolume*
for back-compat). New per-row override knobs: candleVolumeToken0s,
candleVolumeToken1s arrays.

The iterate-all-rows pattern is now SYMMETRIC: swap amounts AND
candle volumes both have latest-only + iterate-all-rows coverage.
Each pattern catches a distinct bug class (uniform-aggregator vs
subset-corruption).

54 invariants now: 14 api-internal + 30 indexer + 10 chain.
184/184 smoke tests pass (4 new).
…Consistent (55th invariant)

Third iterate-all-rows extension. COMPLETES the iterate-all-rows
TRIAD on the indexer's main accumulator entities:
  1. swapAmountsAllRowsPositive (7 slices ago)
  2. candleVolumesAllRowsNonNegative (last slice)
  3. candleOHLCAllRowsConsistent (this slice)

Each accumulator entity now has BOTH latest-only + all-rows
coverage:
  | Entity | Latest-only | All-rows |
  |--------|-------------|----------|
  | swap.amount{In,Out} | swapAmountsPositive | swapAmountsAllRowsPositive |
  | candle.volumeToken{0,1} | candleVolumesNonNegative | candleVolumesAllRowsNonNegative |
  | candle.{open,high,low,close} | candleOHLCOrdering | candleOHLCAllRowsConsistent (NEW) |

Each pair catches uniform-aggregator bugs (latest) AND subset-
corruption bugs (all-rows).

Bug shapes caught (NOT caught by latest-only):
  - Per-period min/max accumulator bug — historical candles
    initialized differently (running-min reset to 0 instead of
    +Infinity for periods where the first swap > 0)
  - Indexer reorg corrupted historical OHLC fields
  - Pool-specific aggregator bug — only candles for one pool
    affected
  - Period-boundary off-by-one — a swap counted in the wrong
    period had a price outside the window's bounds
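
An all-rows OHLC check might look like the sketch below. It assumes "consistent" means low ≤ min(open, close) and max(open, close) ≤ high, which is an interpretation rather than the harness's documented rule; the function name is hypothetical.

```javascript
// Hypothetical check: returns the indices of rows whose OHLC values are inconsistent.
function inconsistentCandleRows(candles) {
  const badRows = [];
  candles.forEach((candle, index) => {
    // Values may arrive as decimal strings, so coerce before comparing.
    const [open, high, low, close] = [candle.open, candle.high, candle.low, candle.close].map(Number);
    const consistent =
      [open, high, low, close].every((v) => Number.isFinite(v)) &&
      low <= Math.min(open, close) &&
      Math.max(open, close) <= high;
    if (!consistent) badRows.push(index);
  });
  return badRows;
}
```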

Fixture extension: buildCandles now defaults OHLC to consistent
values on every row (open=close=0.5, high=0.6, low=0.4) for non-
zero indices. Index 0 still uses latestCandle* for back-compat.
New per-row override knobs: candleOpens, candleHighs, candleLows,
candleCloses arrays.

55 invariants now: 14 api-internal + 31 indexer + 10 chain.
188/188 smoke tests pass (4 new).
…er ergonomics script

Tooling slice (no new invariant). At 55 invariants the dry-run
flat catalog is hard to scan. Adds `npm run scenarios:by-layer`
that prints invariants grouped by layer, with both a summary
table (layer + count + bar-chart) AND per-layer detail blocks.

What it answers at a glance:
  - "What does the chain layer cover?" → orchestrator↔chain block
    lists all 10 chain probes
  - "Which probes cross to the candles indexer?" → api↔candles
    block lists 4 names
  - "Where's the catalog growing fastest?" → bar-chart shows
    orchestrator↔candles is densest at 21

Authoritative layer breakdown surfaced (corrects an earlier
inconsistency in status-line bucketing):
  api                    10
  api↔candles             4
  api↔registry            2
  orchestrator↔candles   21
  orchestrator↔chain     10
  orchestrator↔registry   8
                       ----
                         55

Implementation: 35-line scripts/scenarios-by-layer.mjs that
imports INVARIANTS, groups by `layer` field, prints text. No
flags, no colors, deliberately scriptable (pipe into grep/awk
for filtering). Same import style as the existing dry-run output
but reorganized.
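
For readers who want the shape of the script without opening it, a
minimal sketch of the grouping and text output follows. The
INVARIANTS entries below are placeholders (the real catalog is
imported), and the exact header and bar-chart format are assumptions.

```javascript
// Placeholder entries; the real script imports INVARIANTS instead.
const INVARIANTS = [
  { name: 'exampleApiProbe', layer: 'api' },
  { name: 'exampleChainProbeA', layer: 'orchestrator↔chain' },
  { name: 'exampleChainProbeB', layer: 'orchestrator↔chain' },
];

// Group invariant names by their `layer` field, preserving first-seen order.
function groupByLayer(invariants) {
  const groups = new Map();
  for (const inv of invariants) {
    if (!groups.has(inv.layer)) groups.set(inv.layer, []);
    groups.get(inv.layer).push(inv.name);
  }
  return groups;
}

// Plain text only: a summary table with a '#' bar, then per-layer detail blocks.
function renderByLayer(invariants) {
  const groups = groupByLayer(invariants);
  const width = Math.max(...[...groups.keys()].map((k) => k.length));
  const lines = [`Invariants by layer (${invariants.length} total)`, ''];
  for (const [layer, names] of groups) {
    const count = String(names.length).padStart(3);
    lines.push(`${layer.padEnd(width)}  ${count}  ${'#'.repeat(names.length)}`);
  }
  for (const [layer, names] of groups) {
    lines.push('', `[${layer}]`);
    for (const name of names) lines.push(`  - ${name}`);
  }
  return lines.join('\n');
}

console.log(renderByLayer(INVARIANTS));
```

Because the output is uncolored plain text, filtering stays a
one-liner in the shell (for example piping into grep for a layer
name).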

Smoke test: 1 new (asserts header line, summary table format,
per-layer detail sections, sanity-check that the chain-CAPABILITY
trio names appear under the chain-layer block).

189/189 smoke tests pass (was 188).
…lForwardsIntrospection (56th invariant)

First api-layer introspection-passthrough probe. Sister to
registryIndexerSchemaHasRequiredTypes (DIRECT-side). Same __schema
query, but routed through the API LAYER instead of direct.

The bug class: many production GraphQL proxies (Apollo Gateway,
Hasura, etc.) can disable introspection at the proxy layer for
security, even when the upstream indexer supports it. If a deploy
accidentally enables that restriction, harness scenarios that
introspect through the api layer silently break while the
DIRECT-side sister still passes, which makes the actual cause
hard to find without this distinct probe.

Diagnostic-precision pattern (api+direct cross-check):
  api ✓ direct ✓ → both layers fine
  api ✗ direct ✓ → API PROXY STRIPPED INTROSPECTION
  api ✓ direct ✗ → indexer schema regressed (api correctly
                   forwarded the broken schema)
  api ✗ direct ✗ → indexer is root cause

Each combination has a distinct error message so engineers can
read the combined signal without guessing which layer broke.
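
The cross-check table above can be read as a small decision
function. This is a sketch only; the function name and the message
strings are illustrative and do not reproduce the probes' real
error text.

```javascript
// Maps the two probe outcomes to the diagnosis from the table above.
// apiOk: api-layer introspection probe passed; directOk: direct-side probe passed.
function diagnoseIntrospection(apiOk, directOk) {
  if (apiOk && directOk) return 'both layers fine';
  if (!apiOk && directOk) return 'api proxy stripped introspection';
  if (apiOk && !directOk) {
    return 'indexer schema regressed (api forwarded the broken schema)';
  }
  return 'indexer is root cause';
}

for (const apiOk of [true, false]) {
  for (const directOk of [true, false]) {
    console.log(apiOk, directOk, '->', diagnoseIntrospection(apiOk, directOk));
  }
}
```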

Fixture extension: api /registry/graphql handler now includes
__schema in its passthrough by default (mirroring direct). New
knob apiRegistryStripsIntrospection (default false) simulates
proxy-layer disablement.

The api↔registry layer (previously thinnest at 2 invariants) is
now at 3.

56 invariants now: 10 api + 4 api↔candles + 3 api↔registry +
21 orchestrator↔candles + 8 orchestrator↔registry + 10
orchestrator↔chain. 193/193 smoke tests pass (4 new).
…ForwardsIntrospection (57th invariant)

Sister to apiRegistryGraphqlForwardsIntrospection (just shipped)
on the candles side. COMPLETES the introspection-coverage MATRIX:

                | Direct-side | API-side |
  --------------+-------------+----------+
  registry      |      ✓      |    ✓     |
  candles       |      ✓      |    ✓     |  ← this slice

All four probes are now in the catalog. For ANY introspection
failure, the diagnostic combines layer (api/direct) × indexer
(registry/candles) into a precise root-cause statement.

Bug class beyond the registry sister: per-route proxy config drift.
The candles route can be misconfigured INDEPENDENTLY of the registry
route (separate proxy configs are common in production GraphQL
gateways). Pairing the two api-layer probes catches that drift:
  apiRegistry ✓ apiCandles ✓ → proxy fine on both routes
  apiRegistry ✗ apiCandles ✓ → registry route stripped only
  apiRegistry ✓ apiCandles ✗ → candles route stripped only (drift)
  apiRegistry ✗ apiCandles ✗ → proxy-wide lockdown

Fixture extension: api /candles/graphql handler now includes
__schema in its passthrough by default (mirroring direct +
registry). New knob apiCandlesStripsIntrospection (separate
from the registry knob so per-route drift is testable).
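
A simplified sketch of why two separate knobs matter. The two knob
names and route paths come from this PR; the makeApiFixture name,
the canned upstream payload, and the error body are assumptions, and
the real fixture is an in-process node:http server rather than a
plain function.

```javascript
// Hypothetical canned upstream response (the real schema payload is not shown here).
const UPSTREAM = { data: { __schema: { types: [{ name: 'Query' }] } } };

function makeApiFixture({
  apiRegistryStripsIntrospection = false,
  apiCandlesStripsIntrospection = false,
} = {}) {
  // One independent flag per route, so per-route drift can be simulated.
  const strips = {
    '/registry/graphql': apiRegistryStripsIntrospection,
    '/candles/graphql': apiCandlesStripsIntrospection,
  };
  return function handle(route, query) {
    if (!(route in strips)) {
      return { status: 404, body: { errors: [{ message: 'unknown route' }] } };
    }
    if (query.includes('__schema') && strips[route]) {
      // Assumed error shape for a proxy that refuses introspection.
      return { status: 200, body: { errors: [{ message: 'introspection disabled' }] } };
    }
    // Simplification: every other query gets the canned payload.
    return { status: 200, body: UPSTREAM };
  };
}

// Simulate drift: only the candles route strips introspection.
const handle = makeApiFixture({ apiCandlesStripsIntrospection: true });
const q = '{ __schema { types { name } } }';
console.log(Boolean(handle('/registry/graphql', q).body.data)); // true
console.log(Boolean(handle('/candles/graphql', q).body.data));  // false
```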

57 invariants now: 10 api + 5 api↔candles + 3 api↔registry +
21 orchestrator↔candles + 8 orchestrator↔registry + 10
orchestrator↔chain. 197/197 smoke tests pass (4 new).
Sister to interface-side scenarios-catalog. Ships a
committed Markdown index of the orchestrator's 57
invariants — browsable on GitHub without running
anything — plus drift detection against doc rot.

What ships:
  - scripts/invariants-catalog.mjs (~85 lines): imports
    INVARIANTS, validates {name, description, layer},
    groups by layer, emits orchestrator/INVARIANTS.md
  - orchestrator/INVARIANTS.md (~110 lines, committed):
    title + per-layer summary + per-layer detail tables
  - npm script invariants:catalog wired
  - tests/smoke-invariants-catalog.test.mjs: drift smoke
    test mirroring interface-side smoke-scenarios-catalog
    (snapshot → regen → byte-identical assertion → restore
    in finally; "added an invariant but forgot to regen"
    becomes a CI failure with a fix-command pointer)
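
The generate-then-compare idea can be sketched as below. The
Markdown layout, function names, and error wording are assumptions;
only the validated fields ({name, description, layer}) and the
byte-identical comparison come from the description above.

```javascript
// Reject entries missing any of the three required string fields.
function validateInvariant(inv, index) {
  for (const field of ['name', 'description', 'layer']) {
    if (typeof inv[field] !== 'string' || inv[field].trim() === '') {
      throw new Error(`invariant #${index} is missing a non-empty "${field}"`);
    }
  }
}

// Deterministic output (sorted layers) so regeneration is byte-stable.
function renderCatalog(invariants) {
  invariants.forEach((inv, i) => validateInvariant(inv, i));
  const byLayer = new Map();
  for (const inv of invariants) {
    if (!byLayer.has(inv.layer)) byLayer.set(inv.layer, []);
    byLayer.get(inv.layer).push(inv);
  }
  const layers = [...byLayer.keys()].sort();
  const out = ['# Invariants catalog', '', '| Layer | Count |', '| --- | --- |'];
  for (const layer of layers) out.push(`| ${layer} | ${byLayer.get(layer).length} |`);
  for (const layer of layers) {
    out.push('', `## ${layer}`, '', '| Name | Description |', '| --- | --- |');
    for (const inv of byLayer.get(layer)) out.push(`| ${inv.name} | ${inv.description} |`);
  }
  return out.join('\n') + '\n';
}

// Drift check: the committed text must equal a fresh render exactly.
function isInSync(committedText, invariants) {
  return committedText === renderCatalog(invariants);
}

const sample = [{ name: 'exampleProbe', description: 'placeholder', layer: 'api' }];
const committed = renderCatalog(sample);
const grown = [...sample, { name: 'newProbe', description: 'added later', layer: 'api' }];
console.log(isInSync(committed, sample), isInSync(committed, grown)); // true false
```

Sorting the layers before rendering is what keeps a byte-identical
assertion meaningful: without a stable order, harmless reordering of
the source array would read as drift.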

Why now: with 57 invariants across 6 layers (api,
api↔candles, api↔registry, orchestrator↔candles,
orchestrator↔chain, orchestrator↔registry), inspecting
coverage only by running a script is no longer enough.
Reviewers can now read the catalog on GitHub without
cloning. Future tooling (CI dashboards, coverage gap
reports) gets a source that is both human-readable and
easy to parse.

Validation: 1/1 new smoke test passes in isolation.
Full suite: pre-existing daemon-dependent flake on
Phase 2 slice 4 port-leak test, no regressions caused
by this slice.
Two new root-level aliases so the catalog scripts shipped
in recent slices are invocable from the repo root, not just
from auto-qa/harness/:

  - auto-qa:e2e:scenarios:by-layer (slice 4d-by-layer-script)
  - auto-qa:e2e:invariants:catalog (slice 4d-invariants-catalog)

Pure-additive package.json change. Each alias verified to
resolve from the repo root and produce expected output.

Pairs with an analogous interface-side commit wiring
auto-qa:e2e:scenarios:by-route at that repo's root.
…aged CI

The staged api smoke workflow (3e) currently runs only
smoke:scenarios + a dry-run catalog sanity check — it
does NOT cover the new smoke-invariants-catalog.test.mjs
shipped two iterations ago.

Extends auto-qa-harness-smoke.yml.staged with two new
steps mirroring the interface-side scenarios:catalog
drift pattern (slice 3a):

  - Regenerate invariants catalog (npm run invariants:catalog)
  - Verify invariants catalog is in sync (git diff --exit-code)

Without this, an invariant added without regenerating
INVARIANTS.md would leave the committed catalog stale,
and CI would not notice.

Validation: YAML re-parsed clean via js-yaml@4; drift
assertion pre-verified locally (regen + git diff exits 0).
Trigger remains workflow_dispatch.
…link smoke

Phase 0 CHECKLIST item 41 ("Sister-link verified: fresh
checkout of both repos in ~/code/futarchy-fi/") bundles
doc + docker checks. This slice ships the doc-side half.

Adds tests/smoke-architecture-sync.test.mjs that:
  - Resolves sister ARCHITECTURE.md at
    ../interface/auto-qa/harness/ARCHITECTURE.md (4
    levels up from the test file)
  - Skips cleanly via t.skip() if sister not present
    (CI runners, one-repo clones)
  - Asserts byte-identical otherwise — fails loudly
    if the shared spec drifted between repos
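
The three outcomes (skip, pass, fail) can be sketched as a pure
comparison. This is an illustration under assumptions: the function
name and result shape are hypothetical, and a real node:test version
would read both files from disk and call t.skip() for the skip
outcome.

```javascript
// sisterBytes === null models "sister checkout not present".
function compareSisterDoc(localBytes, sisterBytes) {
  if (sisterBytes === null) {
    return { outcome: 'skip', reason: 'sister ARCHITECTURE.md not present' };
  }
  // Byte-level equality, not a whitespace-tolerant text diff.
  if (localBytes.equals(sisterBytes)) return { outcome: 'pass' };
  return {
    outcome: 'fail',
    reason: `ARCHITECTURE.md drifted: local ${localBytes.length} bytes, ` +
      `sister ${sisterBytes.length} bytes`,
  };
}

const local = Buffer.from('# Architecture\nshared spec\n');
console.log(compareSisterDoc(local, null).outcome);                // skip
console.log(compareSisterDoc(local, Buffer.from(local)).outcome);  // pass
console.log(compareSisterDoc(local, Buffer.from('# Architecture\n')).outcome); // fail
```

Comparing bytes rather than normalized text means even a trailing
newline difference counts as drift, which matches the byte-identical
wording above.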

Sister test on the interface side mirrors this in reverse
(looks up futarchy-api-side ARCHITECTURE.md). Baseline
verified byte-identical; both tests pass 1/1 in isolation.

CHECKLIST item 41 gains a sub-bullet recording the doc-
side coverage; docker-side half remains unchecked
(daemon-required).
…workflow STAGED

Cross-repo complement to the previous-iteration smoke test
(tests/smoke-architecture-sync.test.mjs). Together they
give complete drift coverage of the shared ARCHITECTURE.md
spec:
  - Smoke test: catches dev-with-sibling-clone case at
    npm test time. Skips if sister not present.
  - Workflow: catches CI-with-one-repo-checked-out case.
    Curls sister via raw.githubusercontent.com (public),
    diffs against local. Fails loudly on byte mismatch.

Adds auto-qa/harness/ci/auto-qa-harness-architecture-sync.yml.staged
pointing at the interface sister. Optional sister_branch
input (default auto-qa; switch to main post-merge).
Trigger: workflow_dispatch only.

Sister-side workflow on the interface repo mirrors this in
reverse (looks up futarchy-api).

Validation:
  - YAML re-parsed clean via js-yaml@4
  - Sister raw URL returns HTTP/2 200 (publicly accessible)
  - Simulated workflow locally: curl + diff → PASS

ci/README.md staged-table updated (2 rows now); promote
order documented (smoke first, then this).
… in staged CI

Two new daemon-free smoke files shipped earlier this session
(smoke-invariants-catalog.test.mjs,
smoke-architecture-sync.test.mjs) were not exercised by the
api-side staged CI workflow, which runs only smoke:scenarios.

Adds explicit steps for each in
auto-qa-harness-smoke.yml.staged:
  - Run invariants-catalog smoke test (unit-level drift
    assertion, sister to the workflow-level git-diff check
    shipped in slice 3e-extend)
  - Run architecture-sync smoke test (Phase 0 doc-side
    sister-link check; SKIPS cleanly in CI's single-repo
    checkout — the cross-repo workflow handles actual
    sister drift)

Why explicit steps vs broadening to npm test: the api
harness has 9 daemon-required smoke files (anvil + docker +
indexers) that would fail in CI without that infra.

Validation: YAML re-validated via js-yaml@4; both tests
pass 1/1 in isolation. Trigger remains workflow_dispatch.