release: To Prod #1236

Merged
joelorzet merged 44 commits into prod from staging
May 13, 2026

suisuss commented May 13, 2026

No description provided.

suisuss added 30 commits May 13, 2026 09:57
Prod keeperhub-common pods are being evicted by kubelet under node
memory pressure because actual working set (~2.3 GiB) is ~4.6x the
requests.memory of 512Mi.

- requests.memory: 512Mi -> 2Gi (prod), 256Mi -> 2Gi (staging)
- limits.memory: 3Gi -> 4Gi (prod), 2Gi -> 4Gi (staging)
- staging NODE_OPTIONS: add --max-old-space-size=3072 to mirror prod

The requests bump closes most of the gap between the request and the
actual working set, which is what the kubelet ranks eviction
candidates by under node memory pressure. The limits bump leaves
~1 GiB headroom above the V8 heap cap to also protect against the
cgroup-OOM mode in the original ticket.
…roup-limit

fix: KEEP-445 raise keeperhub-common memory request and limit

Vertical slice covering aave-v3 and superfluid on Sepolia end-to-end.
Adding further protocols/chains is a pure single-source-of-truth
(SSOT) edit with no engine changes.

Architecture (programmatic, no on-disk fixtures):
- Each protocols/<slug>.ts co-locates a TEST_DATA: ProtocolTestData
  export alongside its defineProtocol call. ProtocolDefinition gains
  an optional testData field so the registry can carry it.
- lib/test-data/build-workflow.ts exports pure functions
  buildSetupWorkflow, buildActionWorkflow, buildAllForProtocol,
  toWebhookTriggered, listCoverageTargets. Both the seeder and the
  test runner call them.
- Bindings in TEST_DATA are plain strings (resolved as token symbols
  against TOKEN_REGISTRY when typed as address) plus four helpers
  (amount, contract, native, wallet) for the cases that need explicit
  decimals / contract keys / native units / the runtime wallet.
- KEEP-529 made the persistent test wallet HSM-generated per
  environment, so the canonical address is no longer a compile-time
  constant. Builders accept `walletAddress: string` and resolve the
  `wallet()` sentinel against it; callers look the address up once
  from `organization_wallets` at the test/seed entry point.
- Seeder (scripts/seed/seed-protocol-workflows.ts): looks up the
  test org's active wallet, then walks registered protocols and
  inserts one row per (protocol, chain, action, trigger) plus the
  setup row per (protocol, chain). Uses onConflictDoUpdate so the
  seeded rows refresh on subsequent runs (catches SSOT or wallet
  changes).
- Test runner (tests/integration/protocol-coverage/_shared/*):
  beforeAll fetches the wallet via getTestWalletAddress(), runs
  ensureNativeGas + the setup workflow once; read+write coverage
  uses Manual only because webhook-fired execution ignores the
  trigger node's config.
- Unit test (tests/unit/build-workflow.test.ts): walks every
  (protocol, chain) target, builds setup + all (action x trigger)
  workflows with a fixed placeholder wallet, asserts each is
  well-formed against the registry. 467 assertions, no RPC, no DB.
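
A minimal sketch of how the binding helpers and the `wallet()` sentinel described above could resolve at build time. The `Binding` and `ResolveContext` shapes, the `0x` pass-through rule, and all addresses are assumptions made for illustration; the real ProtocolTestData types may differ.

```typescript
// Hypothetical binding shapes, not the repo's actual types.
type Binding =
  | string
  | { kind: "amount"; human: string; decimals: number }
  | { kind: "contract"; key: string }
  | { kind: "native"; human: string }
  | { kind: "wallet" };

const amount = (human: string, decimals: number): Binding => ({ kind: "amount", human, decimals });
const contract = (key: string): Binding => ({ kind: "contract", key });
const native = (human: string): Binding => ({ kind: "native", human });
const wallet = (): Binding => ({ kind: "wallet" });

interface ResolveContext {
  walletAddress: string; // looked up once by the caller
  tokens: Record<string, string>; // symbol -> address
  contracts: Record<string, string>; // contract key -> address
}

// Decimal string -> base-unit string, via padding so no float maths is involved.
function toBaseUnits(human: string, decimals: number): string {
  const [whole = "0", frac = ""] = human.split(".");
  const padded = (frac + "0".repeat(decimals)).slice(0, decimals);
  return (whole + padded).replace(/^0+(?=\d)/, "");
}

function resolveBinding(binding: Binding, solidityType: string, ctx: ResolveContext): string {
  if (typeof binding === "string") {
    // Assumption: literal addresses pass through, other strings are token symbols.
    if (solidityType !== "address" || binding.startsWith("0x")) return binding;
    const address = ctx.tokens[binding];
    if (address === undefined) throw new Error(`Unknown token symbol "${binding}"`);
    return address;
  }
  switch (binding.kind) {
    case "amount":
      return toBaseUnits(binding.human, binding.decimals);
    case "native":
      return toBaseUnits(binding.human, 18);
    case "wallet":
      return ctx.walletAddress;
    case "contract": {
      const address = ctx.contracts[binding.key];
      if (address === undefined) throw new Error(`Unknown contract key "${binding.key}"`);
      return address;
    }
  }
}

// Placeholder values only.
const ctx: ResolveContext = {
  walletAddress: "0x00000000000000000000000000000000000000aa",
  tokens: { LINK: "0x0000000000000000000000000000000000000001" },
  contracts: { pool: "0x0000000000000000000000000000000000000002" },
};
const onBehalfOf = resolveBinding(wallet(), "address", ctx);
const supplyAmount = resolveBinding(amount("1.5", 18), "uint256", ctx);
```

Because the wallet is passed in through the context rather than imported as a constant, the same TEST_DATA can be built against a placeholder wallet in unit tests and the per-environment wallet in the seeder.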

Supersedes the prior on-disk fixture tree (the JSON files + the
generator script were deleted in an earlier iteration). The
"fixtures" are derived at consumer call time from the co-located
testData; no generator step, no parallel JSON tree to drift from
the SSOT.

Out of scope (deferred):
- CI matrix workflow YAML for the on-chain runner.
- Mainnet-only protocols' _executable:false emission.
- Migration of seed-ethena-workflows.ts and seed-erc4626-workflows.ts.
- Plugin coverage tree.

ETH was never in TOKEN_REGISTRY -- it's the native token, not an ERC20
with a contract address. Leaving it in the union meant a SSOT writer
could put { asset: "ETH" } and the resolver would pass the literal
"ETH" through to the workflow's config, producing an invalid address
that the executor only fails on at on-chain call time.

Drop it from the union so any attempt to use ETH as a token symbol
gets caught at typecheck. Native gas balance + amounts go through
ensureNativeGas and native(human) respectively.
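
A minimal sketch of the typecheck-time guard this gives. Deriving the symbol union from the registry's keys is an assumption made here for brevity (the source only says ETH was dropped from the union), and the addresses are placeholders.

```typescript
// Placeholder addresses, not real deployments.
const TOKEN_REGISTRY = {
  LINK: "0x0000000000000000000000000000000000000001",
  FUSDC: "0x0000000000000000000000000000000000000002",
} as const;

// Assumption: the union mirrors the registry keys, so a symbol without an
// address cannot be named at all.
type TokenSymbol = keyof typeof TOKEN_REGISTRY;

function resolveTokenAddress(symbol: TokenSymbol): string {
  return TOKEN_REGISTRY[symbol];
}

const linkAddress = resolveTokenAddress("LINK");

// @ts-expect-error "ETH" is not a TokenSymbol, so this line fails typecheck.
const ethAddress = resolveTokenAddress("ETH");
```

Without the type error, `ethAddress` would be `undefined` at runtime, which is the kind of invalid value that previously only surfaced at on-chain call time.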
Previously, defaultForSolidityType returned the test wallet address for
any address-typed input that had no SSOT binding and no protocol-level
default. This was a quiet footgun: an action with a 'recipient' field
that nobody bound would silently route to the signer's own address,
producing semantically wrong (signer->self) workflows that look fine
in the UI and only fail in obscure ways on-chain.

Throw with a clear message identifying the offender:

  address-typed input "recipient" on "foo/bar" has no binding and no
  protocol-level default. Add it to TEST_DATA ...

Adds SSOT bindings for the 6 previously-unbound superfluid write
actions surfaced by this stricter check:

- create-pool          (token, admin)
- update-member-units  (pool, member, units, userData)
- distribute           (token, from, pool, amount, userData)
- distribute-flow      (token, from, pool, flowRate, userData)
- connect-pool         (pool, userData)
- grant-flow-operator  (token, flowOperator, permissions, flowRateAllowance)

The 'pool' bindings use the zero address as a placeholder -- Phase 1
doesn't provision a GDA pool, so on-chain execution of these actions
reverts until a future iteration adds pool-creation to the setup
workflow. The seeded rows still load cleanly in the UI and exercise
the workflow-construction path.

Verified: 'pnpm vitest run tests/unit/build-workflow.test.ts' still
467/467; 'pnpm db:seed-workflows' produces a create-pool row whose
config now has admin=test-wallet, token=FUSDCX (was: admin=test-wallet
silently via the address default, but only because that happened to
match -- the next protocol with a non-self admin would have produced
wrong workflows).
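
A rough sketch of the stricter default, assuming a three-argument signature and simple non-address defaults; neither is confirmed by the source, and the error text is abbreviated.

```typescript
// Hypothetical signature; the real defaultForSolidityType may take other args.
function defaultForSolidityType(
  solidityType: string,
  inputName: string,
  actionRef: string,
): string {
  if (solidityType === "address") {
    // No silent fallback to the signer's own address any more.
    throw new Error(
      `address-typed input "${inputName}" on "${actionRef}" has no binding ` +
        "and no protocol-level default.",
    );
  }
  // Assumed harmless defaults for value types.
  if (solidityType === "bool") return "false";
  if (/^u?int\d*$/.test(solidityType)) return "0";
  return "";
}

let unboundMessage = "";
try {
  defaultForSolidityType("address", "recipient", "foo/bar");
} catch (error) {
  unboundMessage = (error as Error).message;
}
```

Failing at build time means an unbound `recipient` shows up in the unit test or the seeder log instead of as a signer-to-self transfer on-chain.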
buildSetupWorkflow / buildActionWorkflow / buildAllForProtocol were
4-5 positional args each, with two hex-like strings (chainId and
walletAddress) sitting next to each other -- easy to swap at a
callsite, no type-level guard against it.

Switch each to an options-object signature with a named type
(BuildSetupOptions, BuildActionOptions, BuildCoverageOptions).
Callsites updated:

- tests/integration/protocol-coverage/_shared/setup.ts
- tests/integration/protocol-coverage/_shared/run-fixture.ts
- scripts/seed/seed-protocol-workflows.ts (both functions)
- tests/unit/build-workflow.test.ts (both functions)

No behavioural change. Typecheck and the 467-assertion unit test
still pass.
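
For illustration, an options-object signature might look like the sketch below. The field names and the returned shape are assumptions; only the type name BuildActionOptions comes from the commit.

```typescript
// Hypothetical fields; only the type name is taken from the commit message.
interface BuildActionOptions {
  protocolSlug: string;
  actionSlug: string;
  chainId: string;
  walletAddress: string;
  trigger: string;
}

interface BuiltWorkflow {
  _protocol: string;
  _chainId: string;
  _trigger: string;
  wallet: string;
}

function buildActionWorkflow(options: BuildActionOptions): BuiltWorkflow {
  return {
    _protocol: options.protocolSlug,
    _chainId: options.chainId,
    _trigger: options.trigger,
    wallet: options.walletAddress,
  };
}

// Every value is named at the callsite, so chainId and walletAddress cannot be
// swapped by position even though both are plain strings.
const built = buildActionWorkflow({
  protocolSlug: "aave-v3",
  actionSlug: "supply",
  chainId: "11155111",
  walletAddress: "0x00000000000000000000000000000000000000aa",
  trigger: "Manual",
});
```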
The bulk per-action tests roundtrip builder metadata (built._phase ===
action.type etc.) but never assert that the walletAddress argument
actually propagates through resolution. A regression where the builder
silently fell back to a hardcoded constant would slip past every
existing assertion.

Adds two focused describes:

1. 'wallet address propagation' -- builds aave-v3/supply with a known
   TEST_WALLET value, asserts cfg.onBehalfOf === TEST_WALLET. One
   pinpoint check.

2. 'toWebhookTriggered' -- direct coverage for the helper that rewrites
   non-Webhook trigger nodes to satisfy the production webhook endpoint
   (HTTP 400 gate at app/api/workflows/[id]/webhook/route.ts:244).
   Asserts: config rewritten, _trigger metadata updated, action nodes
   untouched, source workflow not mutated.

472 assertions total (was 467).
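
The four toWebhookTriggered properties asserted above can be pictured with a non-mutating rewrite like the following. The node and workflow shapes, and the `triggerType` config key, are assumptions for the sketch rather than the repo's actual types.

```typescript
// Hypothetical shapes for illustration.
interface WorkflowNode {
  id: string;
  type: "trigger" | "action";
  config: Record<string, unknown>;
}

interface Workflow {
  _trigger: string;
  nodes: WorkflowNode[];
}

// Returns a new workflow; the source object and its nodes are never mutated.
function toWebhookTriggered(source: Workflow): Workflow {
  return {
    ...source,
    _trigger: "Webhook",
    nodes: source.nodes.map((node) =>
      node.type === "trigger"
        ? { ...node, config: { triggerType: "Webhook" } }
        : { ...node, config: { ...node.config } },
    ),
  };
}

const scheduled: Workflow = {
  _trigger: "Schedule",
  nodes: [
    { id: "t1", type: "trigger", config: { triggerType: "Schedule", cron: "0 * * * *" } },
    { id: "a1", type: "action", config: { amount: "1" } },
  ],
};
const scheduledBefore = JSON.stringify(scheduled);
const webhookVariant = toWebhookTriggered(scheduled);
```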
The bulk per-variant loop was asserting built._phase === action.type,
built._protocol === protocolSlug, built._chainId === chainId, and
built._trigger === trigger for every (action, trigger) variant. These
catch a real class of bug (builder hardcodes wrong field) but they're
100%-correlated -- a broken builder would fail 100+ identical
assertions in a row, none of which add signal beyond the first.

Pull the four metadata-roundtrip assertions out of the loop into a
single dedicated 'builders honour their inputs' describe block:

- One test for buildSetupWorkflow (_phase=setup, _trigger=Manual, etc.)
- One it.each across TRIGGER_TYPES for buildActionWorkflow (catches
  per-trigger handling bugs without iterating every action).

Also drop the 'has Manual trigger + at least one action node' and
'metadata' duplicates from the setup workflow describe -- both
subsumed by the new dedicated test or the existing approve-token
structural check.

Per-variant tests retain the load-bearing assertions:
- node count + presence (catches builder-forgot-to-emit bugs)
- registry agreement (catches actionType/contract/function drift)
- required inputs present (catches missing SSOT bindings)
- contract deployment exists for chain (catches mainnet-only SSOTs)

474 tests pass (was 472; net +2 from the dedicated trigger-roundtrip
it.each).

KEEP-529's commit flagged the paraWallets alias as a follow-up
deletion. New code in this PR shouldn't entrench the old name; rename
the two imports + their drizzle query references to the canonical
organizationWallets. The rest of the codebase (seed-test-wallet,
fund-test-wallet, web3-steps tests, etc.) keeps the alias for now --
those will get cleaned up as part of the broader Para-decommission
follow-up, not in this PR.

Files touched: 2. Zero behavioural change -- paraWallets was just a
re-export of organizationWallets. Typecheck clean, 474 unit
assertions still pass.

Adds a --chain flag, symmetric with --protocol / --phase / --trigger.
It scopes a seed run to one chain once the SSOT grows beyond Sepolia
(no current use -- Phase 1 SSOT only covers Sepolia -- but the flag
costs nothing and matches the shape callers expect).

Smoke:
- 'db:seed-workflows --protocol=aave-v3 --chain=11155111' seeds 36 rows
  (1 setup + 7 actions x 5 triggers). Behaviour matches the unfiltered
  aave-v3 case today.
- 'db:seed-workflows --chain=1' reports 'No coverage targets matched
  (protocol=*, chain=1)' and exits cleanly -- mainnet has no SSOT
  entries, so the filter correctly empties the iteration.
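
A sketch of how the flag filtering and the 36-row count in the smoke notes fit together. The helper names (`readFlag`, `selectTargets`, `seededRowCount`) and the target shape are hypothetical; the seeder's real argument parsing is not shown in this PR text.

```typescript
interface CoverageTarget {
  protocol: string;
  chainId: string;
}

// Reads "--name=value" from argv; returns undefined when the flag is absent.
function readFlag(argv: string[], name: string): string | undefined {
  const prefix = `--${name}=`;
  const hit = argv.find((arg) => arg.startsWith(prefix));
  return hit === undefined ? undefined : hit.slice(prefix.length);
}

// An absent flag acts as a wildcard, matching the "protocol=*" log line.
function selectTargets(targets: CoverageTarget[], argv: string[]): CoverageTarget[] {
  const protocol = readFlag(argv, "protocol");
  const chain = readFlag(argv, "chain");
  return targets.filter(
    (target) =>
      (protocol === undefined || target.protocol === protocol) &&
      (chain === undefined || target.chainId === chain),
  );
}

// One setup row per (protocol, chain) plus one row per (action, trigger).
function seededRowCount(actionCount: number, triggerCount: number): number {
  return 1 + actionCount * triggerCount;
}

const allTargets: CoverageTarget[] = [
  { protocol: "aave-v3", chainId: "11155111" },
  { protocol: "superfluid", chainId: "11155111" },
];
const sepoliaAave = selectTargets(allTargets, ["--protocol=aave-v3", "--chain=11155111"]);
const mainnetOnly = selectTargets(allTargets, ["--chain=1"]);
```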
Previously the seeder used 'onConflictDoUpdate' to refresh every existing
row's nodes/edges on every run. A user editing a seeded workflow in the
UI (rename, tweak input, add a node) had their changes silently
overwritten on the next seed pass. No feedback in the script log, no
flag in the workflow's UI.

Adds a dedicated 'seededAt' timestamp column on workflows. The seeder
writes it on insert and on refresh; never on user edits. Detection
becomes 'updatedAt > seededAt + epsilon' and is unambiguous:

- seededAt IS NULL            -> user-created in the UI; skip (we never
                                 seeded this row)
- updatedAt - seededAt <= eps -> seeder last wrote this row; refresh
- updatedAt - seededAt > eps  -> user edited after the last seed touch;
                                 skip with a log line

Doesn't overload createdAt or updatedAt -- the public-facing readers
('app/api/workflows/route.ts' sorts by createdAt; 'app/api/workflows/
current/route.ts' and 'app/api/workflows/public/route.ts' sort by
updatedAt; MCP endpoints expose both) keep their canonical meanings.

Schema change is additive: nullable column, no default, zero impact on
existing rows. Migration is hand-written ('drizzle/0071_...') because
drizzle-kit 0.31.10 has a known BigInt-serialization bug in generate
(repo precedent: snapshots stop at 0058 while SQL goes to 0070).

Verified end-to-end against the KEEP-458 DB:
- Clean re-seed of 117 existing rows: '0 inserted, 117 refreshed, 0
  skipped, 0 failed' (after a one-time seededAt = updatedAt backfill
  on the test DB; production rows have no seededAt yet so the first
  prod seed run will refresh all rows once before steady state).
- Simulated UI edit (updatedAt += 1 minute) on one supply workflow:
  '0 inserted, 35 refreshed, 1 skipped (user-edited), 0 failed' --
  exactly the edited row is preserved.
- Tally counts now distinguish inserted / refreshed / skipped /
  failed instead of a single boolean.

Wire automatic ERC20 acquisition for the protocol-coverage suite and
align the SSOT with what Aave V3 / Superfluid Sepolia actually accept.
Local run against real Sepolia: 19 passed | 4 skipped | 0 failed.

chain-test-data:
- Add LINK token entry; switch aave-v3 reserve from DAI to LINK (DAI/
  USDC/USDT on Aave V3 Sepolia all hit SUPPLY_CAP_EXCEEDED (51), LINK
  has headroom and is borrowable -- verified via eth_call).
- Populate FAUCETS for Sepolia: Aave permissionless mint(token,to,amt)
  for LINK and FUSDC's own mint(to,amt). Verified both contracts on-
  chain (eth_getCode + selector inspection).

funding: replace ensureErc20Balance with ensureErc20Acquired. When
balance is short and a FAUCETS entry exists, the funder EOA calls the
faucet; args bound by name (token / to|recipient / amount|value).
Throws with a "manual provisioning required" message when no faucet is
registered. Falls back to ensureNativeGas's existing semantics.

setup runner: pull the protocol's setup spec from the registry and
ensureErc20Acquired each requiredToken before firing the setup workflow.

types + run-fixture: add ProtocolChainTestData.skipped: Record<slug,
reason>. The seeder still surfaces these workflows in the dashboard,
but the integration suite marks them test.skip(reason) -- documented
gaps (e.g. GDA pool actions need the create-pool tx receipt to extract
the deployed address, deferred).

aave-v3: requiredTokens=200 LINK, approval=200, initial supply=100 in
protocolSteps so write tests have a position to operate on. All seven
write+read actions switch from DAI to LINK.

superfluid:
- Reads on superToken (userSpecifiedAddress contract): use
  contractAddress instead of token (the action input is `account`, not
  `token`).
- CFA flow tests (create/update/delete-flow): receiver is
  0x...dEaD instead of self; Superfluid reverts self-flows with
  CFA_NO_SELF_FLOW (0xa47338ef). Sender stays the wallet so the same
  flow row is created, updated, and deleted in sequence.
- grant-flow-operator: same -- non-self flowOperator.
- skipped map for four GDA-pool actions (update-member-units,
  distribute, distribute-flow, connect-pool) -- on-chain execution
  needs the pool address from create-pool's receipt.

The block claimed FAUCETS was unpopulated and ERC20 acquisition was
verify-only. Both were true at one point and stopped being true when
the faucet wiring landed -- FAUCETS now carries the Aave Sepolia +
FUSDC entries and ensureErc20Acquired calls them. The remaining
behaviour is documented on FAUCETS itself; the type only needs to
explain its shape.

When a previous run leaves a partial balance, minting `needed` re-
mints the full target -- the wallet ends with balance + needed instead
of just needed. That wastes faucet quota (Aave Sepolia caps each call
at MAX_MINT_AMOUNT) and could hit the cap when several protocols share
the same token.

Mint only the gap (needed - balance) instead. The early return on
balance >= needed guarantees gap > 0, so there is no
zero-amount mint call.
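
As a minimal sketch, the top-up arithmetic reduces to the function below. `planMint` is a hypothetical name (in the PR the logic lives inside ensureErc20Acquired), and amounts are assumed to be base units held as bigint.

```typescript
// Returns the amount to mint, or null when the wallet is already funded.
function planMint(balance: bigint, needed: bigint): bigint | null {
  if (balance >= needed) return null; // early return: nothing to mint
  return needed - balance; // strictly positive because balance < needed
}

// A previous run left 50 tokens; the target is 200, so only 150 are minted.
const topUp = planMint(50n, 200n);
```

Minting the full `needed` in the same situation would leave the wallet at 250 instead of 200.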
getChainRpcUrl opened a fresh Postgres connection on every call. A
single setup run calls it 3+ times (ensureNativeGas plus one per
requiredTokens), so the cache eliminates 2+ round-trips per protocol
without changing observable behaviour. chains.defaultPrimaryRpc is
bootstrap-time data, no invalidation needed.
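
One way to picture the cache is a per-chain memo of the lookup promise, sketched below. `cachePerChain`, the numeric chain id, and the placeholder URL are assumptions; the real getChainRpcUrl queries Postgres and its internals are not shown in this PR text.

```typescript
type RpcLookup = (chainId: number) => Promise<string>;

// Caches the promise (not just the value) so concurrent callers share one query.
function cachePerChain(lookup: RpcLookup): RpcLookup {
  const cache = new Map<number, Promise<string>>();
  return (chainId) => {
    const hit = cache.get(chainId);
    if (hit !== undefined) return hit;
    const pending = lookup(chainId).catch((error: unknown) => {
      cache.delete(chainId); // do not pin a failed lookup
      throw error;
    });
    cache.set(chainId, pending);
    return pending;
  };
}

// Stand-in for the database query, counting how often it runs.
let dbQueries = 0;
const getChainRpcUrl = cachePerChain(async (chainId) => {
  dbQueries += 1;
  return `https://rpc.example.invalid/${chainId}`;
});

const firstLookup = getChainRpcUrl(11155111);
const secondLookup = getChainRpcUrl(11155111);
```

There is deliberately no expiry in this sketch, matching the commit's note that the value is bootstrap-time data.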
…Address`

The builder treats `bindings.contractAddress` as a virtual hint for
userSpecifiedAddress contracts (Superfluid SuperTokens). If a protocol
ever declares a real action input named `contractAddress`, the same
binding would be used for both purposes -- the virtual hint and the
input -- silent misbehaviour that produces a workflow with the wrong
config shape.

Lazy check inside buildProtocolActionNode: throws when the action
being built carries the reserved name. The unit test in
tests/unit/build-workflow.test.ts iterates every (protocol, action,
trigger) so a violation would surface in CI without crashing the
dev server's protocol-registry import.
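
The lazy check can be sketched as a small guard called while an action node is built. The function name, the action shape, and the error wording are assumptions; only the reserved name `contractAddress` comes from the commit.

```typescript
const RESERVED_BINDING_NAME = "contractAddress";

// Hypothetical action shape for illustration.
interface ActionDef {
  slug: string;
  inputs: { name: string; type: string }[];
}

// Throws only when the offending action is actually built, not at import time.
function assertNoReservedInput(protocolSlug: string, action: ActionDef): void {
  const clash = action.inputs.some((input) => input.name === RESERVED_BINDING_NAME);
  if (clash) {
    throw new Error(
      `"${protocolSlug}/${action.slug}" declares an input named ` +
        `"${RESERVED_BINDING_NAME}", which is reserved as a builder hint.`,
    );
  }
}

const safeAction: ActionDef = {
  slug: "balance-of",
  inputs: [{ name: "account", type: "address" }],
};
assertNoReservedInput("superfluid", safeAction); // passes silently
```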
Export bindFaucetArgs (and its AbiInput/AbiFunction types) from
funding.ts and lock in the case-insensitive name conventions with
a unit test. The previous integration coverage exercised exactly two
shapes (Aave's mint(token,to,amount), FUSDC's mint(to,amount));
adding a new FAUCETS entry with a slightly different shape -- say
`recipient` instead of `to` -- would have surfaced mid-CI. Now it
fails at `pnpm test:unit` instead.

Cases:
- Aave + FUSDC shapes in declaration order
- Case-insensitive name matching
- recipient/to and value/amount aliases
- Unknown input name throws with the offending name in the message
- Declared input order preserved even when names are swapped (a
  regression where the binder sorts inputs would silently mis-order
  the args)
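
A sketch of name-based binding that satisfies the cases listed above. The `FaucetValues` shape and the exact signature are assumptions, since the PR text does not show bindFaucetArgs itself.

```typescript
interface AbiInput {
  name: string;
  type: string;
}

// Hypothetical value bag supplied by the funder.
interface FaucetValues {
  token: string;
  to: string;
  amount: string;
}

// Maps each declared input to a value by (case-insensitive) name, preserving
// the ABI's declared order.
function bindFaucetArgs(inputs: AbiInput[], values: FaucetValues): string[] {
  return inputs.map((input) => {
    switch (input.name.toLowerCase()) {
      case "token":
        return values.token;
      case "to":
      case "recipient":
        return values.to;
      case "amount":
      case "value":
        return values.amount;
      default:
        throw new Error(`Unsupported faucet input "${input.name}"`);
    }
  });
}

const faucetValues: FaucetValues = { token: "0xT", to: "0xW", amount: "200" };

// Aave-style mint(token, to, amount) and FUSDC-style mint(to, amount).
const aaveArgs = bindFaucetArgs(
  [
    { name: "token", type: "address" },
    { name: "to", type: "address" },
    { name: "amount", type: "uint256" },
  ],
  faucetValues,
);
const fusdcArgs = bindFaucetArgs(
  [
    { name: "to", type: "address" },
    { name: "amount", type: "uint256" },
  ],
  faucetValues,
);
```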
Extract the gating logic in upsertOne into a pure decideSeedAction
function and lock it down with eight cases. The logic distinguishes
between a row the seeder owns (refresh on next pass) and one a human
edited via the UI (leave alone). Getting that wrong silently
clobbers user changes -- exactly the bug the seeded_at column was
added to prevent.

Coverage:
- undefined existing -> insert
- seededAt null -> skip(user-created)
- gapMs == 0 (just-inserted) -> refresh
- gapMs < epsilon -> refresh
- gapMs == epsilon (boundary, strict >) -> refresh
- gapMs > epsilon -> skip(user-edited) carries gapMs
- negative gap (clock skew) -> refresh
- custom epsilon honoured
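
The coverage list reads as a specification, so a pure function satisfying it can be sketched as follows. The result and row shapes and the default epsilon of 1000 ms are assumptions; only the decision rules come from the commit.

```typescript
type SeedDecision =
  | { action: "insert" }
  | { action: "refresh" }
  | { action: "skip"; reason: "user-created" }
  | { action: "skip"; reason: "user-edited"; gapMs: number };

interface ExistingRow {
  updatedAt: Date;
  seededAt: Date | null;
}

const DEFAULT_EPSILON_MS = 1000; // assumed value, not taken from the source

function decideSeedAction(
  existing: ExistingRow | undefined,
  epsilonMs: number = DEFAULT_EPSILON_MS,
): SeedDecision {
  if (existing === undefined) return { action: "insert" };
  if (existing.seededAt === null) return { action: "skip", reason: "user-created" };
  const gapMs = existing.updatedAt.getTime() - existing.seededAt.getTime();
  // Strict ">": a gap exactly equal to epsilon still counts as seeder-owned,
  // and a negative gap (clock skew) falls through to refresh.
  if (gapMs > epsilonMs) return { action: "skip", reason: "user-edited", gapMs };
  return { action: "refresh" };
}

// A row touched one minute after the last seed pass is left alone.
const editedRow: ExistingRow = {
  seededAt: new Date(1_000_000),
  updatedAt: new Date(1_060_000),
};
const editedDecision = decideSeedAction(editedRow);
```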
Extract planPhaseFixtures from runPhaseFixtures: returns a pure
FixtureCase[] describing what each action should be (run / skip /
no-protocol / no-actions). runPhaseFixtures still drives the vitest
test() / test.skip() calls, but the decision layer is now testable
without spying on vitest internals or running the integration suite.

Cases:
- undefined protocol -> no-protocol entry
- no actions match phase -> no-actions entry
- phase filter excludes the wrong type
- action not in skipped map -> run
- action in skipped map -> skip with reason preserved
- mixed run/skip in one plan
- omitted skipped on chain entry -> all run
- skipped scoped by chain id (mainnet skip doesn't bleed into Sepolia)
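
A sketch of the decision layer under assumed shapes: the `FixtureCase` variants follow the four outcomes named above, while the protocol structure and the parameter list are hypothetical.

```typescript
type FixtureCase =
  | { kind: "no-protocol" }
  | { kind: "no-actions"; phase: string }
  | { kind: "run"; action: string }
  | { kind: "skip"; action: string; reason: string };

// Hypothetical protocol shape; skipped maps action slug -> reason, per chain.
interface ProtocolLike {
  actions: { slug: string; type: string }[];
  chains: Record<string, { skipped?: Record<string, string> }>;
}

function planPhaseFixtures(
  protocol: ProtocolLike | undefined,
  chainId: string,
  phase: string,
): FixtureCase[] {
  if (protocol === undefined) return [{ kind: "no-protocol" }];
  const matching = protocol.actions.filter((action) => action.type === phase);
  if (matching.length === 0) return [{ kind: "no-actions", phase }];
  const skipped: Record<string, string> = protocol.chains[chainId]?.skipped ?? {};
  return matching.map((action): FixtureCase => {
    const reason = skipped[action.slug];
    return reason === undefined
      ? { kind: "run", action: action.slug }
      : { kind: "skip", action: action.slug, reason };
  });
}

const sampleProtocol: ProtocolLike = {
  actions: [
    { slug: "create-flow", type: "write" },
    { slug: "connect-pool", type: "write" },
    { slug: "get-flow", type: "read" },
  ],
  chains: {
    "11155111": { skipped: { "connect-pool": "needs pool address" } },
    "1": { skipped: { "create-flow": "mainnet only" } },
  },
};
const writePlan = planPhaseFixtures(sampleProtocol, "11155111", "write");
```

A thin runner can then map `run` to test() and `skip` to test.skip(reason), which keeps the vitest calls out of the logic under test.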
…ave-v3 TEST_DATA

The protocol-coverage test runner iterates protocol.actions in array
order, and tests share the wallet's on-chain Aave position as a
singleton. repay needs an open debt position, so it must follow borrow.
Future reviewers reordering the array (e.g. alphabetising) would
silently break the suite; the comment names the invariant where it
lives.

…uperfluid TEST_DATA

Mirrors the aave-v3 comment block: names the specific on-chain
dependencies (update-flow/delete-flow follow create-flow), flags the
independent actions (wrap/unwrap, grant-flow-operator), and points at
the skipped GDA pool actions.

…ta is absent

buildActionWorkflow accepts arbitrary chainIds, so the executable bit
must reflect whether any testData was actually vetted for the chain.
The prior `chainData?.enabled !== false` returned true on missing
chainData (`undefined !== false`), so a caller building a workflow
for an un-covered chain saw _executable=true alongside an action node
whose required address inputs would throw at resolveBinding time.
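
The pitfall is easy to reproduce in isolation. In the sketch below the `ChainTestData` shape and both function names are hypothetical, and the second function is one plausible form of the fix rather than the code that landed.

```typescript
interface ChainTestData {
  enabled?: boolean;
}

// Prior behaviour: optional chaining yields undefined for a missing entry,
// and `undefined !== false` is true.
function isExecutableBefore(chainData: ChainTestData | undefined): boolean {
  return chainData?.enabled !== false;
}

// Assumed fix: require the chain entry to exist before trusting the flag.
function isExecutableAfter(chainData: ChainTestData | undefined): boolean {
  return chainData !== undefined && chainData.enabled !== false;
}

const uncoveredBefore = isExecutableBefore(undefined); // true (the bug)
const uncoveredAfter = isExecutableAfter(undefined); // false
```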
…r-liveness

fix: KEEP-555 detect dead newHeads subscriptions via height-advance liveness
joelorzet merged commit 8378e6a into prod May 13, 2026
34 checks passed
4 participants