release: To Prod by suisuss · Pull Request #1236 · KeeperHub/keeperhub

suisuss · 2026-05-13T07:53:08Z

No description provided.

Prod keeperhub-common pods are being evicted by kubelet under node memory pressure because actual working set (~2.3 GiB) is ~4.6x the requests.memory of 512Mi.
- requests.memory: 512Mi -> 2Gi (prod), 256Mi -> 2Gi (staging)
- limits.memory: 3Gi -> 4Gi (prod), 2Gi -> 4Gi (staging)
- staging NODE_OPTIONS: add --max-old-space-size=3072 to mirror prod
The requests bump removes the pod from the kubelet eviction-priority list. The limits bump leaves ~1 GiB headroom above the V8 heap cap to also protect against the cgroup-OOM mode in the original ticket.

…roup-limit fix: KEEP-445 raise keeperhub-common memory request and limit

Vertical slice covering aave-v3 and superfluid on Sepolia end-to-end. Adding further protocols/chains is a pure SSOT edit with no engine changes.
Architecture (programmatic, no on-disk fixtures):
- Each protocols/<slug>.ts co-locates a TEST_DATA: ProtocolTestData export alongside its defineProtocol call. ProtocolDefinition gains an optional testData field so the registry can carry it.
- lib/test-data/build-workflow.ts exports pure functions buildSetupWorkflow, buildActionWorkflow, buildAllForProtocol, toWebhookTriggered, listCoverageTargets. Both the seeder and the test runner call them.
- Bindings in TEST_DATA are plain strings (resolved as token symbols against TOKEN_REGISTRY when typed as address) plus four helpers (amount, contract, native, wallet) for the cases that need explicit decimals / contract keys / native units / the runtime wallet.
- KEEP-529 made the persistent test wallet HSM-generated per environment, so the canonical address is no longer a compile-time constant. Builders accept `walletAddress: string` and resolve the `wallet()` sentinel against it; callers look the address up once from `organization_wallets` at the test/seed entry point.
- Seeder (scripts/seed/seed-protocol-workflows.ts): looks up the test org's active wallet, then walks registered protocols and inserts one row per (protocol, chain, action, trigger) plus the setup row per (protocol, chain). Uses onConflictDoUpdate so the seeded rows refresh on subsequent runs (catches SSOT or wallet changes).
- Test runner (tests/integration/protocol-coverage/_shared/*): beforeAll fetches the wallet via getTestWalletAddress(), runs ensureNativeGas + the setup workflow once; read+write coverage uses Manual only because webhook-fired execution ignores the trigger node's config.
- Unit test (tests/unit/build-workflow.test.ts): walks every (protocol, chain) target, builds setup + all (action x trigger) workflows with a fixed placeholder wallet, asserts each is well-formed against the registry. 467 assertions, no RPC, no DB.
Supersedes the prior on-disk fixture tree (the JSON files + the generator script were deleted in an earlier iteration). The "fixtures" are derived at consumer call time from the co-located testData; no generator step, no parallel JSON tree to drift from the SSOT.
Out of scope (deferred):
- CI matrix workflow YAML for the on-chain runner.
- Mainnet-only protocols' _executable:false emission.
- Migration of seed-ethena-workflows.ts and seed-erc4626-workflows.ts.
- Plugin coverage tree.

ETH was never in TOKEN_REGISTRY -- it's the native token, not an ERC20 with a contract address. Leaving it in the union meant a SSOT writer could put { asset: "ETH" } and the resolver would pass the literal "ETH" through to the workflow's config, producing an invalid address that the executor only fails on at on-chain call time. Drop it from the union so any attempt to use ETH as a token symbol gets caught at typecheck. Native gas balance + amounts go through ensureNativeGas and native(human) respectively.
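A minimal sketch of the typecheck-level guard described above, assuming the symbol union is derived from the registry keys. The registry contents, the placeholder addresses, and the helper names `isTokenSymbol` and `resolveTokenAddress` are hypothetical, not the repo's actual code.

```typescript
// Hypothetical registry: placeholder addresses, not real deployments.
const TOKEN_REGISTRY = {
  LINK: { address: "0x0000000000000000000000000000000000000001" },
  FUSDC: { address: "0x0000000000000000000000000000000000000002" },
} as const;

// Deriving the union from the registry keys means "ETH" is simply not a member.
type TokenSymbol = keyof typeof TOKEN_REGISTRY;

// Runtime guard for untyped input (e.g. CLI flags); typed callers never need it.
function isTokenSymbol(value: string): value is TokenSymbol {
  return Object.prototype.hasOwnProperty.call(TOKEN_REGISTRY, value);
}

// Only registry symbols are accepted, so the literal "ETH" can never be
// passed through to a workflow config as if it were an address.
function resolveTokenAddress(symbol: TokenSymbol): string {
  return TOKEN_REGISTRY[symbol].address;
}

// resolveTokenAddress("ETH"); // does not compile: "ETH" is not a TokenSymbol
```

With the union derived this way, adding a token is a registry edit and the type follows automatically.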

Previously, defaultForSolidityType returned the test wallet address for any address-typed input that had no SSOT binding and no protocol-level default. This was a quiet footgun: an action with a 'recipient' field that nobody bound would silently route to the signer's own address, producing semantically wrong (signer->self) workflows that look fine in the UI and only fail in obscure ways on-chain.
Throw with a clear message identifying the offender: address-typed input "recipient" on "foo/bar" has no binding and no protocol-level default. Add it to TEST_DATA ...
Adds SSOT bindings for the 6 previously-unbound superfluid write actions surfaced by this stricter check:
- create-pool (token, admin)
- update-member-units (pool, member, units, userData)
- distribute (token, from, pool, amount, userData)
- distribute-flow (token, from, pool, flowRate, userData)
- connect-pool (pool, userData)
- grant-flow-operator (token, flowOperator, permissions, flowRateAllowance)
The 'pool' bindings use the zero address as a placeholder -- Phase 1 doesn't provision a GDA pool, so on-chain execution of these actions reverts until a future iteration adds pool-creation to the setup workflow. The seeded rows still load cleanly in the UI and exercise the workflow-construction path.
Verified: 'pnpm vitest run tests/unit/build-workflow.test.ts' still 467/467; 'pnpm db:seed-workflows' produces a create-pool row whose config now has admin=test-wallet, token=FUSDCX (was: admin=test-wallet silently via the address default, but only because that happened to match -- the next protocol with a non-self admin would have produced wrong workflows).
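The stricter behaviour could look roughly like the sketch below. The three-argument signature and the non-address defaults are assumptions for illustration; only the throw-on-unbound-address rule and the message wording come from the commit text.

```typescript
// Hypothetical signature: the real defaultForSolidityType may take other arguments.
function defaultForSolidityType(
  solidityType: string,
  inputName: string,
  actionPath: string,
): string {
  if (solidityType === "address") {
    // No silent fallback to the test wallet: an unbound address is an SSOT gap.
    throw new Error(
      `address-typed input "${inputName}" on "${actionPath}" has no binding and no protocol-level default`,
    );
  }
  // Assumed defaults for non-address types, shown only to make the sketch complete.
  if (solidityType === "bool") return "false";
  if (solidityType.startsWith("uint") || solidityType.startsWith("int")) return "0";
  return "";
}
```

Failing at build time moves the error from an obscure on-chain revert to the unit test that walks every action.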

buildSetupWorkflow / buildActionWorkflow / buildAllForProtocol were 4-5 positional args each, with two hex-like strings (chainId and walletAddress) sitting next to each other -- easy to swap at a callsite, no type-level guard against it. Switch each to an options-object signature with a named type (BuildSetupOptions, BuildActionOptions, BuildCoverageOptions).
Callsites updated:
- tests/integration/protocol-coverage/_shared/setup.ts
- tests/integration/protocol-coverage/_shared/run-fixture.ts
- scripts/seed/seed-protocol-workflows.ts (both functions)
- tests/unit/build-workflow.test.ts (both functions)
No behavioural change. Typecheck and the 467-assertion unit test still pass.
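To illustrate why the options object helps, here is a sketch with a stand-in builder. The field names on `BuildActionOptions` and the returned shape are assumptions; the real builders produce full workflow JSON.

```typescript
// Assumed field names; the real BuildActionOptions may differ.
type BuildActionOptions = {
  protocolSlug: string;
  chainId: string;
  actionSlug: string;
  trigger: string;
  walletAddress: string;
};

// Stand-in builder that only echoes its inputs as metadata.
function buildActionWorkflow(opts: BuildActionOptions) {
  return {
    _protocol: opts.protocolSlug,
    _chainId: opts.chainId,
    _trigger: opts.trigger,
    action: opts.actionSlug,
    wallet: opts.walletAddress,
  };
}

// Named fields make a chainId/walletAddress swap visible at the callsite,
// which two adjacent positional strings never did.
const built = buildActionWorkflow({
  protocolSlug: "aave-v3",
  chainId: "11155111",
  actionSlug: "supply",
  trigger: "Manual",
  walletAddress: "0x0000000000000000000000000000000000000001", // placeholder
});
```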

The bulk per-action tests roundtrip builder metadata (built._phase === action.type etc.) but never assert that the walletAddress argument actually propagates through resolution. A regression where the builder silently fell back to a hardcoded constant would slip past every existing assertion.
Adds two focused describes:
1. 'wallet address propagation' -- builds aave-v3/supply with a known TEST_WALLET value, asserts cfg.onBehalfOf === TEST_WALLET. One pinpoint check.
2. 'toWebhookTriggered' -- direct coverage for the helper that rewrites non-Webhook trigger nodes to satisfy the production webhook endpoint (HTTP 400 gate at app/api/workflows/[id]/webhook/route.ts:244). Asserts: config rewritten, _trigger metadata updated, action nodes untouched, source workflow not mutated.
472 assertions total (was 467).

The bulk per-variant loop was asserting built._phase === action.type, built._protocol === protocolSlug, built._chainId === chainId, and built._trigger === trigger for every (action, trigger) variant. These catch a real class of bug (builder hardcodes wrong field) but they're 100%-correlated -- a broken builder would fail 100+ identical assertions in a row, none of which add signal beyond the first.
Pull the four metadata-roundtrip assertions out of the loop into a single dedicated 'builders honour their inputs' describe block:
- One test for buildSetupWorkflow (_phase=setup, _trigger=Manual, etc.)
- One it.each across TRIGGER_TYPES for buildActionWorkflow (catches per-trigger handling bugs without iterating every action).
Also drop the 'has Manual trigger + at least one action node' and 'metadata' duplicates from the setup workflow describe -- both subsumed by the new dedicated test or the existing approve-token structural check.
Per-variant tests retain the load-bearing assertions:
- node count + presence (catches builder-forgot-to-emit bugs)
- registry agreement (catches actionType/contract/function drift)
- required inputs present (catches missing SSOT bindings)
- contract deployment exists for chain (catches mainnet-only SSOTs)
474 tests pass (was 472; net +2 from the dedicated trigger-roundtrip it.each).

KEEP-529's commit flagged the paraWallets alias as a follow-up deletion. New code in this PR shouldn't entrench the old name; rename the two imports + their drizzle query references to the canonical organizationWallets. The rest of the codebase (seed-test-wallet, fund-test-wallet, web3-steps tests, etc.) keeps the alias for now -- those will get cleaned up as part of the broader Para-decommission follow-up, not in this PR. Files touched: 2. Zero behavioural change -- paraWallets was just a re-export of organizationWallets. Typecheck clean, 474 unit assertions still pass.

Symmetric with --protocol / --phase / --trigger. Lets you scope a seed run to one chain when the SSOT grows beyond Sepolia (no current use -- Phase 1 SSOT only covers Sepolia -- but the flag costs nothing and matches the shape callers expect).
Smoke:
- 'db:seed-workflows --protocol=aave-v3 --chain=11155111' seeds 36 rows (1 setup + 7 actions x 5 triggers). Behaviour matches the unfiltered aave-v3 case today.
- 'db:seed-workflows --chain=1' reports 'No coverage targets matched (protocol=*, chain=1)' and exits cleanly -- mainnet has no SSOT entries, so the filter correctly empties the iteration.

Previously the seeder used 'onConflictDoUpdate' to refresh every existing row's nodes/edges on every run. A user editing a seeded workflow in the UI (rename, tweak input, add a node) had their changes silently overwritten on the next seed pass. No feedback in the script log, no flag in the workflow's UI.
Adds a dedicated 'seededAt' timestamp column on workflows. The seeder writes it on insert and on refresh; never on user edits. Detection becomes 'updatedAt > seededAt + epsilon' and is unambiguous:
- seededAt IS NULL -> user-created in the UI; skip (we never seeded this row)
- updatedAt - seededAt < eps -> seeder last wrote this row; refresh
- updatedAt - seededAt > eps -> user edited after the last seed touch; skip with a log line
Doesn't overload createdAt or updatedAt -- the public-facing readers ('app/api/workflows/route.ts' sorts by createdAt; 'app/api/workflows/current/route.ts' and 'app/api/workflows/public/route.ts' sort by updatedAt; MCP endpoints expose both) keep their canonical meanings.
Schema change is additive: nullable column, no default, zero impact on existing rows. Migration is hand-written ('drizzle/0071_...') because drizzle-kit 0.31.10 has a known BigInt-serialization bug in generate (repo precedent: snapshots stop at 0058 while SQL goes to 0070).
Verified end-to-end against the KEEP-458 DB:
- Clean re-seed of 117 existing rows: '0 inserted, 117 refreshed, 0 skipped, 0 failed' (after a one-time seededAt = updatedAt backfill on the test DB; production rows have no seededAt yet so the first prod seed run will refresh all rows once before steady state).
- Simulated UI edit (updatedAt += 1 minute) on one supply workflow: '0 inserted, 35 refreshed, 1 skipped (user-edited), 0 failed' -- exactly the edited row is preserved.
- Tally counts now distinguish inserted / refreshed / skipped / failed instead of a single boolean.

Wire automatic ERC20 acquisition for the protocol-coverage suite and align the SSOT with what Aave V3 / Superfluid Sepolia actually accept. Local run against real Sepolia: 19 passed | 4 skipped | 0 failed.
chain-test-data:
- Add LINK token entry; switch aave-v3 reserve from DAI to LINK (DAI/USDC/USDT on Aave V3 Sepolia all hit SUPPLY_CAP_EXCEEDED (51), LINK has headroom and is borrowable -- verified via eth_call).
- Populate FAUCETS for Sepolia: Aave permissionless mint(token,to,amt) for LINK and FUSDC's own mint(to,amt). Verified both contracts on-chain (eth_getCode + selector inspection).
funding: replace ensureErc20Balance with ensureErc20Acquired. When balance is short and a FAUCETS entry exists, the funder EOA calls the faucet; args bound by name (token / to|recipient / amount|value). Throws with a "manual provisioning required" message when no faucet is registered. Falls back to ensureNativeGas's existing semantics.
setup runner: pull the protocol's setup spec from the registry and ensureErc20Acquired each requiredToken before firing the setup workflow.
types + run-fixture: add ProtocolChainTestData.skipped: Record<slug, reason>. The seeder still surfaces these workflows in the dashboard, but the integration suite marks them test.skip(reason) -- documented gaps (e.g. GDA pool actions need the create-pool tx receipt to extract the deployed address, deferred).
aave-v3: requiredTokens=200 LINK, approval=200, initial supply=100 in protocolSteps so write tests have a position to operate on. All seven write+read actions switch from DAI to LINK.
superfluid:
- Reads on superToken (userSpecifiedAddress contract): use contractAddress instead of token (the action input is `account`, not `token`).
- CFA flow tests (create/update/delete-flow): receiver is 0x...dEaD instead of self; Superfluid reverts self-flows with CFA_NO_SELF_FLOW (0xa47338ef). Sender stays the wallet so the same flow row is created, updated, and deleted in sequence.
- grant-flow-operator: same -- non-self flowOperator.
- skipped map for four GDA-pool actions (update-member-units, distribute, distribute-flow, connect-pool) -- on-chain execution needs the pool address from create-pool's receipt.

The block claimed FAUCETS was unpopulated and ERC20 acquisition was verify-only. Both were true at one point and stopped being true when the faucet wiring landed -- FAUCETS now carries the Aave Sepolia + FUSDC entries and ensureErc20Acquired calls them. The remaining behaviour is documented on FAUCETS itself; the type only needs to explain its shape.

When a previous run leaves a partial balance, minting `needed` re-mints the full target -- wallet ends with balance + needed instead of just needed. Wastes faucet quota (Aave Sepolia caps each call at MAX_MINT_AMOUNT) and could clip the cap with a few protocols sharing the same token. Mint only the gap (needed - balance) instead. The early-return on balance >= needed guarantees gap > 0, so no zero-amount mint call.
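The gap arithmetic can be sketched as a small pure helper. `mintAmount` is a hypothetical name, not a function from the repo; amounts are in the token's smallest unit.

```typescript
// Returns the amount to mint, or null when the wallet already holds enough.
function mintAmount(balance: bigint, needed: bigint): bigint | null {
  // Early return: nothing to mint, and it guarantees the gap below is > 0.
  if (balance >= needed) return null;
  // Mint only the shortfall, so the wallet ends at `needed`, not balance + needed.
  return needed - balance;
}

// A previous run left 50 units; the target is 200, so only 150 are minted.
const gap = mintAmount(BigInt(50), BigInt(200));
```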

getChainRpcUrl opened a fresh Postgres connection on every call. A single setup run calls it 3+ times (ensureNativeGas plus one per requiredTokens), so the cache eliminates 2+ round-trips per protocol without changing observable behaviour. chains.defaultPrimaryRpc is bootstrap-time data, no invalidation needed.
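One way such a per-chain cache could be written is sketched below. The `lookup` parameter stands in for the Postgres query and `getChainRpcUrlCached` is a hypothetical name; evicting failed lookups is an extra assumption that the commit does not mention.

```typescript
// Per-chain cache of in-flight or resolved lookups, keyed by chain id.
const rpcUrlCache = new Map<string, Promise<string>>();

function getChainRpcUrlCached(
  chainId: string,
  lookup: (chainId: string) => Promise<string>, // stand-in for the DB query
): Promise<string> {
  const hit = rpcUrlCache.get(chainId);
  if (hit !== undefined) return hit;
  // Cache the promise itself so concurrent callers share one round-trip.
  const pending = lookup(chainId).catch((err: unknown) => {
    rpcUrlCache.delete(chainId); // assumption: do not cache failures
    throw err;
  });
  rpcUrlCache.set(chainId, pending);
  return pending;
}
```

Caching the promise rather than the resolved string also covers the case where ensureNativeGas and a token check ask for the same chain before the first query returns.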

…Address`
The builder treats `bindings.contractAddress` as a virtual hint for userSpecifiedAddress contracts (Superfluid SuperTokens). If a protocol ever declares a real action input named `contractAddress`, the same binding would be used for both purposes -- the virtual hint and the input -- silent misbehaviour that produces a workflow with the wrong config shape.
Lazy check inside buildProtocolActionNode: throws when the action being built carries the reserved name. The unit test in tests/unit/build-workflow.test.ts iterates every (protocol, action, trigger) so a violation would surface in CI without crashing the dev server's protocol-registry import.
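A sketch of what the reserved-name check might look like in isolation. `assertNoReservedInput`, the `ActionInput` shape, and the error wording are hypothetical; in the PR the check lives inside buildProtocolActionNode.

```typescript
// The binding key the builder reserves as a virtual hint.
const RESERVED_BINDING_NAME = "contractAddress";

type ActionInput = { name: string; type: string };

// Throws when an action declares a real input that collides with the hint.
function assertNoReservedInput(actionPath: string, inputs: ActionInput[]): void {
  const clash = inputs.find((input) => input.name === RESERVED_BINDING_NAME);
  if (clash !== undefined) {
    throw new Error(
      `action "${actionPath}" declares an input named "${RESERVED_BINDING_NAME}", which is reserved for the builder`,
    );
  }
}
```

Running the check lazily at build time, rather than at registry import, is what keeps a violation from crashing the dev server while still failing the unit test.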

Export bindFaucetArgs (and its AbiInput/AbiFunction types) from funding.ts and lock in the case-insensitive name conventions with a unit test. The previous integration coverage exercised exactly two shapes (Aave's mint(token,to,amount), FUSDC's mint(to,amount)); adding a new FAUCETS entry with a slightly different shape -- say `recipient` instead of `to` -- would have surfaced mid-CI. Now it fails at `pnpm test:unit` instead.
Cases:
- Aave + FUSDC shapes in declaration order
- Case-insensitive name matching
- recipient/to and value/amount aliases
- Unknown input name throws with the offending name in the message
- Declared input order preserved even when names are swapped (a regression where the binder sorts inputs would silently mis-order the args)
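The name-based binding can be sketched as follows. This is an illustration under assumed types: the real bindFaucetArgs signature, its `AbiInput` fields, and its error wording may differ.

```typescript
type AbiInput = { name: string; type: string };
type FaucetValues = { token: string; to: string; amount: bigint };

// Binds values to ABI inputs by (case-insensitive) name, in declared order.
function bindFaucetArgs(inputs: AbiInput[], values: FaucetValues): Array<string | bigint> {
  return inputs.map((input) => {
    switch (input.name.toLowerCase()) {
      case "token":
        return values.token;
      case "to":
      case "recipient":
        return values.to;
      case "amount":
      case "value":
        return values.amount;
      default:
        // Name the offender so a new faucet shape fails loudly in unit tests.
        throw new Error(`unsupported faucet input name: ${input.name}`);
    }
  });
}
```

Mapping over `inputs` rather than over the known names is what preserves the declared argument order.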

Extract the gating logic in upsertOne into a pure decideSeedAction function and lock it down with eight cases. The logic distinguishes between a row the seeder owns (refresh on next pass) and one a human edited via the UI (leave alone). Getting that wrong silently clobbers user changes -- exactly the bug the seeded_at column was added to prevent.
Coverage:
- undefined existing -> insert
- seededAt null -> skip(user-created)
- gapMs == 0 (just-inserted) -> refresh
- gapMs < epsilon -> refresh
- gapMs == epsilon (boundary, strict >) -> refresh
- gapMs > epsilon -> skip(user-edited) carries gapMs
- negative gap (clock skew) -> refresh
- custom epsilon honoured
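The eight cases above pin down the decision table closely enough to sketch it. The row and result shapes and the default epsilon of 1000 ms are assumptions; only the branch semantics come from the commit text.

```typescript
type ExistingRow = { seededAt: Date | null; updatedAt: Date };

type SeedDecision =
  | { kind: "insert" }
  | { kind: "refresh" }
  | { kind: "skip"; reason: "user-created" }
  | { kind: "skip"; reason: "user-edited"; gapMs: number };

// Assumed default epsilon; the repo's actual value is not stated in the PR.
function decideSeedAction(existing: ExistingRow | undefined, epsilonMs = 1000): SeedDecision {
  if (existing === undefined) return { kind: "insert" };
  // The seeder never touched this row, so it belongs to a user.
  if (existing.seededAt === null) return { kind: "skip", reason: "user-created" };
  const gapMs = existing.updatedAt.getTime() - existing.seededAt.getTime();
  // Strict comparison: a gap exactly equal to epsilon still counts as seeder-owned.
  if (gapMs > epsilonMs) return { kind: "skip", reason: "user-edited", gapMs };
  // Zero, small, or negative (clock skew) gaps all mean the seeder wrote last.
  return { kind: "refresh" };
}
```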

Extract planPhaseFixtures from runPhaseFixtures: returns a pure FixtureCase[] describing what each action should be (run / skip / no-protocol / no-actions). runPhaseFixtures still drives the vitest test() / test.skip() calls, but the decision layer is now testable without spying on vitest internals or running the integration suite.
Cases:
- undefined protocol -> no-protocol entry
- no actions match phase -> no-actions entry
- phase filter excludes the wrong type
- action not in skipped map -> run
- action in skipped map -> skip with reason preserved
- mixed run/skip in one plan
- omitted skipped on chain entry -> all run
- skipped scoped by chain id (mainnet skip doesn't bleed into Sepolia)
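A rough sketch of the planning layer, assuming a simplified protocol shape in which the phase equals the action type. All type and field names here are hypothetical, and chain scoping is left to the caller, who passes the skipped map for one chain.

```typescript
type Action = { slug: string; type: "read" | "write" };
type Protocol = { slug: string; actions: Action[] };

type FixtureCase =
  | { kind: "no-protocol" }
  | { kind: "no-actions" }
  | { kind: "run"; action: string }
  | { kind: "skip"; action: string; reason: string };

// Pure planner: decides what each action should be, without touching vitest.
function planPhaseFixtures(
  protocol: Protocol | undefined,
  phase: Action["type"],
  skipped: Record<string, string> = {}, // slug -> reason, for one chain
): FixtureCase[] {
  if (protocol === undefined) return [{ kind: "no-protocol" }];
  const actions = protocol.actions.filter((action) => action.type === phase);
  if (actions.length === 0) return [{ kind: "no-actions" }];
  return actions.map((action): FixtureCase => {
    const reason: string | undefined = skipped[action.slug];
    return reason === undefined
      ? { kind: "run", action: action.slug }
      : { kind: "skip", action: action.slug, reason };
  });
}
```

A runner can then map `run` entries to test() and `skip` entries to test.skip(reason), which keeps the decision logic unit-testable.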

…stWorkflow

… session-total)

…fallback

…ress only

…orkflowEdgeJson

…s pure helpers

…t type-checks under ES2017

…ave-v3 TEST_DATA
The protocol-coverage test runner iterates protocol.actions in array order, and tests share the wallet's on-chain Aave position as a singleton. repay needs an open debt position, so it must follow borrow. Future reviewers reordering the array (e.g. alphabetising) would silently break the suite; the comment names the invariant where it lives.

…uperfluid TEST_DATA
Mirrors the aave-v3 comment block: names the specific on-chain dependencies (update-flow/delete-flow follow create-flow), flags the independent actions (wrap/unwrap, grant-flow-operator), and points at the skipped GDA pool actions.

…ta is absent
buildActionWorkflow accepts arbitrary chainIds, so the executable bit must reflect whether any testData was actually vetted for the chain. The prior `chainData?.enabled !== false` returned true on missing chainData (`undefined !== false`), so a caller building a workflow for an un-covered chain saw _executable=true alongside an action node whose required address inputs would throw at resolveBinding time.
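The difference between the old and new predicate is easy to see side by side. The "after" form below is one plausible shape of the fix, not necessarily the exact expression in the commit; `ChainTestData` is reduced to the one field that matters here.

```typescript
type ChainTestData = { enabled?: boolean };

// Prior check: optional chaining yields undefined for missing chainData,
// and `undefined !== false` is true, so un-covered chains looked executable.
const executableBefore = (chainData?: ChainTestData): boolean =>
  chainData?.enabled !== false;

// Assumed fix: require chainData to exist before consulting `enabled`.
const executableAfter = (chainData?: ChainTestData): boolean =>
  chainData !== undefined && chainData.enabled !== false;
```

Chains with testData and no explicit `enabled` field stay executable under both forms; only the missing-data case changes.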

…r-liveness fix: KEEP-555 detect dead newHeads subscriptions via height-advance liveness

…ype plumbing (#1238)

…auge with counter (#1239)

suisuss added 30 commits May 13, 2026 09:57

Merge pull request #1229 from KeeperHub/fix/KEEP-445-prod-oom-heap-cg…

19b8f44

…roup-limit fix: KEEP-445 raise keeperhub-common memory request and limit

docs(testing): KEEP-458 reword stale fixture-tree comment on createTe…

4483ffd

…stWorkflow

fix(testing): KEEP-458 seeder exits non-zero when any row upsert fails

0acbbf1

docs(testing): KEEP-458 clarify RPC URL cache wording (per-chain, not…

2eaec33

… session-total)

fix(testing): KEEP-458 use explicit token priority for event-trigger …

c91a711

…fallback

fix(testing): KEEP-458 tighten address-binding resolver to scalar add…

2fe13f3

…ress only

refactor(testing): KEEP-458 drop noExplicitAny via WorkflowNodeJson/W…

cdad1cd

…orkflowEdgeJson

fix(testing): KEEP-458 guard seeder main entry so tests can import it…

4b52495

…s pure helpers

fix(testing): KEEP-458 compose BigInt via BigInt() so faucet-args tes…

d4118ce

…t type-checks under ES2017

suisuss temporarily deployed to staging May 13, 2026 08:19 — with GitHub Actions Inactive

Merge pull request #1237 from KeeperHub/feat/KEEP-555-block-dispatche…

48c459b

…r-liveness fix: KEEP-555 detect dead newHeads subscriptions via height-advance liveness

joelorzet temporarily deployed to staging May 13, 2026 13:57 — with GitHub Actions Inactive

joelorzet temporarily deployed to staging May 13, 2026 14:09 — with GitHub Actions Inactive

joelorzet temporarily deployed to staging May 13, 2026 14:14 — with GitHub Actions Inactive

feat: add per-(trigger_type, chain) execution counter and X-Trigger-T…

8c4d116

…ype plumbing (#1238)

OleksandrUA had a problem deploying to staging May 13, 2026 14:21 — with GitHub Actions Failure

OleksandrUA temporarily deployed to staging May 13, 2026 14:21 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 13, 2026 14:37 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 13, 2026 14:38 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 13, 2026 14:43 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 13, 2026 14:54 — with GitHub Actions Inactive

feat(metrics): KEEP-545 classify workflow execution errors, replace g…

85af21a

…auge with counter (#1239)

OleksandrUA temporarily deployed to staging May 13, 2026 16:23 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 13, 2026 16:35 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 13, 2026 16:39 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 13, 2026 16:48 — with GitHub Actions Inactive

joelorzet merged commit 8378e6a into prod May 13, 2026
34 checks passed

joelorzet merged 44 commits into prod from staging

4 participants
