release: to prod #1283

Merged
OleksandrUA merged 8 commits into prod from staging on May 18, 2026

@OleksandrUA

Delivers the staging changes to prod, in addition to the KEEP-432 DB modifications (indexes), which were delivered via the PR labels and manual prep work.

eskp and others added 8 commits May 15, 2026 14:49
Adds two optional fields to the HTTP Request action so aggregator
workflows can keep running when one source is slow or down:

- timeout (seconds, default 5, clamped 1-30). Threads an AbortSignal
  through safeFetch; coerces strings the visual editor stores and
  numbers the MCP caller sends, falling back to the default for
  non-numeric input.
- failOnError (boolean, default true). When false, a non-2xx response
  or a thrown error (incl. timeout abort) returns
  { success: true, data: null, status, error } instead of failing the
  step. Downstream nodes already read the envelope as their input,
  so they see { status, data: null, error } as the issue specifies.
  Endpoint validation always hard-fails regardless of failOnError --
  it's a config bug, not a transient runtime failure.
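The coercion rules above can be sketched as two pure helpers. This is a minimal illustration, assuming the helpers receive the raw persisted value and that `resolveTimeoutMs` returns milliseconds (as its `Ms` suffix suggests); the constant names are hypothetical, and the `null`/`""` handling reflects the later review fix described further down. The real helpers in `lib/workflow/nodes/http-request` may differ in detail.

```typescript
// Hypothetical sketch of the timeout / failOnError coercion rules;
// constant names are illustrative, not taken from the PR.
const DEFAULT_TIMEOUT_SECONDS = 5;
const MIN_TIMEOUT_SECONDS = 1;
const MAX_TIMEOUT_SECONDS = 30;

// Accepts the string the visual editor stores or the number an MCP caller
// sends, and returns milliseconds suitable for AbortSignal.timeout().
function resolveTimeoutMs(raw: unknown): number {
  // Treat "not set" (undefined, null, blank string) as the default rather
  // than letting Number() coerce it to 0.
  if (raw === undefined || raw === null || (typeof raw === "string" && raw.trim() === "")) {
    return DEFAULT_TIMEOUT_SECONDS * 1000;
  }
  const seconds = typeof raw === "number" ? raw : Number(raw);
  if (!Number.isFinite(seconds)) {
    // Non-numeric input such as "abc" falls back to the default.
    return DEFAULT_TIMEOUT_SECONDS * 1000;
  }
  const clamped = Math.min(MAX_TIMEOUT_SECONDS, Math.max(MIN_TIMEOUT_SECONDS, seconds));
  return clamped * 1000;
}

// Only an explicit false (boolean, or the literal string "false") turns
// failure off; everything else keeps the hard-fail default.
function resolveFailOnError(raw: unknown): boolean {
  if (raw === false || raw === "false") return false;
  return true;
}
```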

Surfaces:
- app/api/mcp/schemas/route.ts: new optionalFields + corrected
  outputFields (the old `headers` was aspirational; replaced with the
  real `error` output).
- lib/workflow/nodes/http-request/step.ts: implementation, with
  resolveTimeoutMs and resolveFailOnError exported for tests.
- components/workflow/config/action-config.tsx: Timeout number input
  and a "Fail workflow on error" Switch in the HTTP Request node
  config UI.

Tests (tests/unit/http-request-timeout-soft-fail.test.ts):
- resolveTimeoutMs: default, numeric, string, below-min clamp,
  above-max clamp, non-numeric fallback.
- resolveFailOnError: undefined/true/false/string-"false"/string-"true".
- httpRequest: signal wiring, 2xx success, default hard-fail on
  non-2xx, soft-fail on non-2xx with status preserved, default
  hard-fail on thrown error, soft-fail on thrown error with null
  status, endpoint validation always-hard-fail.
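The behaviors these tests pin down (validation outside the soft-fail path, status preserved on a non-2xx response, `null` status when the request throws) can be sketched as below. `performRequest`, `FetchLike`, and `validateEndpoint` are hypothetical names, hard failure is modeled as rethrowing, and the fetch function is injected so the sketch runs without a network; this is not the PR's `httpRequest` or `safeFetch`.

```typescript
// Hypothetical sketch of the soft-fail envelope; names and the hard-fail
// mechanism (rethrowing) are assumptions.
type FetchLike = (
  url: string,
  init: { method: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

type HttpResult =
  | { success: true; data: unknown; status: number }
  | { success: true; data: null; status: number | null; error: string };

// Config errors always hard-fail, regardless of failOnError.
function validateEndpoint(endpoint: string): void {
  const url = new URL(endpoint); // throws TypeError on malformed input
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
}

async function performRequest(
  fetchImpl: FetchLike,
  endpoint: string,
  opts: { method: string; timeoutMs: number; failOnError: boolean },
): Promise<HttpResult> {
  validateEndpoint(endpoint); // deliberately outside the try: never soft-failed
  let status: number | null = null;
  try {
    const res = await fetchImpl(endpoint, {
      method: opts.method,
      signal: AbortSignal.timeout(opts.timeoutMs), // aborts slow sources
    });
    status = res.status;
    const body = await res.text();
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    let data: unknown = body;
    try {
      data = JSON.parse(body);
    } catch {
      // Not JSON: keep the raw text.
    }
    return { success: true, data, status };
  } catch (err) {
    if (opts.failOnError) throw err; // default: fail the step
    const message = err instanceof Error ? err.message : String(err);
    // status is still null if no response ever arrived (e.g. timeout abort).
    return { success: true, data: null, status, error: message };
  }
}
```

Injecting the fetch implementation keeps the envelope logic testable in isolation, mirroring how the PR's tests exercise each branch without a live endpoint.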

The codegen path (lib/workflow/codegen/templates/http-request.ts +
generateHTTPActionCode) is intentionally not updated -- it is already
mismatched with the step's input shape (url/method/body vs
endpoint/httpMethod/httpBody) on staging today, so fixing it is its
own pre-existing concern.

The marketplace x402 "execution.status mismatch" half of the original
issue is misdiagnosed there -- buildCallCompletionResponse only reads
execution.status; the real bug is a reconciliation race in
logWorkflowCompleteDb's spurious-error handler. Filing as a separate
ticket. failOnError already dissolves the repro: a soft-failed source
is not a failed node, so execution.status is success from the start.

- UI Switch (`Fail workflow on error`) now mirrors `resolveFailOnError`
  semantics so an imported workflow that persisted the literal string
  "false" doesn't display as ON while running as OFF.
- `getActionFields` for HTTP Request advertises the new `error` field
  to the visual editor's `@`-mention completion, alongside the existing
  `data` and `status`, with notes that both can be null on soft fail.
- `resolveTimeoutMs(null)` and `resolveTimeoutMs("")` now return the
  5s default. Previously `Number(null)` and `Number("")` both coerced
  to 0 and clamped up to 1s -- the most punishing possible timeout
  for a config the user did not actually set.
- Tests cover the new fallback, plus assert (via `vi.spyOn`) that
  `AbortSignal.timeout` is called with the millisecond value derived
  from `input.timeout` (and the 5s default when none is set).
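Both review fixes trace back to JavaScript coercion rules. The illustration below is hypothetical: the naive Switch binding and the two clamp helpers are reconstructed from the description above, not copied from the diff.

```typescript
// Pitfall 1: Boolean("false") is true, because any non-empty string is truthy.
// A Switch bound as Boolean(value) (hypothetical pre-fix binding) would show ON
// for a persisted "false" while the step, which parses the string, runs as OFF.
const persisted: unknown = "false";
const naiveChecked = Boolean(persisted); // true
const mirroredChecked = !(persisted === false || persisted === "false"); // false

// Pitfall 2: Number(null) and Number("") are 0, not NaN, so an isNaN-only
// fallback misses them and the 1-30 clamp turns "unset" into the 1 s minimum.
function clampSeconds(raw: unknown): number {
  const n = Number(raw);
  if (Number.isNaN(n)) return 5; // catches undefined and "abc" only
  return Math.min(30, Math.max(1, n));
}

// Checking for "unset" before coercing restores the 5 s default.
function clampSecondsFixed(raw: unknown): number {
  if (raw === undefined || raw === null || raw === "") return 5;
  return clampSeconds(raw);
}

console.log(naiveChecked, mirroredChecked); // true false
console.log(clampSeconds(null), clampSeconds("")); // 1 1
console.log(clampSecondsFixed(null), clampSecondsFixed("")); // 5 5
```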

Follow-ups deferred (filed/noted in PR body):
- Codegen path (`templates/http-request.ts` + `generateHTTPActionCode`)
  drops the new fields silently. This is consistent with the
  pre-existing breakage there (existing fields are also dropped) but
  worth its own ticket.
- Soft-fail observability: a `failOnError=false` step that absorbs an
  error logs as a clean success with no error tag in step metrics.
  Documented behavior, but an aggregator workflow where every source
  soft-fails will show 100% step success in telemetry.

CI build failed: the Workflow DevKit pulled lib/safe-fetch.ts into
the workflow-function bundle, which forbids Node modules
(undici / node:net / node:async_hooks). The DevKit treats a step
file as a boundary -- only `"use step"` exports are stubbed into
the workflow-function bundle; everything else exported from the
same file gets analyzed in the workflow bundle along with its
transitive imports. Exporting `httpRequest` (which calls safeFetch)
from step.ts is what tripped the analyzer.

Moves all the worker logic (httpRequest, the timeout/failOnError
resolvers, validators, parse helpers, types, safeFetch import) into
`lib/workflow/nodes/http-request/perform.ts`. `step.ts` is reduced
to a thin `"use step"` wrapper and type re-exports. Tests import
from `./perform`.

No behavior change. `pnpm build` now succeeds locally (was failing
in CI on the same workflow-bundling error this PR introduced).

…-timeout-soft-fail

feat: KEEP-444 HTTP Request action per-node timeout + soft-fail

Both `/.well-known/erc8004.json` and `/.well-known/agent.json` returned
404, forcing third-party agents to query the on-chain ReputationRegistry
directly to discover KH's ERC-8004 identity and bypass the curated
metadata layer.

- `/.well-known/erc8004.json` returns a card with agent_id (31875),
  chain, registry contract, reputation pointer, and links to the MCP
  and A2A cards. The underlying agent already existed (referenced in the
  mcp.json route); only the well-known wrapper was missing.
- `/.well-known/agent.json` 301-redirects to the canonical
  `/.well-known/agent-card.json`. Older A2A clients probe `agent.json`
  first; the redirect avoids duplicating card content.
- Both endpoints set a 5-minute cache header so indexers (8004scan,
  Valiron) can poll without hammering the origin.
- A Vitest test asserts the JSON shape, agent_id, registry address, card
  links, and the 301 status from agent.json.
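A framework-agnostic sketch of the two responses, returning plain objects rather than a specific route-handler type. Only `agent_id` 31875, the 301 target, and the 5-minute cache lifetime come from the commit message; the other field names, the placeholder values, the MCP card path, and the exact `Cache-Control` directive are assumptions.

```typescript
// Hypothetical response shape; not the deployed route handlers.
interface WellKnownResponse {
  status: number;
  headers: Record<string, string>;
  body: string | null;
}

// 5-minute cache so indexers can poll without hammering the origin
// (the exact directive is an assumption).
const CACHE_HEADER = "public, max-age=300";

function erc8004Card(origin: string): WellKnownResponse {
  const card = {
    agent_id: 31875,
    chain: "<chain>", // placeholder
    registry: "<registry-contract-address>", // placeholder
    reputation: "<reputation-pointer>", // placeholder
    links: {
      mcp: `${origin}/.well-known/mcp.json`, // assumed path
      a2a: `${origin}/.well-known/agent-card.json`, // canonical A2A card
    },
  };
  return {
    status: 200,
    headers: { "Content-Type": "application/json", "Cache-Control": CACHE_HEADER },
    body: JSON.stringify(card),
  };
}

// Older A2A clients probe agent.json first; redirect instead of duplicating.
function agentJsonRedirect(): WellKnownResponse {
  return {
    status: 301,
    headers: { Location: "/.well-known/agent-card.json", "Cache-Control": CACHE_HEADER },
    body: null,
  };
}
```

A permanent redirect keeps a single source of truth for the card content while still answering legacy probes.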
Post-incident additive index migration. Adds an index on
workflow_execution_logs(started_at) to close the seq-scan path used by
the per-org gas-usage analytics query that caused the 2026-05-05 RDS
CPU incident, plus the 11 missing FK indexes Postgres does not
auto-create.

Migration uses plain CREATE INDEX IF NOT EXISTS. On dev / PR-env DBs
the tables are small / empty so drizzle-kit migrate creates them
inline without user-visible impact. On real staging / prod the same
indexes must be applied as CREATE INDEX CONCURRENTLY BEFORE merge:
plain CREATE INDEX would take a SHARE lock on workflow_execution_logs
(1.9 GB on prod), blocking all writes for the duration of the build,
which is the multi-minute lock this PR exists to avoid.
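For the manual pre-merge step, the operator needs the CONCURRENTLY form of each index statement in the migration. A hypothetical helper (`toConcurrent` is not part of the PR) that rewrites one CREATE INDEX statement, following the Postgres grammar `CREATE [UNIQUE] INDEX [CONCURRENTLY] [[IF NOT EXISTS] name] ON ...`:

```typescript
// Hypothetical operator helper: insert CONCURRENTLY into a single
// CREATE [UNIQUE] INDEX statement, leaving already-concurrent ones alone.
function toConcurrent(statement: string): string {
  return statement.replace(
    // The lookahead tolerates extra whitespace so an existing CONCURRENTLY
    // is never duplicated.
    /^(\s*CREATE\s+(?:UNIQUE\s+)?INDEX)\s+(?!\s*CONCURRENTLY\b)/i,
    "$1 CONCURRENTLY ",
  );
}
```

Each rewritten statement must run on its own, outside a transaction block, because Postgres rejects CREATE INDEX CONCURRENTLY inside one. If a concurrent build fails, it leaves an INVALID index behind; since IF NOT EXISTS checks only the name, a retry (or the later drizzle-kit migrate) would silently skip it, so the invalid index (visible via `pg_index.indisvalid`) has to be dropped first.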

Enforcement of that pre-step is a new merge gate:

- New workflow .github/workflows/db-prep-check.yml inspects each newly
  added drizzle/*.sql file for the directive `-- @requires-db-prep` near the
  top. If present, it requires the `db-prepped-<base-branch>` label on
  the PR before reporting success.
- New labels db-prepped-staging and db-prepped-prod (green) are flipped
  by the operator after they apply the indexes to the real target DB.
- The existing branch-protection ruleset for staging / prod / main has
  db-prep-check added as a required status check, so a missing label
  blocks merge at the branch-protection level - not by runbook trust.
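The gate's decision logic, sketched in TypeScript for illustration only: the real check is the db-prep-check.yml workflow, and the function names and the 10-line scan window standing in for "near the top" are assumptions.

```typescript
// Hypothetical sketch of the db-prep-check decision; not the workflow itself.
const DIRECTIVE = "-- @requires-db-prep";
const HEADER_LINES = 10; // assumption: how far "near the top" reaches

interface AddedFile {
  path: string;
  contents: string;
}

function requiresDbPrep(sql: string): boolean {
  return sql
    .split(/\r?\n/)
    .slice(0, HEADER_LINES)
    .some((line) => line.trim().startsWith(DIRECTIVE));
}

function dbPrepCheck(
  added: AddedFile[],
  labels: string[],
  baseBranch: string,
): { ok: boolean; reason: string } {
  // Only newly added top-level drizzle/*.sql files are inspected.
  const flagged = added.filter(
    (f) => /^drizzle\/[^\/]+\.sql$/.test(f.path) && requiresDbPrep(f.contents),
  );
  if (flagged.length === 0) return { ok: true, reason: "no migration requires DB prep" };
  const required = `db-prepped-${baseBranch}`;
  if (labels.includes(required)) return { ok: true, reason: `label ${required} present` };
  return {
    ok: false,
    reason: `missing label ${required} for ${flagged.map((f) => f.path).join(", ")}`,
  };
}
```

Keying the label on the base branch means a `db-prepped-staging` label cannot satisfy a merge into prod, which matches the per-environment prep the operator performs.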

Indexes are declared in lib/db/schema.ts and
lib/db/schema-extensions.ts via index() helpers so future drizzle-kit
generate runs do not see them as drift.

Composite (status, started_at DESC) on workflow_executions and an FK
index on para_wallets are intentionally NOT included; see PR
description and Linear KEEP-432 for the reasoning.

perf(db): KEEP-432 add missing indexes for hot tables and FKs

…8004-404

feat(well-known): KEEP-475 add /.well-known/erc8004.json and /agent.json
@OleksandrUA added the db-prepped-prod label ("Operator applied lock-free DDL to prod DB; safe to merge") on May 18, 2026
@OleksandrUA merged commit 99c75ef into prod on May 18, 2026
33 of 35 checks passed