release: To Prod by suisuss · Pull Request #1410 · KeeperHub/keeperhub

suisuss · 2026-05-29T09:17:29Z

No description provided.

…rows KEEP-597. The sidebar workflow picker rendered soft-deleted rows with a strikethrough and "Deleted" badge instead of hiding them, and gave no visual signal for workflows with `enabled = false`. Soft-deleted rows are now filtered out at the sidebar boundary via a new `filterPickerVisible` helper in lib/workflow/soft-delete.ts. The wider /api/workflows payload still returns deleted rows so audit and history surfaces keep working (matches the KEEP-440 contract). Disabled rows stay visible and selectable but render with muted-foreground text and a "Disabled" badge -- no strikethrough. The dead deleted-render branch in WorkflowItem is removed; the picker contract is now centralised in the helper.

Hide the "Report an issue" feedback button in the navigation sidebar and short-circuit POST /api/feedback so it no longer proxies to the external feedback service. Both gate on NEXT_PUBLIC_FEEDBACK_ENABLED, which defaults to off; set it to "true" to re-enable without code changes. The ERC-8004 agent-reputation feedback system (/api/feedback/[id], agentic-wallet/feedback, schema-feedback) is intentionally untouched -- it is unrelated infrastructure that happens to share the "feedback" name. Adds unit coverage proving the route returns 503 without calling the external service when disabled (including non-"true" flag values), and resumes proxying when re-enabled.

Reverts #1370

…able Refines KEEP-597. The first cut showed the "Disabled" badge on every workflow with enabled = false, but the workflows table defaults enabled to false and the UI only exposes the toggle for Schedule / Webhook / Event / Block triggers (shouldShowEnableSwitch). That left every newly created Manual workflow incorrectly tagged "Disabled" with no way for the user to flip it off. Adds two small helpers to lib/workflow/store.ts: getWorkflowTriggerType(nodes) -- pulls the trigger type off the trigger node without dragging the editor store into view. shouldShowDisabledBadge({enabled, triggerType}) -- the picker predicate: enabled === false AND the trigger has an enable switch. The sidebar derives triggerType once at the visibleWorkflows boundary and feeds it into WorkflowItem; the badge gates on the new predicate. Manual rows now render clean regardless of the enabled column.

…ow check The sidebar caches the workflow list in local state and derives the "Disabled" label from each row's `enabled` column. Flipping the enable toggle in the editor header updated only the editor's atom -- the sidebar kept the stale value until a full refetch. Call refetchSidebar() inside updateWorkflowEnabled so the picker repaints right away. While here, drop the small Check icon on the row matching the currently viewed workflow. The bg-muted highlight on the active row already carries that signal and the icon is more noise than help.

…letedAt guard Two review nits from the PR review pass: - getWorkflowTriggerType now folds the legacy "Scheduled" triggerType onto "Schedule" before returning. Three other call sites (executor, metrics, mcp) already do this normalization, which implies real rows in the DB still hold the old spelling -- without normalization here the picker silently mis-classifies them and skips the Disabled badge. - filterPickerVisible now compares deletedAt against null/undefined explicitly (== null) instead of using a truthy check. The Date | string arms of the type would otherwise treat the epoch-zero Date or an empty-string deletedAt as not-deleted. Theoretical for our current API shape, but matches the documented intent of the type signature. Adds one test for the Scheduled-legacy path.

…or-ux-polish chore: KEEP-493 workflow editor UX polish (PR #1370)

chore: disable feedback widget behind a feature flag

…ker-deleted-disabled fix(workflow-picker): hide soft-deleted workflows; label only schedulable disabled

The scrape path runs ~35 aggregate queries against workflow_executions and workflow_execution_logs every 30s per pod. With replicaCount=2 and no caching, the DB sees ~4 full rounds per 2 minutes; staging profiling showed getBillingStatsFromDb averaging 5.9s, getWorkflowStatsFromDb 1.1s (Parallel Seq Scan), and getStepStatsFromDb 0.4s. Wrap updateDbMetrics in a per-process TTL cache (default 60s, overridable via DB_METRICS_CACHE_TTL_MS). Concurrent callers during an in-flight refresh share the same promise; failed refreshes are evicted so the next scrape retries instead of pinning a poisoned entry for the full TTL. The public boundary still swallows errors, preserving the existing "metrics endpoint never 500s" behavior. Tests added in tests/unit/db-metrics-cache.test.ts cover: TTL window hit, TTL expiry, TTL=0 bypass, in-flight dedup, and eviction-on-rejection.

If a refresh runs longer than DB_METRICS_CACHE_TTL_MS (e.g. DB under stress with multi-second seq scans), the old cache check used the start time and would let the next scrape begin a second concurrent refresh, doubling DB load in the exact moment the cache was supposed to protect against it. Track startedAt and completedAt separately. An in-flight entry (completedAt === null) is always shared regardless of TTL; only once the refresh completes does the TTL clock start, and the next refresh fires when (now - completedAt) >= ttl. Use Promise#then's two-arg form so the success-stamp / failure-evict handlers don't fork an orphan rejected promise that surfaces as an unhandled rejection. Adds a test covering the in-flight-past-TTL case.

Two new counters on the apiRegistry (per pod, scraped via /api/metrics/api): - keeperhub_db_metrics_cache_lookups_total{result} result = hit | miss | in_flight_dedup | disabled Increments once per updateDbMetrics() call. Cache hit rate is hit / (hit + miss + in_flight_dedup). - keeperhub_db_metrics_refresh_total{outcome} outcome = success | error Increments once per actual DB round-trip. DB load contribution is sum(rate(refresh_total)); failure rate is rate(refresh_total{outcome="error"}) / rate(refresh_total). Without these, the only post-deploy signal for cache effectiveness was the indirect "DB CPU went down" reading on CloudWatch.

parseInt silently strips trailing non-numeric characters, so DB_METRICS_CACHE_TTL_MS=60s would be parsed as 60 (i.e. 60ms, not 60s) without warning. Use Number() which returns NaN for any trailing junk, so malformed env vars fall back to the documented default instead of silently degrading. Adds a test case covering the '60s' typo.

Date.now() is wall-clock and can step backwards on NTP adjustments (rare on EKS, not impossible). A backwards step would make the TTL check artificially small, holding a stale cache entry longer than intended. performance.now() is process-monotonic and the correct primitive for an interval check. Tests updated to use vi.useFakeTimers({ toFake: ['Date', 'performance'] }) so both clocks advance together with vi.advanceTimersByTime().

Exporting a cache-reset function from production code is a smell — even with the __ prefix, any importer can call it. Vitest sets NODE_ENV=test by default, so gate the call so production accidents fail loud (throw) instead of silently clearing mid-flight cache state. Keeps the test ergonomics (no need for vi.resetModules + re-imports per test) while removing the attractive nuisance.

mockClear only resets call history; mockImplementationOnce and mockRejectedValueOnce queued items persist across tests. If a test queues an implementation it doesn't fully consume, the next test inherits it. mockReset clears the implementation queue too. Default impls are factored into DEFAULT_DB_RETURNS so the per-test rebind is a single call.

flushMicrotasks(20) was a magic number chosen empirically because refreshDbMetricsNow() does a dynamic import + Promise.all of 13 helpers. Add a helper, the count silently changes; the test could keep passing for the wrong reason. waitForMockCall polls each microtask turn until the mock has been invoked at least once, then returns. Misuse fails loud with a clear error instead of a confusing downstream assertion failure.

…on test assertion The fire-and-forget raw.then() in updateDbMetrics had no protection against the settle callbacks themselves throwing (e.g. prom-client in a bad state), which would produce an unhandled rejection from the returned promise. Added .catch() to cover that path. Tightened the eviction test assertion from toBeGreaterThanOrEqual(2) to toHaveBeenCalledTimes(2) — the precise claim is that a failed refresh evicts the slot so the next call makes exactly one retry, not an unbounded number.

perf: cache /api/metrics/db scrape to cut DB CPU

…r db metrics

perf: reconcile incident indexes and add covering indexes for db metrics

eskp and others added 9 commits May 29, 2026 17:57

revert: KEEP-493 undo workflow editor UX polish (PR #1370)

fe4d1c5

Reverts #1370

Merge pull request #1407 from KeeperHub/revert/KEEP-493-workflow-edit…

765a606

…or-ux-polish chore: KEEP-493 workflow editor UX polish (PR #1370)

Merge pull request #1408 from KeeperHub/feature/disable-feedback-widget

d897565

chore: disable feedback widget behind a feature flag

Merge pull request #1409 from KeeperHub/feature/keep-597-workflow-pic…

89cc8cd

…ker-deleted-disabled fix(workflow-picker): hide soft-deleted workflows; label only schedulable disabled

suisuss temporarily deployed to staging May 29, 2026 09:17 — with GitHub Actions Inactive

eskp temporarily deployed to staging May 29, 2026 09:23 — with GitHub Actions Inactive

eskp temporarily deployed to staging May 29, 2026 09:34 — with GitHub Actions Inactive

suisuss added 9 commits May 29, 2026 19:46

Merge pull request #1411 from KeeperHub/perf/KEEP-669-db-metrics-cpu

5751a2b

perf: cache /api/metrics/db scrape to cut DB CPU

suisuss temporarily deployed to staging May 29, 2026 10:27 — with GitHub Actions Inactive

suisuss temporarily deployed to staging May 29, 2026 10:28 — with GitHub Actions Inactive

suisuss temporarily deployed to staging May 29, 2026 10:40 — with GitHub Actions Inactive

perf: KEEP-669 reconcile incident indexes and add covering indexes fo…

e368042

…r db metrics

Merge pull request #1412 from KeeperHub/perf/KEEP-669-reconcile-indexes

e7ebb70

perf: reconcile incident indexes and add covering indexes for db metrics

suisuss temporarily deployed to staging May 29, 2026 11:03 — with GitHub Actions Inactive

suisuss added the db-prepped-prod Operator applied lock-free DDL to prod DB; safe to merge label May 29, 2026

suisuss temporarily deployed to staging May 29, 2026 11:10 — with GitHub Actions Inactive

suisuss temporarily deployed to staging May 29, 2026 11:15 — with GitHub Actions Inactive

OleksandrUA merged commit 5d3cbdb into prod May 29, 2026
43 of 44 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: To Prod#1410

release: To Prod#1410
OleksandrUA merged 21 commits into
prodfrom
staging

suisuss commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

suisuss commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants