Framework hardening + test-reliability overhaul #25

Merged
isala404 merged 7 commits into main from bug-fixes on May 30, 2026

@isala404
Owner

Summary

This branch hardens the framework against the findings of the security/quality audit (ISSUES.md), then overhauls test reliability so the integration suite becomes a trustworthy CI gate and big changes can be made with confidence.

The headline of the recent work: forge-runtime's own integration suite had never run in CI, so it silently rotted — 14 tests were red against shipped code. Making it CI-ready surfaced (and fixed) several real bugs.

Real bugs fixed (caught by un-rotting tests)

  • Sessions persisted under the wrong id (signals): upsert_session minted a fresh UUID for a new session even when the handler had already returned a session_id to the client, so every later event missed its session and spawned an orphan — session continuity was silently broken. The E2E only checked the response payload, so it never caught it.
  • Cron runs never completed: the scheduler only ever wrote status='running'; nothing finalized a successful run, so forge_cron_runs filled with stuck rows and the only exit was the 15-min stale-reclaim.
  • Leader election couldn't evict its own zombie: the pool set application_name to the project name, but the zombie-preemption guard only terminates forge-prefixed backends, so the guard never matched the stale leader's connection.
  • Sub-second job-retry backoff truncated to 0: num_seconds() before the float cast dropped the backoff for the common first retry (1s − jitter ≈ 0.75s), retrying instantly.
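
The backoff truncation in the last bullet can be illustrated with a minimal sketch. It uses std::time::Duration as a stand-in for whatever duration type the scheduler actually uses (as_secs() truncates to whole seconds the same way num_seconds() does); both function names are hypothetical and not taken from forge.

```rust
use std::time::Duration;

/// Buggy shape (hypothetical name): truncate to whole seconds first,
/// then cast to float. Any sub-second backoff collapses to 0.0.
fn backoff_secs_truncated(backoff: Duration) -> f64 {
    backoff.as_secs() as f64
}

/// Fixed shape (hypothetical name): convert straight to fractional
/// seconds so the sub-second part survives.
fn backoff_secs_precise(backoff: Duration) -> f64 {
    backoff.as_secs_f64()
}
```

For the first-retry case from the description (1s minus jitter, roughly 750 ms), backoff_secs_truncated(Duration::from_millis(750)) yields 0.0, which is why the retry fired instantly, while backoff_secs_precise keeps the 0.75s delay.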

Test-reliability overhaul

  • Removed ~150 false-confidence / tautological tests (mock-self-tests where ctx.http() never consults the mock, no-assertion constructor smokes, dead Step* API + its tests, builder round-trips) and the unwired email feature.
  • Added fast pure-logic unit tests: transient-error classifier, config validation branches, tenant filter, env-var substitution, metrics path/label normalization.
  • Resurrected 14 rotted forge-runtime integration tests (jobs queue routing, change_log schema + retention-floor, migration error-chain, signals roundtrip).
  • Grew forge-harness (the real, full-stack integration layer): job retry-then-succeed, dead-letter, and transactional rollback of data + outbox job.
  • Installed the CI gate (workspace-integration): runs the forge-runtime integration suite with --features "full,testcontainers" and --test-threads=1. Two non-obvious requirements, both learned by validation: every subsystem is feature-gated (bare testcontainers is a silent no-op), and the suite uses PG instance-global state (advisory locks, pg_terminate_backend, leader election) that can't run concurrently against one database. Serial = 642 pass / 0 fail / 0 ignored (~68s).
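
The "silent no-op" behind the feature-gating requirement can be sketched as follows. This is an illustration under assumptions, not the suite's actual layout: the feature names come from the CI command above, while missing_gate_features and the gated_integration module are hypothetical.

```rust
/// Hypothetical helper: lists which of the CI gate's Cargo features are
/// absent from the current build. `cfg!` is evaluated at compile time.
fn missing_gate_features() -> Vec<&'static str> {
    let mut missing = Vec::new();
    if !cfg!(feature = "full") {
        missing.push("full");
    }
    if !cfg!(feature = "testcontainers") {
        missing.push("testcontainers");
    }
    missing
}

// Gated the way the description says every subsystem is: without both
// features this module is compiled away, so a run executes none of its
// tests and still exits green.
#[cfg(all(test, feature = "full", feature = "testcontainers"))]
mod gated_integration {
    #[test]
    fn talks_to_postgres() {
        // Container-backed assertions would live here.
    }
}
```

One possible sentinel is an ungated test asserting that missing_gate_features() is empty, so a run started without the features goes red instead of reporting zero tests from the gated modules.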

Verified

  • cargo clippy --all-targets --all-features --workspace -- -D warnings clean
  • cargo fmt --all --check clean
  • cargo test --workspace (SQLX_OFFLINE) green
  • Full forge-runtime integration suite green serially against a shared Postgres (642/0/0)
  • Demo users.rs, harness retry/dead-letter/tx-rollback, and the runtime cron/leader/pool regressions validated under testcontainers

Still open (tracked in TEST-AUDIT.md)

  • More forge-harness coverage (compensation rollback, durable-sleep resume, cron, and the security boundaries: cross-user Forbidden, negative SSE auth, 429)
  • Phase 5 E2E de-duplication and a Phase 6 cargo llvm-cov coverage gate

isala404 added 7 commits May 24, 2026 00:43
  • Tighten config loading and validation, error classification, SQL table extraction, parser handling of serde-skip fields, and type emission. Remove the unwired email feature and its test mocks.
  • Auth/session handling (SSE tickets, session revocation, refresh cookie), webhook rate limiting, multipart limits, cron timezone and catch-up, job retry backoff precision and dead-lettering, workflow resume, leader-election zombie eviction, reactor/subscription reliability, and signals session persistence.
  • Include lockfiles in the template archive and preserve them on new, resolve the forge crate binding via proc-macro-crate, and tighten check/migrate/test/new command paths.
  • Re-register job/workflow subscriptions and reconnect SSE on auth change, settle the session before subscribing, set the connected-token hash before resolving connect, native SSE jitter, and web-vitals beacon as a JSON blob.
  • Per-user auth in realtime-todo-list, auth-gated demo panels, idempotent seed migrations, regenerated .sqlx for auth-scoped queries, plain-reqwest webhook loopback, and tuned Playwright timeouts across all six templates.
  • Resurrect rotted job-queue and change-log tests, cover transactional rollback, job retry/dead-letter, auth/session flows, and per-file feature-gate sentinels.
  • Pin third-party action and cargo-deny/cargo-audit versions, gate the forge-runtime integration suite, branch-protection preflight, regenerate the workspace .sqlx cache, and update admin-api/security/configuration docs.
@isala404 isala404 force-pushed the bug-fixes branch 3 times, most recently from eeec2fe to 3f39d86 May 30, 2026 10:49
@isala404 isala404 merged commit 2d4baa2 into main May 30, 2026
16 checks passed
@isala404 isala404 deleted the bug-fixes branch May 30, 2026 12:03