Skip to content

feat: proxy-aware transport (--use-env-proxy), backfill CLI, retry improvements#290

Merged
tonyxiao merged 18 commits intov2from
ir-readiness
Apr 15, 2026
Merged

feat: proxy-aware transport (--use-env-proxy), backfill CLI, retry improvements#290
tonyxiao merged 18 commits intov2from
ir-readiness

Conversation

@tonyxiao
Copy link
Copy Markdown
Collaborator

@tonyxiao tonyxiao commented Apr 15, 2026

Summary

IR-readiness improvements across the proxy/transport layer, a new backfill CLI, and observability enhancements.

Proxy-aware transport refactor (Node 24 --use-env-proxy)

  • Remove manual undici ProxyAgent dispatchers — Node 24's --use-env-proxy flag makes fetch respect HTTPS_PROXY/HTTP_PROXY natively, replacing the custom fetchWithProxy/withFetchProxy plumbing with a simpler tracedFetch wrapper
  • assertUseEnvProxy() guard (packages/ts-cli/src/env-proxy.ts) — throws at startup if proxy env vars are set without --use-env-proxy, preventing silent proxy bypass. Bun is exempt (respects proxy natively)
  • Docker entrypoints updated to pass --use-env-proxy to both engine and service containers
  • Signal propagationmakeApiFetch(signal) + combineSignals ensure abort signals flow through to OpenAPI spec fetches and retries (withHttpRetry now accepts signal)

mitmweb intercept tooling

  • scripts/mitmweb-env.sh — sources into a shell to route Node/Bun/curl through mitmweb; auto-installs mitmproxy if missing, auto-detects upstream proxy from env. Uses a portable _port_listening helper (ss → lsof → nc) so it works on both Linux and macOS
  • scripts/mitmweb-env.test.sh — verifies curl, Node fetch, Bun fetch, and ws WebSocket all route through the proxy
  • New mitmweb CI job runs the intercept test in parallel

Backfill CLI command

  • pipelineSyncUntilComplete (apps/engine/src/lib/backfill.ts) — loops pipeline_sync until eof:complete, carrying state forward across state_limit/time_limit boundaries
  • backfill CLI subcommand (apps/engine/src/cli/backfill.ts) — calls a remote sync engine repeatedly with state persistence to a JSON file

Retry & observability improvements

  • Retry log labelswithHttpRetry now accepts a label option (e.g. GET /v1/customers) for easier debugging; error messages included in retry output
  • Curl-style trace logstracedFetch emits reproducible curl commands when DANGEROUSLY_VERBOSE_LOGGING=true
  • Fault-tolerant getAccountCreatedTimestamp — catches errors and falls back to STRIPE_LAUNCH_TIMESTAMP instead of halting backfill
  • getAccount retry restored — fixed accidental removal of retry wrapper from an earlier commit

Housekeeping

  • Test files in ts-cli moved next to source files (co-located test pattern)

Test plan

  • All CI checks pass (lint, build, test, E2E Stripe, E2E Docker, E2E CDN, E2E Test Server, mitmweb)
  • No merge conflicts with v2 base branch

tonyxiao and others added 12 commits April 14, 2026 22:21
Add labels to retry log messages for easier debugging, include error
messages in retry output, make getAccount gracefully fall back on failure,
and format HTTP trace logs as reproducible curl commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Sources mitmweb-env.sh to route Node/Bun/curl traffic through mitmweb
on localhost:8080, auto-starting it with the internal upstream proxy if
needed. mitmweb-env.test.sh verifies all three runtimes in parallel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Node's built-in fetch (undici) silently bypasses HTTP_PROXY/HTTPS_PROXY
without the --use-env-proxy flag. assertUseEnvProxy() throws at startup
if a proxy is configured but the flag is absent, preventing silent
proxy bypass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Fails fast if HTTPS_PROXY/HTTP_PROXY is set without --use-env-proxy,
preventing silent proxy bypass in Node's built-in fetch (undici).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Throws when HTTPS_PROXY/HTTP_PROXY is set but --use-env-proxy is absent
(Node's built-in fetch silently bypasses the proxy without it). Bun is
exempt since it always respects proxy env natively.

Subprocess tests spawn real node/bun processes to verify the flag check
works end-to-end, not just in unit logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
- Auto-install mitmproxy via pip if not found
- Auto-detect upstream proxy from http_proxy/https_proxy env vars
  instead of hardcoding the Stripe egress proxy URL; falls back to
  direct mode in CI and clean environments
- Add mitmweb proxy intercept test step to CI test job (httpbin.org)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Avoids adding latency to the main test job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Copilot AI review requested due to automatic review settings April 15, 2026 01:31
Comment thread packages/ts-cli/src/openapi/parse.ts Fixed
Comment thread packages/destination-google-sheets/src/index.ts Fixed
Comment thread apps/engine/src/api/app.test.ts Fixed
Comment thread apps/engine/src/api/app.ts Fixed
Comment thread apps/service/src/api/app.test.ts Fixed
Comment thread e2e/test-disconnect.test.ts Fixed
Comment thread e2e/test-disconnect.test.ts Fixed
Comment thread packages/destination-google-sheets/src/index.ts Fixed
Comment thread packages/destination-google-sheets/src/index.ts Fixed
Comment thread packages/destination-google-sheets/src/index.ts Fixed
Comment thread packages/source-stripe/src/index.ts Fixed
Comment thread packages/source-stripe/src/index.ts Fixed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR introduces a new apps/engine sync engine implementation (HTTP API + CLI + connector resolution), adds a new apps/service package, and adds a React dashboard app—along with Docker/CI/repo-ops updates. The scope appears substantially larger than the PR title/description suggests.

Changes:

  • Add a new streaming-based sync engine library/API/CLI in apps/engine (NDJSON streaming, connector resolver, progress/state utilities, OpenAPI schema generation + validation).
  • Add a new apps/dashboard React UI with Playwright e2e coverage and typed API clients via openapi-fetch.
  • Update repo operations: new multi-target root Dockerfile, new audit workflow, docs/agent guidance changes, and various tooling configs.

Reviewed changes

Copilot reviewed 91 out of 699 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
apps/service/src/tests/openapi.test.ts Adds OpenAPI 3.1 schema validation test for the Service spec.
apps/service/package.json Introduces the @stripe/sync-service package manifest and scripts.
apps/engine/vitest.config.ts Adds Vitest configuration (exclusions + timeouts).
apps/engine/tsconfig.json Adds engine TS build config (dist output + excludes tests).
apps/engine/src/serve-command.ts Adds a serveAction to run the engine server with connector resolution.
apps/engine/src/logger.ts Adds pino logger config with redaction settings.
apps/engine/src/lib/state-store.ts Adds state store abstraction + file/readonly implementations.
apps/engine/src/lib/source-test.ts Adds test source connector implementation for engine tests.
apps/engine/src/lib/source-exec.ts Adds subprocess-backed Source wrapper (exec-based connector).
apps/engine/src/lib/select-state-store.ts Adds dynamic destination state-store selection via optional package import.
apps/engine/src/lib/resolver.ts Adds connector resolution (specifier/bin) + resolver cache + schema extraction.
apps/engine/src/lib/resolver.test.ts Adds tests for resolver behavior and caching.
apps/engine/src/lib/remote-engine.ts Adds HTTP client implementing the Engine interface using streaming endpoints.
apps/engine/src/lib/remote-engine.test.ts Adds integration tests for the remote engine client against an in-memory server.
apps/engine/src/lib/progress.ts Adds progress tracking/enrichment and record counting for sync output.
apps/engine/src/lib/progress.test.ts Adds tests for progress enrichment and state merging.
apps/engine/src/lib/pipeline.ts Adds pipeline stages (catalog enforcement, state persistence, limits, piping).
apps/engine/src/lib/ndjson.ts Adds NDJSON parsing/serialization helpers (string/chunks/streams).
apps/engine/src/lib/ndjson.test.ts Adds tests for NDJSON parsing across chunk boundary scenarios.
apps/engine/src/lib/index.ts Re-exports engine library surface area.
apps/engine/src/lib/exec.test.ts Adds tests for exec-based connector wrappers using real connector binaries.
apps/engine/src/lib/exec-helpers.ts Adds spawn helpers for streaming/collecting subprocess NDJSON.
apps/engine/src/lib/engine.ts Adds the core in-process Engine implementation (streams everywhere).
apps/engine/src/lib/destination-test.ts Adds test destination connector implementation for engine tests.
apps/engine/src/lib/destination-filter.ts Adds catalog field selection pruning middleware for destinations.
apps/engine/src/lib/destination-filter.test.ts Adds tests for destination field-selection pruning.
apps/engine/src/lib/destination-exec.ts Adds subprocess-backed Destination wrapper (exec-based connector).
apps/engine/src/lib/default-connectors.ts Defines default in-process connectors bundled with engine.
apps/engine/src/lib/createSchemas.ts Builds typed connector config/input schemas for OpenAPI generation.
apps/engine/src/lib/backfill.ts Adds retry-until-complete helper for paging/backfill via repeated sync calls.
apps/engine/src/lib/backfill.test.ts Adds tests for backfill retry and state carry-forward.
apps/engine/src/index.ts Exposes engine library + API factory + NDJSON response helper.
apps/engine/src/cli/sync.ts Adds a sync CLI shortcut for Stripe→Postgres with optional state handling.
apps/engine/src/cli/supabase.ts Adds Supabase install/uninstall CLI integration.
apps/engine/src/cli/index.ts Adds CLI entrypoint.
apps/engine/src/cli/command.ts Adds CLI program builder + shared connector discovery flags.
apps/engine/src/cli/backfill.ts Adds backfill CLI that retries a remote engine until EOF complete.
apps/engine/src/api/openapi-utils.ts Adds helper utilities for OpenAPI doc patching/table printing.
apps/engine/src/api/index.ts Adds a runnable API server entrypoint with Node/Bun branching.
apps/engine/src/api/header-size.test.ts Adds tests validating support for large X-Pipeline header sizes.
apps/engine/src/tests/sync.test.ts Adds dockerized Postgres lifecycle test for sync & resume flow.
apps/engine/src/tests/stripe-to-postgres.test.ts Adds Stripe-mock to Postgres integration tests for selective sync and resume.
apps/engine/src/tests/openapi.test.ts Adds engine OpenAPI validation and schema/content-key assertions.
apps/engine/src/tests/docker.test.ts Adds Docker image build/run smoke tests for engine CLI/server.
apps/engine/package.json Introduces the @stripe/sync-engine package manifest and exports.
apps/dashboard/vitest.config.ts Adds dashboard Vitest config with @ alias.
apps/dashboard/vite.config.ts Adds Vite config including proxying to engine/service backends.
apps/dashboard/tsconfig.json Adds strict TS config + workspace OpenAPI type paths.
apps/dashboard/src/pages/PipelineList.tsx Adds pipeline list UI with delete action.
apps/dashboard/src/pages/PipelineDetail.tsx Adds pipeline detail UI including polling and progress display.
apps/dashboard/src/pages/PipelineCreate.tsx Adds pipeline creation wizard with connector config + stream selection.
apps/dashboard/src/main.tsx Adds React app bootstrap.
apps/dashboard/src/lib/utils.ts Adds Tailwind/clsx class merging helper.
apps/dashboard/src/lib/stream-groups.ts Adds stream grouping/filtering heuristics for UI.
apps/dashboard/src/lib/stream-groups.test.ts Adds unit tests for stream grouping/filtering heuristics.
apps/dashboard/src/lib/api.ts Adds typed engine/service API clients and discovery helper.
apps/dashboard/src/index.css Adds Tailwind import entry.
apps/dashboard/src/components/StreamSelector.tsx Adds grouped stream selector component with search and bulk select.
apps/dashboard/src/components/JsonSchemaForm.tsx Adds dynamic form renderer for JSON Schema connector configs.
apps/dashboard/src/App.tsx Adds minimal client-side routing + top-level app layout.
apps/dashboard/playwright.config.ts Adds Playwright e2e configuration using system Chrome.
apps/dashboard/package.json Adds dashboard package manifest and scripts.
apps/dashboard/index.html Adds dashboard HTML entry.
apps/dashboard/e2e/pipeline-create.test.ts Adds e2e test for full pipeline creation flow.
apps/dashboard/e2e/global-teardown.ts Adds e2e teardown that stops the engine server.
apps/dashboard/e2e/global-setup.ts Adds e2e setup that starts an engine server for tests.
apps/dashboard/Dockerfile Adds a dev Dockerfile for dashboard (pnpm + Vite).
apps/dashboard/.gitignore Adds dashboard-specific gitignore.
README.md Removes the root README content.
Dockerfile Replaces root Dockerfile with multi-target build (engine + service) using pnpm deploy.
CHANGELOG.md Adds changelog content including a 2026-04-11 entry.
AGENTS.md Rewrites agent instructions and repo conventions.
.vscode/settings.json Updates VS Code workspace settings (Deno path + Ruby LSP + explorer sort).
.vscode/extensions.json Adds recommended VS Code extension list.
.verdaccio/config.yaml Adds Verdaccio config for local/private registry workflows.
.ruby-version Adds pinned Ruby version file.
.releaserc.json Removes semantic-release config.
.prettierignore Expands ignore list (generated outputs, terraform, etc.).
.npmrc Adds configurable @stripe registry via env var.
.github/workflows/release.yml Changes release workflow to “promote SHA” flow (manual fallback).
.github/workflows/docs.yml Removes docs publish workflow.
.github/workflows/audit.yml Adds a scheduled/manual repo audit workflow.
.githooks/pre-push Adds pre-push hook to validate generated OpenAPI specs are up to date.
.eslintrc.js Removes root ESLint config file.
.dockerignore Updates dockerignore (no longer ignores dist).
Files not reviewed (2)
  • apps/engine/src/generated/openapi.json: Language not supported
  • apps/service/src/generated/openapi.json: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Dockerfile
Comment on lines +1 to +3
# Monorepo Dockerfile — build individual images with --target:
# docker build --target engine . → @stripe/sync-engine (stateless HTTP API)
# docker build --target service . → @stripe/sync-service (serve + worker CLI)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title/description focuses on mitmweb proxy intercept tooling and a new CI job, but the diff is dominated by adding new apps (apps/engine, apps/service, apps/dashboard), Docker/CI/tooling changes, and numerous tests. Either the PR description needs to be updated to match the actual scope, or the unrelated changes should be split into separate PRs to keep reviewable boundaries and reduce merge risk.

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +17
const resolver = createConnectorResolver({
sources: { stripe: sourceStripe },
destinations: { postgres: destinationPostgres, google_sheets: destinationGoogleSheets },
})

const app = createApp(resolver)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createConnectorResolver and createApp are async in this codebase (used with await elsewhere). Here they are called without await, so resolver/app will be Promises at runtime and serve({ fetch: app.fetch, ... }) will fail. Fix by awaiting both calls (and ensure any subsequent uses handle the resolved values).

Suggested change
const resolver = createConnectorResolver({
sources: { stripe: sourceStripe },
destinations: { postgres: destinationPostgres, google_sheets: destinationGoogleSheets },
})
const app = createApp(resolver)
const resolver = await createConnectorResolver({
sources: { stripe: sourceStripe },
destinations: { postgres: destinationPostgres, google_sheets: destinationGoogleSheets },
})
const app = await createApp(resolver)

Copilot uses AI. Check for mistakes.
Comment on lines +119 to +126
const engine = createEngine(makeResolver())
const catalogMsg = await collectFirst(
engine.source_discover({
type: 'stripe',
stripe: { api_key: 'sk_test_fake', base_url: STRIPE_MOCK_URL },
}),
'catalog'
)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createEngine is declared as async and returns Promise<Engine>, but this test calls it without await and then immediately calls engine.source_discover(...). This will throw at runtime because engine is a Promise. Update all occurrences in this file to const engine = await createEngine(...).

Copilot uses AI. Check for mistakes.
}

// Scan PATH directories
const pathDirs = (process.env['PATH'] ?? '').split(':')
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting PATH with ':' is not portable (Windows uses ;). Use path.delimiter from node:path (or an equivalent helper) when splitting PATH to keep connector discovery cross-platform.

Copilot uses AI. Check for mistakes.
Comment on lines +122 to +124
export function splitCmd(cmd: string): [string, string[]] {
const parts = cmd.trim().split(/\s+/)
const [bin = cmd, ...baseArgs] = parts
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whitespace split does not respect shell quoting/escaping, so commands like node -e \"console.log('a b')\" or paths with spaces will be split incorrectly. Consider either (1) storing command + args as an array throughout the resolver/commandMap path, or (2) using a small shell-quote parser to correctly split strings when you must accept a single command string.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +11
const output = await validate(
'https://spec.openapis.org/oas/3.1/schema-base',
serviceSpec as unknown as Json
)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test depends on fetching the OpenAPI 3.1 schema from the public internet at runtime. That can make CI flaky or fail in restricted environments. Prefer vendoring/pinning the schema locally (or using a resolver/cache supported by the validator) so tests are deterministic and do not require external network access.

Suggested change
const output = await validate(
'https://spec.openapis.org/oas/3.1/schema-base',
serviceSpec as unknown as Json
)
const output = await validate(serviceSpec as unknown as Json)

Copilot uses AI. Check for mistakes.
Comment thread .dockerignore
db
test
dist
node_modules
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dockerignore no longer excludes dist/, which can significantly inflate the Docker build context and slow down builds (especially in CI). If dist/ is generated locally, it should typically be excluded from the build context unless there's a specific need to include it.

Suggested change
node_modules
node_modules
dist

Copilot uses AI. Check for mistakes.
Comment on lines +42 to +46
// Pipeline config and connector state are passed via the X-Pipeline header.
// Node.js defaults to 16 KB which caps state at ~250 entries — too small for
// connectors like google-sheets that carry row mappings. 50 MB is a conservative
// ceiling; typical headers remain small. See docs/engine/header-size-limits.md
serverOptions: { maxHeaderSize: 50 * 1024 * 1024 },
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting maxHeaderSize to 50 MB materially increases memory usage per connection and can amplify DoS risk (large headers are buffered/parsed before app handling). If large payloads are required, consider moving pipeline/state off headers (e.g., request body for state and an ID reference header), or at minimum make this limit configurable via env with a smaller default and document required reverse-proxy limits.

Copilot uses AI. Check for mistakes.
tonyxiao and others added 4 commits April 15, 2026 01:44
Fixes CI failure in the test job where bun isn't available yet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Tests that ws connections route through mitmweb via explicit
HttpsProxyAgent (ws ignores HTTP_PROXY and --use-env-proxy).
CI mitmweb job now installs pnpm deps so ws/https-proxy-agent resolve.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Node 24 --use-env-proxy makes undici/fetch respect HTTPS_PROXY natively,
so the manual ProxyAgent dispatcher injection in fetchWithProxy is
redundant. Replace all call sites with tracedFetch (curl trace logging
only, no proxy logic) and remove withFetchProxy/fetchWithProxy entirely.

getHttpsProxyAgentForTarget is kept — ws does not respect --use-env-proxy
and still needs an explicit HttpsProxyAgent for WebSocket connections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
- Restore getAccount to use requestWithRetry (e0fb332 unintentionally
  removed retry by switching to request directly)
- Update test that expected getAccount failure to halt backfill: since
  getAccountCreatedTimestamp now swallows errors (fault-tolerant), backfill
  proceeds with STRIPE_LAUNCH_TIMESTAMP fallback instead of failing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
@tonyxiao tonyxiao changed the base branch from main to v2 April 15, 2026 03:00
…at startup

- Add --use-env-proxy flag to both engine and service Docker ENTRYPOINT so
  fetch/undici respects HTTPS_PROXY env var in container deployments
- Call assertUseEnvProxy() at startup in both CLI entry points so proxy
  misconfiguration (proxy set without --use-env-proxy) is caught immediately
  with a clear error rather than silently bypassing the proxy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
tonyxiao pushed a commit that referenced this pull request Apr 15, 2026
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@tonyxiao tonyxiao changed the title IR readiness: mitmweb proxy intercept tooling feat: proxy-aware transport (Node 24 --use-env-proxy), backfill CLI, retry logging improvements Apr 15, 2026
- Resolve merge conflicts with v2 (combine retry label + signal, use
  tracedFetch with combineSignals/makeApiFetch from v2)
- Replace ss -tlnp (Linux-only) with a portable _port_listening helper
  that tries ss → lsof → nc, so the script works on macOS too
- Add TODO for logging in getAccountCreatedTimestamp silent catch

Made-with: Cursor
Committed-By-Agent: cursor
@tonyxiao tonyxiao changed the title feat: proxy-aware transport (Node 24 --use-env-proxy), backfill CLI, retry logging improvements feat: proxy-aware transport (--use-env-proxy), backfill CLI, retry improvements Apr 15, 2026
@tonyxiao tonyxiao merged commit 3f5db91 into v2 Apr 15, 2026
20 checks passed
@tonyxiao tonyxiao deleted the ir-readiness branch April 15, 2026 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants