Skip to content

Add Sourcepoint first-party CMP integration#625

Open
ChristianPavilonis wants to merge 20 commits intomainfrom
feature/sourcepoint-integration
Open

Add Sourcepoint first-party CMP integration#625
ChristianPavilonis wants to merge 20 commits intomainfrom
feature/sourcepoint-integration

Conversation

@ChristianPavilonis
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis commented Apr 8, 2026

Summary

  • Adds a new Sourcepoint CMP (Consent Management Platform) first-party proxy integration that routes cdn.privacy-mgmt.com and geo.privacymanager.io traffic through Trusted Server, eliminating third-party requests for consent management assets.
  • Implements four rewriting layers (HTML attributes, JS response bodies, runtime config trap, client-side DOM guard) to ensure comprehensive URL coverage across static markup, dynamically loaded scripts, and Next.js hydration payloads.
  • Ships disabled by default with full TOML configuration, integration documentation, 14 Rust unit tests, and 6 TypeScript tests.

Changes

File Change
crates/trusted-server-core/src/integrations/sourcepoint.rs New ~860-line integration module: route registration, CDN/geo proxy handlers, HTML attribute rewriter, JS body rewriter (rewrite_script_content), head injector (window._sp_ property trap), Accept-Encoding scoping, backend selection, and 14 unit tests
crates/trusted-server-core/src/integrations/mod.rs Register sourcepoint module and wire it into the integration registry
crates/js/lib/src/integrations/sourcepoint/index.ts TypeScript entry point that bootstraps the client-side script guard
crates/js/lib/src/integrations/sourcepoint/script_guard.ts Client-side DOM insertion dispatcher handler that intercepts dynamically inserted <script>/<link> elements pointing at Sourcepoint CDN and rewrites them to first-party paths
crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts 6 Vitest tests covering the script guard (rewrites, skips, and teardown)
docs/guide/integrations/sourcepoint.md Integration documentation: overview, setup steps, config reference, architecture
docs/guide/integrations-overview.md Updated overview with Sourcepoint in the Mermaid flowchart, performance characteristics table, and environment variables section
trusted-server.toml Added [integrations.sourcepoint] config stanza (disabled by default)

Closes

Closes #145
Closes #344
Closes #345

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses log macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@ChristianPavilonis ChristianPavilonis self-assigned this Apr 8, 2026
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Fixed
@aram356 aram356 requested review from aram356 and prk-Jr April 9, 2026 16:36
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Adds a well-structured Sourcepoint CMP first-party proxy with four rewriting layers (HTML attributes, JS body, runtime config trap, client-side DOM guard). The implementation follows existing integration patterns closely, has 14 Rust + 6 JS tests, and ships disabled by default. A few items need attention before merge.

Blocking

🔧 wrench

  • take_body_str() panics on non-UTF-8 upstream responses: response.take_body_str() at line 400 will panic if the upstream CDN returns invalid UTF-8 with a JS Content-Type. Read bytes and attempt conversion with a fallback instead. (sourcepoint.rs:400)
  • CI failure — format-typescript: Prettier check is failing on test/integrations/sourcepoint/script_guard.test.ts. Must be fixed before merge.

❓ question

  • Redirect Location headers not rewritten: The JS body rewrite is gated on status == 200. If the CDN returns a 3xx redirect with a Location pointing to cdn.privacy-mgmt.com, the browser follows it back to the third-party CDN, defeating the proxy. (sourcepoint.rs:394)

Non-blocking

🤔 thinking

  • Upstream response headers forwarded verbatim: Set-Cookie and other headers from cdn.privacy-mgmt.com pass through to the client on the non-rewritten path. Same as Didomi, but worth confirming this is intended. (sourcepoint.rs:418)
  • Regex matches mismatched quotes: The CDN URL pattern can match 'url" (single open, double close). Extremely unlikely in minified JS but could produce malformed output. (sourcepoint.rs:65)

♻️ refactor

  • Redundant rewrite_sdk guard: Checked in both handles_attribute() and rewrite() — the second is unreachable. (sourcepoint.rs:437)
  • Head injector config property names: Hardcoded Sourcepoint config keys could go stale. A test asserting known property names appear in the generated script would catch omissions. (sourcepoint.rs:467)

🌱 seedling

  • No body size limit: The body is read entirely into memory with no upper bound. A Content-Length check would prevent OOM from unexpected large responses. (sourcepoint.rs:400)
  • Missing validation test for invalid cdn_origin: No test proves that register() fails when cdn_origin is not a valid URL. (sourcepoint.rs:153)

⛏ nitpick

  • Manual [sourcepoint] log prefix: Other integrations rely on the log crate's module path rather than a manual bracketed prefix. (sourcepoint.rs:355)

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: FAIL (Prettier issue in script_guard.test.ts)
  • format-docs: PASS
  • browser integration tests: PASS
  • integration tests: PASS
  • CodeQL: FAIL (likely infra)

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed commit a211eb0. Three blocking issues require fixes before merge; the rest are non-blocking.


Blocking (must fix before merge)

B1 — CI: format-typescript still failing on current HEAD

The latest commit (a211eb02) re-introduced a Prettier violation in crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts. CI's format-typescript check will fail on this commit. Fix with:

cd crates/js/lib && npm run format -- --write

then commit the result.

B2 — SSRF via cdn_origin (inline comment on line 96)

See inline comment. A syntactically valid but host-unrestricted cdn_origin allows proxying to any IP including 169.254.169.254 (cloud metadata). Needs a custom host validator.

B3 — PUT/PATCH unreachable (inline comment on line 337)

routes() only registers GET and POST; the PUT/PATCH arms in handle() are dead code. Either register the methods or remove the arms.


Important (non-blocking)

I1 — Single-quoted '/unified/' missed by SP_ORIGIN_UNIFIED_PATTERN (inline on line 81) — fix is a one-character change to the regex character class.

I2 — BackendConfig::from_url per request (inline on line 357) — pre-compute in new() following the pattern in aps.rs / prebid.rs.

I3 — PR checklist says "Uses tracing macros" but codebase uses log crate

The checklist item reads "Uses tracing macros (not println!)" but CLAUDE.md specifies log. The source correctly uses log::info! throughout. The checklist entry should read "Uses log macros (not println!)" to avoid confusion for future contributors.

I4 — JS-rewrite path drops all upstream headers (inline on line 403) — Response::new() discards Vary, CORS, and security headers. Start from response.clone_without_body() instead.

I5 — No test for build_target_url with a path-bearing cdn_origin (inline on line 337) — set_path silently drops any path prefix in cdn_origin. Should be tested and either fixed or documented as an explicit constraint.


Suggestions (non-blocking)

S1 — URL normalisation logic duplicated across Rust and TypeScript

parse_sourcepoint_url in sourcepoint.rs and normalizeSourcepointUrl in script_guard.ts implement identical protocol-relative URL normalisation in parallel. Not a defect, but a cross-reference comment in each file would prevent future fixes being applied to only one side.

S2 — Redundant rewrite_sdk guard in rewrite() (inline on line 437) — unreachable because handles_attribute already gates on it. Either remove with a comment or document the intentional belt-and-suspenders.

S3 — No test for single-quoted CDN URLs (inline on line 81)

S4 — Unchecked WASM build in PR checklist

The test plan has an unchecked - [ ] WASM build item, but CI's WASM build passes. The checkbox should be checked to accurately reflect build status.

S5 — register missing # Examples doc section (inline on line 314) — CLAUDE.md requires this for all public API functions.

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
ChristianPavilonis and others added 3 commits April 10, 2026 10:48
- Replace #[validate(url)] with a custom validator that restricts
  cdn_origin to *.privacy-mgmt.com hosts, preventing SSRF via
  arbitrary origins (e.g. cloud metadata endpoints).
- Remove unreachable PUT/PATCH arms from the request body match
  since routes() only registers GET and POST.
- Fix Prettier formatting in script_guard.test.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace take_body_str() with take_body_bytes() + String::from_utf8()
  to avoid panicking on non-UTF-8 upstream responses.
- Rewrite Location headers on 3xx redirects that point to
  cdn.privacy-mgmt.com so browsers stay on the first-party proxy.
- Preserve upstream CORS headers on the JS-rewrite path instead of
  discarding them when building a fresh Response.
- Extend SP_ORIGIN_UNIFIED_PATTERN to match both single- and
  double-quoted "/unified/" chunk paths, preserving the original
  quote character in the replacement.
- Normalise log prefixes from [sourcepoint] to Sourcepoint: for
  consistency with APS/Prebid style.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant rewrite_sdk check from rewrite() since
  handles_attribute() already gates on it; update test to verify
  the guard at the handles_attribute level.
- Add # Examples section to register() per documentation standards.
- Add tests for cdn_origin validation (rejects non-privacy-mgmt.com
  hosts, accepts valid origins).
- Add test for single-quoted origin+'/unified/' rewrite pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Review feedback addressed

All blocking and important findings have been fixed across three commits. Replies posted on each inline thread.

Blocking (fixed)

  • SSRF guard — Custom validate_cdn_origin() restricts host to *.privacy-mgmt.com
  • PUT/PATCH dead code — Removed unreachable arms
  • Prettier — Formatting fixed

Important non-blocking (fixed)

  • UTF-8 safetytake_body_bytes() + String::from_utf8() with graceful fallback
  • Redirect rewriting — 3xx Location headers containing CDN host are rewritten
  • CORS headers — JS-rewrite path now preserves upstream CORS headers
  • Single-quote regexSP_ORIGIN_UNIFIED_PATTERN handles both ' and " quotes
  • Log prefix — Normalised to Sourcepoint: style

Suggestions (fixed)

  • Removed redundant rewrite_sdk guard, added invariant comment
  • Added # Examples to register() docs
  • Added validation tests for cdn_origin
  • Added test for single-quoted unified path

Intentionally skipped (with rationale in thread replies)

  • Regex mismatched quotes — documented limitation, no real-world impact
  • BackendConfig::from_url per-request — matches APS/Prebid pattern
  • Body size limit — future enhancement
  • Config property name test — covered by existing head injector test

All CI checks pass locally: cargo fmt, clippy, cargo test, vitest, Prettier.

Refactor normalizeSourcepointUrl to remove the bare-domain startsWith
check that triggered CodeQL "Incomplete URL substring sanitization"
alerts. The host === exact match was already the security boundary;
now the normalization layer no longer references the CDN hostname at
all, eliminating the static analysis finding.

Add a Content-Length guard (5 MB) before reading upstream response
bodies into memory for JavaScript rewriting, preventing unbounded
memory consumption from unexpectedly large responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aram356 aram356 linked an issue Apr 13, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Well-executed revision — every prior blocker is addressed and the four-layer rewriting architecture (HTML attrs / JS body / window._sp_ trap / client DOM guard) is cleanly separated. Two remaining correctness issues in the upstream response handling need fixes before merge: the body-size guard is bypassed when Content-Length is absent, and the UTF-8 fallback path silently drops upstream headers. A few non-blocking refinements around redirect rewriting, config scope, and test coverage follow.

Blocking

🔧 wrench

  • Body-size guard bypassed on chunked/unknown-length responsesMAX_REWRITE_BODY_SIZE only applies when Content-Length is present; otherwise take_body_bytes() reads unbounded (sourcepoint.rs:456-470)
  • UTF-8 fallback drops upstream headers — the fallback constructs a fresh Response with only status + Content-Type, discarding Set-Cookie, CORS, Cache-Control, Vary, etc., diverging from the passthrough path (sourcepoint.rs:480-488)

Non-blocking

♻️ refactor

  • Redirect Location rewrite is a literal String::replace — misses protocol-relative //cdn.privacy-mgmt.com/... values (sourcepoint.rs:430-438)
  • validate_cdn_origin does not check schemeftp://cdn.privacy-mgmt.com passes validation and only fails at the first proxied request (sourcepoint.rs:135-150)

🤔 thinking

  • Config/rewriter host scope mismatchvalidate_cdn_origin accepts *.privacy-mgmt.com but rewriters hardcode cdn.privacy-mgmt.com; configuring a different subdomain silently half-enables the integration (sourcepoint.rs:143, 69, 197, 598)
  • Rewritten-JS path hard-codes Cache-Control — diverges from the passthrough path's apply_cache_headers policy without explanation (sourcepoint.rs:498-501)
  • Property trap only catches reassignment of window._sp_ — nested mutation (window._sp_.config.X = ...) is not patched; also s.replace() in the JS helper is first-occurrence only (sourcepoint.rs:568-600)

🌱 seedling

  • copy_headers forwards Authorization — credential-leak footgun if publishers reuse bearer tokens across origins (sourcepoint.rs:222-233)
  • No direct test for handle() — redirect rewriting, UTF-8 fallback, Content-Length guard, Content-Type gating all lack unit coverage (sourcepoint.rs:372-522)

👍 praise

  • Solid response to the prior review — every prior 🔧/❓ addressed with well-chosen fixes; four-layer rewriting architecture is cleanly separated and well documented

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser integration tests: PASS
  • CodeQL: PASS

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
…t rewrite

- Skip JS rewrite when Content-Length is absent (chunked/unknown) to
  prevent unbounded memory reads
- Reuse original response on UTF-8 fallback to preserve upstream headers
- Extract redirect Location rewriting into a URL-parsing helper that
  handles absolute, protocol-relative, and relative redirects
- Tighten cdn_origin validation to exactly cdn.privacy-mgmt.com and
  reject non-HTTP(S) schemes
- Drop Authorization from forwarded headers to prevent credential leakage
- Document Cache-Control divergence and property trap limitations
- Add 7 new tests for validation and redirect rewriting
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Round-3 blockers are all correctly addressed — UTF-8 fallback now reuses the original response (preserving headers), Content-Length-missing short-circuits the rewrite before take_body_bytes, rewrite_redirect_location handles absolute / protocol-relative / relative via Url::join, validate_cdn_origin checks scheme and pins the host, and Authorization is excluded from forwarded headers. No remaining 🔧/❓ findings. Submitting as change-request per author request; the items below are refinements (🤔/♻️/🌱/⛏) rather than correctness blockers — each can be resolved or deferred to a follow-up at the author's discretion.

Non-blocking

♻️ refactor

  • apply_cache_headers invoked at five return sites — redirect branch, CL-too-large branch, CL-missing branch, UTF-8-fallback branch, and the fallthrough (sourcepoint.rs:476, 505, 513, 528, 567). Restructuring handle() around a single exit point (or wrapping the proxied Response in a post-handle adapter) would remove the per-branch bookkeeping. Low priority.

🌱 seedling

  • No direct test for handle() (sourcepoint.rs:414-569) — redirect rewriting, UTF-8 fallback, CL-missing skip, CL-too-large skip, and non-JS passthrough branches all lack unit coverage. Acknowledged in the round-3 response; tracking here for follow-up.
  • CSP interaction not documented (docs/guide/integrations/sourcepoint.md) — publishers with Content-Security-Policy: script-src https://cdn.privacy-mgmt.com will see rewritten first-party scripts blocked the moment they enable the integration. A note under "HTML Rewriting" calling out that CSP must allow the first-party host would save an incident report.

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser integration tests: PASS
  • CodeQL: PASS

Comment thread crates/js/lib/src/integrations/sourcepoint/index.ts
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts
@ChristianPavilonis ChristianPavilonis removed the request for review from prk-Jr April 21, 2026 13:59
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Implemented Sourcepoint follow-up updates and pushed commit.

What I changed:

  • Added __tsjs_sourcepoint runtime config bootstrap in Sourcepoint head inject and gated client-side guard installation with it.
  • Sourcepoint head injection now always emits config script; runtime trap is only included when rewrite_sdk is enabled.
  • Added strict cdn_origin validation to reject URL paths.
  • Preserved response headers on JS rewrite success path while normalizing encoding headers and content type.
  • Added regression coverage for the above behaviors.

I also resolved remaining open review threads and re-requested review from @prk-Jr.

Validation:

  • cargo test --workspace
  • cargo clippy --package trusted-server-core --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • cd crates/js/lib && npx vitest run
  • cd crates/js/lib && npm run -s format -- --check

Commit: e2fdb0c

Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

The Sourcepoint integration covers the expected rewrite layers, but I found two blocking issues around cookie semantics plus one follow-up consistency concern in the JS guard.

Blocking

🔧 wrench

  • Sourcepoint cookie state never makes the round trip: the proxy does not forward any Cookie header upstream, so Sourcepoint cookies previously set on the first-party endpoint are never sent back to Sourcepoint. That breaks the main baseEndpoint / CNAME flow and authenticated-consent-style cookie-backed state. (crates/trusted-server-core/src/integrations/sourcepoint.rs:256)

  • Cookie-setting Sourcepoint responses are still marked public-cacheable: both the passthrough path and the rewritten-JS path can return Cache-Control: public, max-age=... even when upstream sends Set-Cookie. If Sourcepoint uses server-set cookies on the first-party endpoint, that risks caching user-specific consent state. (crates/trusted-server-core/src/integrations/sourcepoint.rs:269, crates/trusted-server-core/src/integrations/sourcepoint.rs:377)

Non-blocking

🤔 thinking

  • JS and Rust host checks diverge on explicit ports: the client-side DOM guard compares URL.host, while the Rust rewriter compares hostname-only. That creates inconsistent behavior for URLs like https://cdn.privacy-mgmt.com:443/.... (crates/js/lib/src/integrations/sourcepoint/script_guard.ts:35, crates/trusted-server-core/src/integrations/sourcepoint.rs:228)

CI Status

  • fmt: PASS
  • clippy: FAIL
  • rust tests: FAIL
  • js tests: PASS
  • integration artifact prep: FAIL

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/js/lib/src/integrations/sourcepoint/script_guard.ts Outdated
@ChristianPavilonis
Copy link
Copy Markdown
Collaborator Author

Addressed the latest review round in commit 1a40868 and pushed to feature/sourcepoint-integration.

What changed:

  • Forwarded a narrowly scoped Sourcepoint cookie allowlist upstream instead of dropping the Cookie header
  • Added optional auth_cookie_name support for authenticated-consent deployments
  • Forced cookie-bearing Sourcepoint responses to Cache-Control: private, no-store on both passthrough and rewritten-JS paths
  • Aligned the client-side Sourcepoint guard with Rust by switching from URL.host to URL.hostname
  • Added Rust + Vitest regression coverage and updated Sourcepoint docs/sample config

Validation:

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test --workspace
  • cd crates/js/lib && npx vitest run

I replied to and resolved the remaining review threads and re-requested review from @prk-Jr and @aram356.

Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Four-layer rewriting architecture (HTML attrs / JS body regex / window._sp_ trap / client DOM guard) is well-structured, and most prior 🔧 / ❓ findings have been addressed across rounds 1–4 (SSRF guard, UTF-8 safety, redirect rewriting, cookie allowlist, no-store on cookie responses, hostname-only host checks, scheme/host/path/query/fragment validation). However, a stale merge from main introduced a hard build break that fails three CI jobs.

Blocking

🔧 wrench

  • Build break: IntegrationProxy::handle signature drifted from the trait#581 added services: &RuntimeServices to the trait; the merge from main did not update Sourcepoint's impl. cargo fmt, cargo test, and prepare integration artifacts all fail with E0050. (sourcepoint.rs:540-549)
  • Use platform-agnostic client_ip from RuntimeServicescopy_headers calls Fastly-specific original_req.get_client_ip_addr(). With RuntimeServices now plumbed through, switch to services.client_info().client_ip to match Didomi's pattern and keep trusted-server-core portable. (sourcepoint.rs:269-272)

Non-blocking

🤔 thinking

  • Property-trap config field allowlist is hardcoded and silently goes stale — only baseEndpoint, mmsDomain, wrapperAPIOrigin, cmpOrigin, and metricUrl are patched. New Sourcepoint fields would silently leak. A snapshot test asserting the exact patched set would force review when the list changes. (sourcepoint.rs:735-742)
  • auth_cookie_name is unvalidated — empty string, whitespace, or ; / = characters are accepted. Empty string especially: a malformed cookie pair =value parses to name "", which auth_cookie_name = "" would forward. Add #[validate] (non-empty trimmed, length ≤ 64, [A-Za-z0-9_-]). (sourcepoint.rs:124-129)

♻️ refactor

  • is_likely_javascript_path matches non-JS API paths/wrapper/v2/* is JSON but is treated as JS-likely, forcing Accept-Encoding: identity for every CMP API hit. Tightening the heuristic (e.g. only .js and /unified/) preserves the optimisation only where applicable. (sourcepoint.rs:431-433)

📌 out of scope (carryover from earlier rounds, still open)

  • No direct unit test for handle() — redirect rewriting, UTF-8 fallback, Content-Length missing/too-large branches still lack coverage.
  • apply_cache_headers invoked at five return sites in handle() — restructuring around a single exit point would remove the per-branch bookkeeping.
  • CSP interaction undocumented — publishers with Content-Security-Policy: script-src https://cdn.privacy-mgmt.com will see rewritten first-party scripts blocked the moment they enable rewriting. A note in docs/guide/integrations/sourcepoint.md under "HTML Rewriting" would save an incident report.

CI Status

  • cargo fmt: FAIL (caused by trait-signature drift)
  • cargo test: FAIL (caused by trait-signature drift)
  • prepare integration artifacts: FAIL (caused by trait-signature drift)
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • CodeQL: PASS
  • Analyze (rust / actions / javascript-typescript): PASS

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Fifth-round review against main. CI is fully green and the prior 84 review comments across 4 rounds have all been addressed. New findings below are incremental hardening — one cache-poisoning concern that depends on upstream Cache-Control behavior and several non-blocking quality observations.

Blocking

🔧 wrench

  • Cache poisoning risk: forwarded Cookie not reflected in Vary for default cache headers (sourcepoint.rs:386)

Non-blocking

🤔 thinking

  • Inline <script> IIFE has no try/catch — can crash publisher hydration (sourcepoint.rs:784)
  • JS-rewrite success path leaves stale Vary: Accept-Encoding (sourcepoint.rs:499)
  • BackendConfig::from_url called per-request even though cdn_origin is now static (sourcepoint.rs:593)
  • apply_cache_headers is a no-op for redirects without Set-Cookie (sourcepoint.rs:637)

♻️ refactor

  • Duplicated Set-Cookie → private, no-store policy between apply_cache_headers and rewrite_javascript_response (sourcepoint.rs:492)
  • parse_sourcepoint_url's bare-host branch uses literal starts_with (sourcepoint.rs:516)

🌱 seedling

  • validate_auth_cookie_name does not reject reserved RFC 6265 attribute names (sourcepoint.rs:235)
  • is_likely_javascript_path does not include .mjs (sourcepoint.rs:467)

CI Status

  • fmt: PASS
  • clippy: PASS
  • rust tests: PASS
  • js tests: PASS
  • CodeQL: PASS
  • browser/integration tests: PASS

Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
Comment thread crates/trusted-server-core/src/integrations/sourcepoint.rs
@ChristianPavilonis ChristianPavilonis requested a review from aram356 May 5, 2026 14:22
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

All previously-flagged blockers are addressed. Multi-layer rewriting architecture is sound, validation is tight (host pinned, scheme/path/query/fragment all rejected), cookie hygiene is correct (allowlist + private, no-store on Set-Cookie), and 41 unit tests cover the slices. Approving with non-blocking observations the author should consider.

Blocking

❓ question

  • expect() on backend creation in constructor: panics on non-NameInUse Fastly errors; diverges from peer integrations and CLAUDE.md startup-error guidance (sourcepoint.rs:255-256)
  • OPTIONS / HEAD methods not registered: CORS preflight will 404 against /integrations/sourcepoint/cdn/*; needs empirical confirmation that Sourcepoint's API surface doesn't trigger preflight (sourcepoint.rs:636-638)

Non-blocking

🤔 thinking

  • JSON responses are passthrough: is_javascript_response gates rewriting on JS content-types only; Sourcepoint message-API responses may contain CDN URLs in JSON fields that leak through (sourcepoint.rs:710-713)
  • Content-Length-absent responses skip rewriting: chunked / unknown-length bodies silently bail; safe but coverage depends on upstream behavior (sourcepoint.rs:733-740)
  • Pre-computed backend diverges from peer integrations: permutive, lockr, datadome, aps all call BackendConfig::from_url inline at request time. Sourcepoint pre-computes in new with no Fastly Compute perf benefit. Either align or document (linked to Q1)
  • Head-injector trap silently swallows install errors: try{...}catch(e){} discards everything; consent traffic could leak to the third-party CDN with no observability (sourcepoint.rs:828-852)

♻️ refactor

  • copy_headers returns an unlabeled bool: a named enum makes the intent (forwarded_cookies) self-documenting (sourcepoint.rs:315-351)
  • Error messages duplicate the integration name: TrustedServerError::Integration already carries integration: SOURCEPOINT_INTEGRATION_ID; messages like "Failed to build Sourcepoint target URL" repeat what the type already says
  • is_javascript_response uses contains: a CT like text/plain; charset=javascript-utf8 would trigger rewriting; stricter parsing would be more defensive (very minor)

🌱 seedling

  • Cookie allowlist is hardcoded (sourcepoint.rs:57-69): future PR could expose this as a list in config so new Sourcepoint cookies don't require a code release
  • is_likely_javascript_path covers only .js, .mjs, /unified/: maintained list; document as such (sourcepoint.rs:497-499)
  • No end-to-end test for handle(): redirect rewriting, UTF-8 fallback, Content-Length skip, cookie filtering, and Cache-Control logic are individually unit-covered, but no test exercises the full path (hard to mock Request::send)
  • Property trap doesn't catch nested mutation (sourcepoint.rs:818-824): documented limitation; a Proxy on o.config would catch nested writes
  • 5 MB body cap is hardcoded: could be configurable per-integration (sourcepoint.rs:50)

⛏ nitpick

  • # Examples doctest is ignored: doesn't compile and doesn't really demonstrate usage; either replace with a real doctest or omit since every other integration's register follows the same shape (sourcepoint.rs:609-613)
  • Comment block mixes "what" and "why" (sourcepoint.rs:325-331): the Authorization-omission rationale sits inside the Accept-Encoding paragraph; splitting separates two distinct decisions

📌 out of scope

  • HEAD method support: some publishers do fetch(url, {method: 'HEAD'}) for asset checks; track separately if real traffic shows the need
  • JSON body rewriting: future PR could parse JSON and rewrite known URL-bearing fields, mirroring how the head-injector trap rewrites runtime config

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS (41 Sourcepoint tests pass locally)
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • browser integration tests: PASS
  • integration tests: PASS
  • CodeQL: PASS

impl SourcepointIntegration {
fn new(config: Arc<SourcepointConfig>) -> Arc<Self> {
let backend_name = BackendConfig::from_url(&config.cdn_origin, true)
.expect("should configure Sourcepoint backend from validated origin");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questionexpect() here will panic if Backend::builder().finish() returns anything other than NameInUse (host-with-port length, transient platform errors). validate_cdn_origin ensures URL parsing succeeds, but it can't guarantee the Fastly registration call. CLAUDE.md says invalid enabled config should surface as startup/config errors, not panics. Peer integrations (permutive, lockr, datadome, aps) defer backend creation to request handling and propagate errors via Result.

Since Fastly Compute reinitializes per request, there's no perf benefit to pre-computing in the constructor. Suggest changing new to return Result, propagating up through build()register().

}

fn routes(&self) -> Vec<IntegrationEndpoint> {
vec![self.get("/cdn/*"), self.post("/cdn/*")]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — Only GET and POST are registered. CORS preflight (OPTIONS) requests will 404 and block any cross-origin XHR that requires preflight. Was OPTIONS verified unnecessary against Sourcepoint's actual API surface? The /wrapper/v2/messages POST with Content-Type: application/json and custom headers would normally trigger a preflight.

if response.get_status() == StatusCode::OK
&& self.config.rewrite_sdk
&& Self::is_javascript_response(&response)
{
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingis_javascript_response only triggers rewriting on javascript/ecmascript content types. Sourcepoint's /wrapper/v2/messages and similar endpoints respond with JSON that may contain cdn.privacy-mgmt.com URLs in fields like mmsDomain. Those URLs flow through to the client; only the runtime trap and DOM guard catch them.

The head-injector doc already mentions reporting missed fields, which suggests this is partially expected — but explicitly documenting "JSON bodies are not rewritten" would set the right expectation for future maintainers.

None => {
log::warn!(
"Sourcepoint: no Content-Length for {path}, \
skipping rewrite to avoid unbounded memory read (reason: missing_content_length)"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — Chunked / unknown-length responses silently bail out of body rewriting. Memory-safe, but coverage now depends on upstream behavior. A code comment that this is a known coverage gap (relying on the trap and DOM-guard backstops) would help the next maintainer. log::info! may also be more appropriate than warn! here since this is the safe path, not a degraded one.

"get:function(){{return v}},",
"set:function(n){{v=p(n)}}",
"}});",
"}}catch(e){{}}}})();",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingtry{...}catch(e){} discards every error. If the trap fails to install, consent traffic leaks to the third-party CDN with zero observability for the publisher.

A single console.warn('[ts:sourcepoint] _sp_ trap install failed', e) in the catch keeps the script best-effort while making install failures debuggable.

client_ip: Option<IpAddr>,
original_req: &Request,
proxy_req: &mut Request,
) -> bool {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactor — The bool return is "did we forward Sourcepoint cookies upstream?" which then gates cache policy at call sites. A named enum (enum CookieForwarding { None, Forwarded }) makes the intent self-documenting. Minor.

/// is a conservative preflight — false negatives just mean we skip the
/// `Accept-Encoding: identity` optimisation for that request.
fn is_likely_javascript_path(path: &str) -> bool {
path.ends_with(".js") || path.ends_with(".mjs") || path.starts_with("/unified/")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 seedling — If Sourcepoint introduces another chunk path convention beyond /unified/, the Accept-Encoding: identity optimization silently turns off for those paths. Correctness is preserved (the post-arrival is_javascript_response check still gates rewriting), it's just a quiet perf regression. A short comment noting this is a maintained list helps maintenance.

// initial assignment is not caught. The JS body regex rewriter
// covers that case for string literals in bundled code.
// - `s.replace()` replaces only the first occurrence per call, which
// is fine for the current set of scalar URL config fields.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 seedling — Documented limitation: window._sp_.config.baseEndpoint = "..." after the initial assignment isn't trapped. A Proxy on o.config would catch nested writes; the JS body regex covers most real-world cases so this can wait.

/// Maximum response body size (5 MB) that will be read into memory for
/// JavaScript rewriting. Responses larger than this are passed through
/// unmodified to avoid unbounded memory consumption.
const MAX_REWRITE_BODY_SIZE: u64 = 5 * 1024 * 1024;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 seedling — 5 MB body cap is a hardcoded constant. Could be configurable per-integration if a publisher needs larger CMP bundles. Not urgent.

///
/// # Examples
///
/// ```ignore
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick — The ignored doctest doesn't compile and doesn't really demonstrate usage. CLAUDE.md says public APIs need examples; an ignored snippet meets the letter but not the spirit. Either replace with a real doctest or omit (every other integration's register follows the same shape).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants