Skip to content

Add DataDome server-side validation and multi-backend routing#470

Closed
jevansnyc wants to merge 1 commit intomainfrom
arena-vcl-replacement
Closed

Add DataDome server-side validation and multi-backend routing#470
jevansnyc wants to merge 1 commit intomainfrom
arena-vcl-replacement

Conversation

@jevansnyc
Copy link
Copy Markdown
Collaborator

@jevansnyc jevansnyc commented Mar 11, 2026

No description provided.

Comment thread crates/common/build.rs Outdated
Comment thread crates/trusted-server-core/build.rs
Comment thread crates/trusted-server-core/src/integrations/datadome.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/datadome.rs

// DataDome server-side validation (if enabled)
// This validates requests at the edge before they reach the origin
match crate::integrations::datadome::validate_request_server_side(settings, &req) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ This runs on every proxied request — CSS, JS, images, fonts all trigger a DataDome HTTP call (200ms timeout). Most bot protection validates only document/navigation requests. Is this intentional? If so, worth a comment explaining why.

Comment thread crates/trusted-server-core/src/backend_router.rs Outdated
Comment thread crates/trusted-server-core/src/backend_router.rs Outdated
Comment thread crates/trusted-server-core/src/settings.rs Outdated
Comment thread crates/trusted-server-core/src/backend_router.rs Outdated
Comment thread crates/trusted-server-core/src/backend_router.rs Outdated
Comment thread crates/common/build.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/datadome.rs Outdated

// DataDome server-side validation (if enabled)
// This validates requests at the edge before they reach the origin
match crate::integrations::datadome::validate_request_server_side(settings, &req) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 P1 — Validation runs on every request including static assets

validate_request_server_side is called unconditionally — images, CSS, JS, fonts all trigger a blocking HTTP call to DataDome (200ms default timeout). This:

  • Adds latency to every static asset
  • Multiplies DataDome API usage/cost
  • Could hit DataDome rate limits

DataDome validation is typically needed only for document/HTML requests.

Fix: Filter by request path or accept header to skip known static assets, or at minimum only validate navigational requests.

Comment thread crates/trusted-server-core/src/backend_router.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/datadome.rs
Comment thread crates/trusted-server-core/src/settings.rs Outdated
Comment thread crates/common/src/publisher.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/datadome.rs Outdated
@aram356
Copy link
Copy Markdown
Collaborator

aram356 commented Mar 31, 2026

@ChristianPavilonis To fix and get ready for production

@ChristianPavilonis ChristianPavilonis removed their request for review April 7, 2026 20:46
Comment thread crates/trusted-server-core/src/backend_router.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/datadome.rs Outdated
Comment thread crates/trusted-server-core/src/integrations/datadome.rs
Comment thread crates/trusted-server-core/src/integrations/datadome.rs Outdated
Comment thread crates/trusted-server-core/src/backend_router.rs Outdated
Comment thread crates/trusted-server-adapter-fastly/backends.toml Outdated
Comment thread crates/trusted-server-core/src/publisher.rs
Comment thread crates/trusted-server-core/src/publisher.rs Outdated
Comment thread crates/trusted-server-core/build.rs
Comment thread crates/trusted-server-core/src/settings.rs Outdated
})
})
})
.transpose()?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Invalid path_regex is detected too late and then silently disables routing.

validate_path_pattern (settings.rs) only checks path_prefix/path_regex mutual exclusivity. Regex syntax is compiled here lazily, and BackendRouter::new is invoked per-request in handle_publisher_request with .ok() swallowing errors. A typo in a customer's backends.toml will:

  1. Pass build.rs parsing
  2. Pass runtime Settings::validate()
  3. Log an error on every request
  4. Silently fall through to default origin → routing broken, app keeps serving

This violates CLAUDE.md: "config-derived regex/pattern compilation must not use panic-prone expect()/unwrap(); invalid enabled config should surface as startup/config errors" and "Invalid enabled integrations/providers must not be silently logged-and-disabled during startup or registration."

Fix: validate regex syntax inside validate_path_pattern (try Regex::new(r).map_err(...)), and/or construct a BackendRouter inside Settings::validate() so bad config → startup failure. Then in handle_publisher_request the router build becomes infallible (or unwraps an already-validated router).


// DataDome server-side validation (if enabled)
// This validates requests at the edge before they reach the origin
match crate::integrations::datadome::validate_request_server_side(settings, &req) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — DataDome validation runs on every publisher request, including static assets.

handle_publisher_request is invoked for every proxied path — HTML, CSS, JS, images, fonts. With server_side_enabled=true every one of those enters validate_request_server_side. Unsampled requests still pay JSON-deserialize + validate + Arc-alloc; sampled requests pay a 200 ms round-trip to api-fastly.datadome.co per asset.

This is the same finding from the 2026-03-20 and 2026-04-14 reviews at this exact line — still not addressed.

Fix: filter to navigational/document requests before invoking (skip common static extensions, or use Sec-Fetch-Dest/Accept: image/*). Document the selected filter so operators know what is and isn't protected.

req: &Request,
) -> Result<bool, Report<TrustedServerError>> {
match build(settings) {
Some(integration) => integration.validate_request(req),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Per-request DataDome config deserialization + Arc allocation.

Every validate_request_server_side call goes through build(settings)serde_json::from_value::<DataDomeConfig>(raw.clone()) + config.validate() + Arc::new(...). Prior OnceLock caching was removed (rightly, for test isolation) but nothing replaced it. Combined with the finding in publisher.rs (validation running on every asset), each asset request pays the full deserialize/validate cost.

Fix: build the integration once during get_settings() (or at handler init if you introduce a HandlerContext) and pass &DataDomeIntegration into validate_request_server_side instead of &Settings.

log::warn!("[datadome] server_side_enabled = true has no effect when enabled = false; server-side validation is inactive");
}
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — The misconfiguration warning re-deserializes on every disabled-but-configured request.

When enabled=false and a raw [integrations.datadome] section exists, this Ok(None) branch still runs serde_json::from_value::<DataDomeConfig>(raw.clone()) per request to decide whether to log a warning. Move this to startup (one-time pass in get_settings() or Settings::validate()). A warning logged once at boot is what operators want; logging it per-request floods the log and keeps paying deserialize cost.

validation_req.set_header(
"x-datadome-params:clientid",
urlencoding::encode(datadome_cookie).as_ref(),
);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questionx-datadome-params:clientid is still URL-encoded while ip is raw.

Commit 01e02e2c changed ip to raw per DataDome's contract, but left clientid wrapped in urlencoding::encode. Since extract_datadome_cookie already returns just the token value (no ; or ,), the encoding is likely unnecessary and could mismatch what DataDome expects.

Please link the DataDome spec line that requires encoding for clientid but not ip, or drop the urlencoding::encode.

#
# This file is merged into the embedded binary config at build time by
# crates/trusted-server-core/build.rs. It is separate from trusted-server.toml to keep
# customer-specific configuration out of the shared application template.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📌 out of scopebackends.toml commits ~170 Arena Group publisher domains to the OSS repo.

Third time this has been flagged. Customer business data (newsweek.com, nytimes.com, washingtonpost.com, etc. mapped to raven-public.prod.saymedia.com) is visible to anyone reading the public repo. Either move to a Fastly config/secret store loaded at runtime, or add a documented private overlay file (e.g., backends.private.toml git-ignored).

If the team has decided this is fine, a comment at the top of the file saying so would close the loop.

}
}
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 note — Test gap: validate_request_inner HTTP path has no coverage.

All existing tests exit early (disabled, missing-key, sample=0, sample=100+missing-key). None exercises the actual HTTP request/response flow: header construction, 200→allow, 403→block, error→fail-open.

Viceroy can host a mock backend for this. Without it, a regression like lowercasing x-datadome-params:Key or dropping the method header would ship silently.

.next()
.unwrap_or("api-js.datadome.co")
// split() on &str always yields at least one element, so this is unreachable
.unwrap_or(url)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick.unwrap_or(url) with "unreachable" comment is misleading.

Prefer .expect("split always yields at least one element") — clippy/future-you wouldn't wonder why the fallback was chosen.

settings.proxy.certificate_check,
)
.map_err(|e| {
log::error!("Failed to build backend router: {:?}", e);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick{:?} on Report<_> prints the full debug backtrace; {err} (Display) is usually what you want in edge logs. Same on line 343. Fine if you're sinking to Fastly real-time logging with structured output — confirm before changing.

Comment thread .gitignore
/crates/integration-tests/browser/test-results/
/crates/integration-tests/browser/playwright-report/
/crates/integration-tests/browser/.browser-test-state.json

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick — Global *.vcl ignore with no .vcl files in tree. Scope to a specific directory or drop until there's content.

@aram356
Copy link
Copy Markdown
Collaborator

aram356 commented Apr 23, 2026

@ChristianPavilonis Please split into 2 prs

@jevansnyc jevansnyc force-pushed the arena-vcl-replacement branch from 38bb4d4 to 84f091b Compare May 6, 2026 16:05
@ChristianPavilonis ChristianPavilonis deleted the arena-vcl-replacement branch May 6, 2026 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Convert VCL into Rust code in TS to not require extra CDN VCL hop and increase reponse times

3 participants