Implement Edge Cookie identity system with KV graph and integration tests#621
Implement Edge Cookie identity system with KV graph and integration tests#621ChristianPavilonis wants to merge 64 commits intomainfrom
Conversation
de0525d to
0940c34
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed — clean separation of concerns, strong input validation, and solid concurrency model. A few cleartext logging issues and a docs inconsistency need addressing before merge.
Blocking
🔧 wrench
- Cleartext EC ID logging: Full EC IDs are logged at
warn!/error!/trace!levels in 5 locations acrossec/mod.rs(lines 128, 211, 296),sync_pixel.rs(line 91), andpull_sync.rs(line 91). Thelog_id()truncation helper was introduced in this PR for exactly this purpose but is not used consistently. See inline comments for fixes. - Stale env var in docs:
TRUSTED_SERVER__EDGE_COOKIE__SECRET_KEYstill appears inconfiguration.md(lines 40, 956) anderror-reference.md(line 154). The PR renamed the TOML section from[edge_cookie]to[ec]and updated some env var references but missed these. Users following the docs will set a variable that is silently ignored, resulting in a startup failure from the placeholder passphrase. Should beTRUSTED_SERVER__EC__PASSPHRASE.
❓ question
HttpOnlyomitted from EC cookie (ec/cookies.rs:11): Intentionally omitted for a hypothetical future JS use case. Is there a concrete plan? The identify endpoint already exposes the EC ID. If not,HttpOnlywould be cheap XSS defense-in-depth.
Non-blocking
🤔 thinking
- Uppercase EC ID rejection in batch sync (
batch_sync.rs:209):is_valid_ec_idrejects uppercase hex, but batch sync validates before normalizing (line 225). Partners submitting uppercase EC IDs getinvalid_ec_idinstead of normalization.
♻️ refactor
_pull_enabledindex lacks CAS (partner.rs:564): Read-modify-write without generation markers. Concurrent partner registrations can overwrite each other's index updates. Self-healing via fallback, but inconsistent with the CAS discipline used elsewhere.
🌱 seedling
- Integration tests don't verify identify response shape:
FullLifecycleandConcurrentPartnerSyncsonly assertuids.<partner>. Theec,consent,degraded,eids, andcluster_sizefields from the API reference are never checked. - Pixel sync failure paths untested end-to-end:
ts_reason=no_ec,no_consent, domain mismatch redirects are unit-tested but not covered in integration tests.
📝 note
trusted-server.tomlships banned placeholder:passphrase = "trusted-server"is rejected byreject_placeholder_secrets. Works in CI via env override, but new contributors will hit a confusing startup error. A TOML comment would help.
⛏ nitpick
Eid/UidmissingDeserialize:openrtb.rstypes deriveSerializeonly. If the auction ever needs to parse EIDs from provider responses,Deserializewill be needed.
CI Status
- cargo fmt: PASS
- clippy: PASS
- cargo test: PASS
- vitest: PASS
- integration tests: FAIL
- CodeQL: FAIL (likely related to cleartext logging findings above)
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed with clean separation of concerns, strong concurrency discipline, and solid security choices throughout. Four new blocking findings join the existing ones — three are input-validation gaps and one is a PII leak in the pull sync URL builder that should be resolved before merge.
Blocking
🔧 wrench
- PII leak: client IP forwarded to partners —
pull_sync.rs:273:build_pull_request_urlappends the raw user IP as theipquery parameter on every outbound pull-sync request, exposing it in partner access logs. Contradicts the privacy-preserving design intent and is likely a GDPR Article 5 concern. Remove theipparameter or gate it behind an explicit per-partnerallow_ip_forwardingconfig flag. - Pull sync response body unbounded —
pull_sync.rs:325:take_body_bytes()with no size cap before JSON parsing. A malicious partner can exhaust WASM memory. Batch sync and admin endpoints both haveMAX_BODY_SIZEguards; this path needs one too (64 KiB is generous for a{"uid":"..."}response). - Whitespace-only UIDs bypass validation —
batch_sync.rs:217andsync_pixel.rs:41:is_empty()passes" "and"\t", which get stored as garbage EID values in the KV graph. Replace withtrim().is_empty()at both sites. rand::thread_rng()WASM compatibility unverified —generation.rs:52: requiresgetrandomwith thewasifeature onwasm32-wasip1. Nativecargo testpassing does not prove the WASM build is safe; integration tests are already failing so this may not have been caught. Switch toOsRngor addgetrandom = { version = "0.2", features = ["wasi"] }explicitly.
Non-blocking
🤔 thinking
process_mappingsUpsertResult branches untested —batch_sync.rs:232:NotFound,ConsentWithdrawn, andStalehave zero unit test coverage.Staleis documented as "counted as accepted" with no regression test.- Shared error messages make pull sync validation tests non-isolating —
partner.rs:152,165: both missing-URL and missing-token conditions return the identical error string, so the tests asserting on substrings pass even if the wrong branch fires. Use distinct messages per condition.
♻️ refactor
ec_consent_grantedhas no tests —consent.rs:20: the entry-point gate for all EC creation has no#[cfg(test)]section. Add smoke tests for granted and denied paths.- KV tombstone path never exercised in finalize tests —
finalize.rs:149: all finalize tests passkv: None, so the tombstone write and the cookie-EC ≠ active-EC case inwithdrawal_ec_idsare untested. - Missing HMAC prefix stability test —
generation.rs:228: no test asserts that the same IP + passphrase always produces the same 64-char hash prefix, which is the core deduplication guarantee for the KV graph. normalize_ec_idduplicated in integration tests —integration-tests/tests/common/ec.rs:374: reimplementsnormalize_ec_id_for_kvfrom core. Re-export and use the canonical implementation.
⛏ nitpick
- Use
saturating_subfor consistency —kv.rs:605: the subtraction is safe due to the guard above but inconsistent withsaturating_subused throughout the rest of the module. log_idshould encapsulate the…suffix —mod.rs:51: every call site manually appends"…"in the format string; move it inside the function.
Praises 👍
- CAS-based optimistic concurrency (
kv.rs): bounded retries, gracefulItemPreconditionFailedhandling, andMAX_CAS_RETRIESpreventing infinite loops — textbook correct for a single-writer KV model. - Constant-time API key comparison (
partner.rs):subtle::ConstantTimeEqfor timing attack prevention on key lookups. Many implementations miss this entirely. KvMetadatafast-path consent check (kv_types.rs): mirroringok/country/cluster_sizein metadata to avoid streaming the full KV body is an excellent performance optimisation with the right constraint test.evaluate_known_browserwithOnceLock-cached hash table (device.rs): pre-hashing fingerprints once and caching viaOnceLockis the right WASM-compatible lazy-init pattern.- HMAC stability explicitly documented (
generation.rs:30-33): noting that the output format is a "stable contract" that would invalidate all existing identities if changed is exactly the kind of correctness annotation that prevents future breakage.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test: PASS
- vitest: PASS
- integration tests: FAIL
- CodeQL: FAIL (cleartext logging — covered in prior review)
aram356
left a comment
There was a problem hiding this comment.
Summary
This PR lands a large, well-partitioned Edge Cookie identity subsystem, and the prior blocking feedback (PII logging via log_id(), whitespace UID checks, pull-sync response size cap, dropping raw client IP from pull-sync query strings, constant-time auth comparisons) has been materially addressed. Blocking issues remaining are: a CI-red integration test (deterministic, root cause in the test client, not the server), an unverified rand::thread_rng() on wasm32-wasip1 that CI cannot catch because no wasm build runs, two new CodeQL alerts on ec/kv.rs riding the pre-existing settings-taint false-positive flow, one untested backend-state machine in batch_sync, and two questions on the auction / partner surfaces.
Blocking
🔧 wrench
- CI red —
test_ec_lifecycle_fastlydeterministic failure: the test client omitsSec-Fetch-Dest: documentandAccept: text/html, sois_navigation_request()correctly returnsfalseand EC never generates (crates/integration-tests/tests/common/ec.rs:93). rand::thread_rng()onwasm32-wasip1unverified; CI builds no wasm target (crates/trusted-server-core/src/ec/generation.rs:52). Add a wasm build to CI and/or switch to an explicitgetrandomwith thewasifeature.- Two new CodeQL alerts on
ec/kv.rs:637and:789ride the samesettings.reject_placeholder_secrets()taint flow that already produces noise onmain. Both are false positives but block the CodeQL gate. Fix with a scoped suppression or a repo-level CodeQL config filter onrust/cleartext-loggingfor settings-derived values. UpsertResult::{NotFound, ConsentWithdrawn, Stale}untested inbatch_sync::process_mappings(crates/trusted-server-core/src/ec/batch_sync.rs:231-244). Prior reviewer flagged this; still unresolved.
❓ question
- Auction
user.id = ""on the no-consent path is emitted into the OpenRTB bid request JSON (crates/trusted-server-core/src/auction/endpoints.rs:60-77,formats.rs:186). Has this been validated against the live Prebid Server target? If not, changeUserInfo.idtoOption<String>withskip_serializing_if. update_pull_enabled_indexrace self-heal path untested (crates/trusted-server-core/src/ec/partner.rs:552-604). The documented fallback atpartner.rs:350is the only thing keeping the index correct under concurrent upserts — please add a concurrency test or file a tracked follow-up.
Non-blocking
🤔 thinking
MAX_CAS_RETRIES = 3, no backoff — likely to starve under contention on hot prefixes; consider exponential backoff + jitter (crates/trusted-server-core/src/ec/kv.rs:300-339).HashMap<String, KvPartnerId>non-deterministic serialization — breaks future hash-based dedup and byte-diffing of stored values; useBTreeMaporIndexMap(crates/trusted-server-core/src/ec/kv_types.rs:59).seen_domainsdrop-newest eviction freezes long-lived ECs on their first 50 domains, defeating thelasttimestamp; switch to LRU or document (crates/trusted-server-core/src/ec/kv.rs:627-642).
♻️ refactor
- EID gating failures log at
debug!— bump towarn!so ad-stack anomalies surface in production (crates/trusted-server-core/src/auction/endpoints.rs:80). ec_consent_granted()is a 1-line pass-through — inline or document the sealing-point intent (crates/trusted-server-core/src/ec/consent.rs:20-22).- Unvalidated domain strings in EC graph writes — add a 255-byte length cap and hostname-shape check at the write boundary (
crates/trusted-server-core/src/ec/kv.rs update_last_seen,kv_types.rs).
🌱 seedling
- No integration test for the GPC / consent-denied identify path. Once the
Sec-Fetch-Destfix lands, addec_full_lifecycle_with_gpcsendingSec-GPC: 1and asserting/_ts/api/v1/identifyreturns 403 / no EID payload (crates/integration-tests/tests/frameworks/scenarios.rs:501-565). - No wasm-target build in CI — root cause of how the
rand::thread_rng()concern reached this point unnoticed. Addingcargo build -p trusted-server-adapter-fastly --target wasm32-wasip1would catch an entire class of deps-pin regressions.
📌 out of scope
- Pull-sync EC ID still travels as a URL query parameter — acknowledged in the PR description as deferred. Please file the follow-up issue and reference it from the commit that ships this PR so it is not lost (
crates/trusted-server-core/src/ec/pull_sync.rs:260-263).
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test (native): PASS
- vitest: PASS
- integration tests: FAIL —
test_ec_lifecycle_fastly(see blocking #1) - CodeQL: FAIL — 15 high alerts; 13 are pre-existing
settings.reject_placeholder_secrets()taint-flow false positives already present onmain, 2 are new onec/kv.rsriding the same flow (see blocking #3)
7009838 to
43df212
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Large PR (98 files, +11,974 / −3,099). Architecture and consent plumbing are solid; this review focuses on concrete defects and hardening gaps verified against 109c929c. Three CI checks currently fail (integration tests, format-typescript, CodeQL); two of those are trivially fixable and inline-commented below. The CodeQL failure appears to be a false positive that needs a team decision.
Blocking
🔧 wrench
- Integration tests fail to compile:
crates/integration-tests/Cargo.tomlmissingtrusted-server-coredev-dependency whiletests/common/ec.rs:19andtests/frameworks/scenarios.rs:5import it. (inline) - format-typescript CI failure:
crates/js/lib/src/integrations/prebid/index.ts:413fails prettier — one-line fix vianpm run format. (inline) - Unbounded cookie payloads:
ec/prebid_eids.rs:43(ts-eids) andec/prebid_eids.rs:111(sharedId) decode/clone attacker-controlled cookie values with no size ceiling. (inline)
❓ question
- Consent-withdrawal split-brain at
ec/finalize.rs:56: consent-store delete failures are logged-and-continued while KV tombstones still write. Intentional (KV = source of truth) or should withdrawal fail-closed? (inline) - CodeQL cross-cutting: 15 high-severity
rust/cleartext-loggingalerts all namesettings.reject_placeholder_secrets()as the tainted source, pinned to log-call lines that do not actually reference that method (e.g.auction/orchestrator.rs:125is a timeout warning). This is taint-analyzer propagation through a boolean validator that uses.expose()internally; the logged bytes are field names, not secret material. How would the team like to resolve: (a)#[allow]on the validator with a comment explaining the false positive, (b) renamereject_placeholder_secrets→validate_secrets_not_placeholdersto break the taint-name heuristic, or (c) configure a CodeQL suppression?
Non-blocking
🤔 thinking
- Bot gate is presence-only (
ec/device.rs:76) — attackers with spoofed JA4 pass. Document threat model. (inline) user.id = Nonewhen EC not allowed (auction/endpoints.rs:60) — correct for consent, but may depress bidder dedup/yield. Consider a session-scoped ephemeral ID. (inline)- Pull-sync only on organic routes (
trusted-server-adapter-fastly/src/main.rs:373) — auction-heavy SPA publishers never refresh KV. Document or extend. (inline)
♻️ refactor
- Duplicated bearer parser in
ec/identify.rs:34andec/batch_sync.rs:101— extract toec/auth.rs. (inline) - Rate-limiter hourly→minute conversion is lossy by up to ~2× (
ec/rate_limiter.rs:59) — document and test the overshoot bound. (inline)
🌱 seedling
- Non-deterministic EID order on timestamp ties (
ec/eids.rs:64) — addpartner_idas secondary sort key. (inline) - No post-deserialize bounds on
KvEntry(ec/kv.rs:158) — legacy/corrupt oversized entries will round-trip through. (inline) platform/test_support.rsdefines stubs (NoopConfigStore,NoopSecretStore,noop_services) that no unit test exercises, so trait drift will silently break them. Either consume in ≥1 unit test or remove.- Auction ext metadata unsrialized:
auction/formats.rscomputes strategy/provider-count/timing intoresult.metadatabut never emits it intoext.trustedServeron the OpenRTB response — a useful debug surface.
⛏ nitpick
has_cookie_mismatchnaming (ec/mod.rs:398) — ambiguous; preferheader_overrides_cookie. (inline)
📌 out of scope
ec_id/client_ipas query parameters in pull sync — PR description acknowledges as follow-up.
CI Status
- cargo fmt: PASS
- clippy: PASS
- cargo test (workspace): PASS
- vitest: PASS
- format-typescript: FAIL (see inline)
- integration tests: FAIL — compile error (see inline)
- CodeQL: FAIL — 15 high (likely false positives) (see question above)
- browser integration tests: PASS
- format-docs: PASS
|
Addressed the remaining merge blockers in 4d2ef30:
Validation run locally:
|
|
Follow-up update in 6016c48 to address the remaining open review threads:
Validation re-run locally:
I also replied to and resolved the remaining open review threads tied to these changes. |
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
This PR lands the Edge Cookie identity system end to end, but I found two merge blockers in the request lifecycle.
Blocking
🔧 wrench
- Identify GET still missing browser-direct CORS support:
OPTIONS /_ts/api/v1/identifyreflects CORS headers, butGET /_ts/api/v1/identifynever applies them, so the documented browser-direct flow still fails after a successful preflight.classify_origin()also accepts matching hosts regardless of scheme, while the spec only allowshttpsorigins.
Fix: classify origin in the GET path too, return 403 for denied origins, reflect CORS headers on allowed GET responses, and require origin_url.scheme() == "https" in classify_origin(). (crates/trusted-server-core/src/ec/identify.rs:31, crates/trusted-server-core/src/ec/identify.rs:171)
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- browser integration tests: PASS
- integration tests: FAIL (
test_ec_lifecycle_fastly: organicGET /should setts-eccookie) - CodeQL: FAIL (12 open alerts on the merge ref)
aram356
left a comment
There was a problem hiding this comment.
Summary
Edge Cookie identity subsystem is materially complete and the prior review's blockers have been addressed: navigation-headers fix unblocks test_ec_lifecycle_fastly, UpsertResult variants are tested, user.id is Option<String> with skip_serializing_if, the partner-registry race is gone (architecture is now config-derived), MAX_CAS_RETRIES is 5 with backoff+jitter, raw client IP is no longer forwarded to pull-sync partners, and the WASM build now runs in CI.
Remaining blocking issues: a plaintext partner pull-token sitting on a Debug-deriving struct, an unintentional-looking read/write asymmetry in the bot gate, and a still-failing CodeQL gate riding the same Settings Debug taint-flow false positive that was flagged on the prior review.
Blocking
🔧 wrench
- Plaintext
ts_pull_tokenonPartnerConfig(crates/trusted-server-core/src/ec/registry.rs:44, :208)
❓ question
- Bot-gate read/write asymmetry intentional? (
crates/trusted-server-adapter-fastly/src/main.rs:208-212vs.:272-276) - CodeQL CI gate is failing on 12 new
rust/cleartext-loggingalerts, all riding thesettings.reject_placeholder_secrets()→log::debug!("Settings {settings:?}")taint flow that was flagged on the prior review.SettingsDebug is redacted viaRedacted<T>, so the alerts are benign — but they block the gate. Sites:crates/trusted-server-adapter-fastly/src/main.rs:71,crates/trusted-server-core/src/publisher.rs:178/:341/:382,crates/trusted-server-core/src/html_processor.rs:70,crates/trusted-server-core/src/consent/mod.rs:173,crates/trusted-server-core/src/auction/orchestrator.rs:125/:278/:338/:408,crates/trusted-server-core/src/integrations/nextjs/html_post_process.rs:123/:456. Plan to suppress at PR level or add a repo-wide CodeQL filter onRedacted-derived flows before merge?
Non-blocking
🤔 thinking
std::thread::sleepin CAS retry loop on wasm32-wasip1 (crates/trusted-server-core/src/ec/kv.rs:343, :426, :471, :558)seen_domainsstill usesHashMapwhileidswas converted toBTreeMap(crates/trusted-server-core/src/ec/kv_types.rs:133)MIN_PASSPHRASE_LENGTH = 8is weak entropy for the HMAC-SHA256 anchor (crates/trusted-server-core/src/settings.rs:353)seen_domainsdrop-newest eviction freezes long-lived ECs at the 50-domain cap (crates/trusted-server-core/src/ec/kv.rs:667-683) — repeat from prior review
♻️ refactor
PartnerRegistry::all()exposesHashMapiteration order to callers (crates/trusted-server-core/src/ec/registry.rs:174-176)RateLimiter::exceeded_per_minuteround-trips through hourly math (crates/trusted-server-core/src/ec/rate_limiter.rs:42-46, :49-52)
🌱 seedling
- No end-to-end integration tests for batch sync 429 / 400 / 413 paths (
crates/integration-tests/tests/frameworks/scenarios.rs:761-779) - Bot-gate observability hook (
crates/trusted-server-core/src/ec/device.rs:84-86) — operators have no metric for what fraction of traffic is excluded
📌 out of scope
- Repo-level CodeQL filter for the
Redacted-derivedrust/cleartext-loggingtaint flow so this stops gating PRs. Second consecutive PR review where the same false-positive flow blocks CI.
⛏ nitpick
derive(Debug)onPartnerRegistry/PartnerConfig(crates/trusted-server-core/src/ec/registry.rs:17, :53) — combined with the plaintext token wrench, this is the loaded gun.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test (workspace): PASS
- vitest: PASS
- integration tests: PASS
- browser integration tests: PASS
- WASM release build: PASS
- CodeQL: FAIL — 12 new alerts, all on the same
SettingsDebug taint-flow false-positive flow (see ❓ above)
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Clearing an old pending local review so I can reply to the current threads from the branch update.
Read the sharedId cookie (set by Prebid's SharedID User ID Module) on organic requests and store the value as a partner UID in the EC identity graph. Uses the same debounce and best-effort semantics as the ts-eids cookie ingestion path. Add sharedid partner config (source_domain: sharedid.org, atype: 1) to trusted-server.toml. The backend matches the cookie value to this partner via PartnerRegistry.find_by_source_domain() during ec_finalize_response.
… module Origin/main introduced RuntimeServices platform abstraction (PR2-PR6) after our branch diverged. The rebase left inconsistencies between the trait definitions from our branch and the implementations from origin/main. Restore source files to their ORIG_HEAD state to reconcile: - Remove services: &RuntimeServices from IntegrationProxy::handle implementations (datadome, didomi, gpt, lockr, permutive, testlight, gtm) - Remove services from run_auction, proxy_request, AuctionContext - Restore publisher, proxy, orchestrator, types to branch-correct state - Restore streaming/html/rsc_flight modules to branch state
… sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
… sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
… sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
9261993 to
d8c943d
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Deep review of the EC identity subsystem (crates/trusted-server-core/src/ec/**), the auction-bidstream wiring, and the Fastly adapter route table. The design is sound and the test coverage is strong — the blockers below are concrete hardening gaps in pre-validation memory bounds, HMAC key strength, and post-send compute safety, not architectural concerns.
Blocking
🔧 wrench
- DoS via unbounded request body in batch sync —
req.take_body_bytes()reads the full body beforeMAX_BODY_SIZEis checked, so an authenticated partner can pressure the WASM heap before the limit fires. CheckContent-Lengthfirst. (crates/trusted-server-core/src/ec/batch_sync.rs:130-136) MIN_PASSPHRASE_LENGTH = 8is too short for the long-lived HMAC-SHA256 key that derives every visitor's EC ID. NIST SP 800-107 recommends ≥ the security strength (32 bytes). Raise to 32 (or at least 24). (crates/trusted-server-core/src/settings.rs:412)
❓ question
- Pull-sync
pending.wait()has no per-partner timeout — withmax_concurrency = 3, a single hung partner pins the worker compute window post-send. Is the intent to rely on Fastly's compute cap, or should each wait be bounded? (crates/trusted-server-core/src/ec/pull_sync.rs:264)
Non-blocking
🤔 thinking
- Broad
x-ts-*header strip inec_finalize_response— works today only because the outerfinalize_responsere-addsx-ts-version/x-ts-envafterwards. Fragile if middleware order changes. (crates/trusted-server-core/src/ec/finalize.rs:154-168) - CORS reflects the original-cased Origin even though validation lowercases — browsers do byte-exact compare on
Access-Control-Allow-Origin. (crates/trusted-server-core/src/ec/identify.rs:204-212) upsert_partner_idrecovery path writes a synthetic entry withcountry = "ZZ", no consent context, andconsent.ok = truewhen the root row is missing. Could fail-closed instead. (crates/trusted-server-core/src/ec/kv.rs:393-415)- Public surface in
device.rsis broader than needed — most internals can bepub(super). (crates/trusted-server-core/src/ec/device.rs:102-176)
♻️ refactor
EC_RESPONSE_HEADERSconst is redundant with the dynamicx-ts-*strip — pick one. (crates/trusted-server-core/src/ec/finalize.rs:23-28)- Group batch-sync mappings by EC ID to halve CAS round-trips when the same EC repeats. (
crates/trusted-server-core/src/ec/batch_sync.rs:166-224)
🌱 seedling
- Read-side
entry.validate()runs on every KV load — write-side validation already exists viaserialize_entry, so reads can trust the round-trip. (crates/trusted-server-core/src/ec/kv.rs:150-169) - Schema version is enforced but no migration path is documented — define the strategy now so a future v2 bump isn't an emergency. (
crates/trusted-server-core/src/ec/kv_types.rs:17, 360-365)
⛏ nitpick
is_valid_ec_idcase asymmetry between hash prefix (lowercase only) and suffix (mixed-case alphanumeric) is intentional but undocumented. (crates/trusted-server-core/src/ec/generation.rs:166-186)let _timestamp = mapping.timestamp;is dead code — either remove or replace with an intent comment. (crates/trusted-server-core/src/ec/batch_sync.rs:191)
👍 praise
- Strong unit-test coverage across every EC module — CAS conflict simulation, withdrawal dedupe, JA4/H2 known-browser fingerprinting, schema-version validation, base64 EID truncation binary search, and CORS classification all have explicit tests.
- Centralised
ec_finalize_responseis invoked from every route (auth-fail, streaming, organic) so withdrawal and cookie semantics stay consistent. - Clean separation between
create_or_revive(writer-side, allows revive) andupsert_partner_id_if_exists(reader-rejecting viaUpsertResult::ConsentWithdrawn) — maps directly to the per-mapping rejection reasons in batch sync.
CI Status
- The status-check rollup for
4166637shows onlyCode Quality / Analyze (javascript-typescript): SUCCESS. The repository's Rust CI workflow (fmt/clippy/cargo test --workspace) and JS test workflow do not appear to have run on this head SHA. Please confirm with the project owners and run them locally before merge — the diff (+15,470 / −4,431 across 106 files) is too large to land on a partial check.
aram356
left a comment
There was a problem hiding this comment.
Summary
Deep re-review of the EC identity subsystem against main, picking up where the 2026-04-29 review left off. Hardening commit 8e08e642 materially addressed the prior blockers (passphrase length raised to 32, batch_sync content-length pre-check, upsert_partner_id fail-closed on missing root, registry-based partner header stripping). Six concrete blockers remain, plus one open question.
Blocking
🔧 wrench
- Pull-sync read-before-size-check —
ec/pull_sync.rs:319-327repeats the same fail-pattern thatbatch_syncwas just hardened against; an authenticated partner can pressure WASM heap before the 64 KiB cap fires. (inline) - Bot-gate kills withdrawal tombstones —
main.rs:221-225passeskv_graph = Nonefor bot-classified clients toec_finalize_response, so withdrawals from privacy-extension-heavy traffic clear the cookie but never tombstone the KV row. Within the 24h window an authenticated batch-sync can repopulate. (inline) prebid_eidsfirst-non-empty UID hides oversized + valid pairs —ec/prebid_eids.rs:113-124picks the first non-empty UID then size-checks it; an oversized first UID drops the entire source even when valid trailing UIDs exist. (inline)EcPartner.api_tokenaccepts empty / short values —ec/registry.rs:90-100hashes whatever serde delivers, including""(SHA-256 of empty) and 18-byte fixtures like"inttest-api-key-1". The Bearer credential gating identify and batch-sync should match the EC passphrase floor (≥ 32 bytes). (inline)/identifylacksCache-Control: no-store—ec/identify.rs:122-126and the 401/403/204 paths set neitherCache-Controlnor a completeVary. Defense-in-depth for an identity endpoint that returns per-partner PII. (inline)- Unbounded EID parsing in auction decoration —
auction/endpoints.rs:263is O(n*m) on attacker-controlledclient_eids; no caps on total entries or per-source UIDs. A pathological cookie/body inflates the OpenRTB request body to PBS. (inline)
❓ question
- Does PBS echo back
ext.trusted_server.{request_host, request_scheme}? —integrations/prebid.rs:1209-1228reads these from the response, but stock PBS does not preserve arbitrary requestextfields. If your deployment doesn't echo them, thelog::warn!fires every auction and the first-party creative-rewrite guarantee silently no-ops. (inline)
Non-blocking
♻️ refactor / 🤔 thinking / ⛏ nitpick
See inline comments on:
ec/prebid_eids.rs:59(useReport<E>instead ofResult<_, String>)ec/partner.rs:28(OnceLock<Result<Regex, String>>for an infallible regex)ec/cookies.rs:41(pub→pub(crate)forcreate_ec_cookie)ec/rate_limiter.rs:52, 104(TOCTOU race;hourly_limit=0fail-open)ec/pull_sync.rs:261(drain_pull_batchwaits sequentially)ec/kv.rs:668-672(networkfield overwrite drops future fields)ec/prebid_eids.rs:100(debug log echoes cookie content via serde error)auction/types.rs:82-90(#[serde(skip)]on Deserialize-deriving struct)ec/finalize.rs:41-49(parameter count at the 7-arg cap)trusted-server.toml:6-9(admin handlerpassword="changeme"covers new EC admin path)
📌 out of scope
- Integration test gaps —
EcScenario::all()covers seeded-EC happy paths but not: re-consent after withdrawal, tombstone durability (no second request asserts blocked state), pull-sync, bot-gate write suppression, batch-sync rate-limit / oversize negative cases. The PR description implies fuller lifecycle coverage that doesn't exist. pull_sync_ttl_secandcluster_recheck_secs— retained as compatibility fields but unused. Either consume or warn at startup.- CI coverage — only CodeQL/Analyze ran on
8e08e642(PASS).cargo fmt/clippy/testand the JS suite haven't run on the current head, and the branch isDIRTYvs main. Resolve conflicts and let full CI run before merge.
CI Status
- fmt: NOT RUN
- clippy: NOT RUN
- rust tests: NOT RUN
- js tests: NOT RUN
- CodeQL (Analyze javascript-typescript): PASS
- Mergeability: CONFLICTING (DIRTY vs main)
| let body = response.take_body_bytes(); | ||
| if body.len() > MAX_PULL_RESPONSE_BYTES { | ||
| log::warn!( | ||
| "Pull sync: partner '{}' returned oversized response ({} bytes), rejecting", | ||
| partner_id, | ||
| body.len() | ||
| ); | ||
| return None; | ||
| } |
There was a problem hiding this comment.
🔧 wrench — Same fail-open pattern that batch_sync was just hardened against in commit 8e08e642: take_body_bytes() reads the full body before the size check at line 320 fires. An authenticated partner with compromised credentials can stream a multi-megabyte body and pressure WASM heap before the 64 KiB cap rejects it.
Fix: pre-check Content-Length before reading, mirroring batch_sync.rs:174-178:
fn content_length_exceeds_limit(resp: &fastly::Response, max: usize) -> bool {
resp.get_header_str("content-length")
.and_then(|v| v.parse::<usize>().ok())
.is_some_and(|cl| cl > max)
}
// in extract_pull_uid:
if content_length_exceeds_limit(&response, MAX_PULL_RESPONSE_BYTES) {
log::warn!("Pull sync: partner '{}' content-length exceeds {} bytes", partner_id, MAX_PULL_RESPONSE_BYTES);
return None;
}
let body = response.take_body_bytes();| let kv_graph = if is_real_browser { | ||
| maybe_identity_graph(settings) | ||
| } else { | ||
| None | ||
| }; |
There was a problem hiding this comment.
🔧 wrench — Bot gate kills withdrawal tombstones (revisits the asymmetry I flagged on 2026-04-23).
kv_graph is None for bot-classified clients. This same kv_graph.as_ref() then flows into ec_finalize_response (line 232 and 427), where the withdrawal block in finalize.rs:62-79 is gated on if let Some(graph) = kv. Net result: a misclassified user-bot sending sec-gpc: 1 clears their cookie locally but the authoritative KV tombstone is never written. Within the 24-hour tombstone window, an authenticated batch-sync from a partner can repopulate that EC row, defeating revocation for the exact users (privacy-extension-heavy traffic) most likely to need it.
Fix: split the two uses — bot gate should suppress generation writes only; withdrawal writes should always reach the graph.
let kv_graph_full = maybe_identity_graph(settings);
let kv_graph_for_routing = if is_real_browser { kv_graph_full.as_ref() } else { None };
// pass kv_graph_full.as_ref() to ec_finalize_response unconditionally,
// pass kv_graph_for_routing to handlers that perform organic upserts.Alternatively, document explicitly that bot-classified withdrawals are best-effort cookie-only — and add an integration test that confirms the design choice.
| let Some(uid) = eid.uids.iter().find(|uid| !uid.id.is_empty()) else { | ||
| continue; | ||
| }; | ||
|
|
||
| if eid_id_exceeds_size_limit(&uid.id) { | ||
| log::debug!( | ||
| "Prebid EIDs: rejecting oversized uid for partner '{}' from source '{}'", | ||
| partner.id, | ||
| eid.source, | ||
| ); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
🔧 wrench — First-non-empty-then-size-check loses valid trailing UIDs.
find(|uid| !uid.id.is_empty()) picks the first non-empty UID; if that one then exceeds MAX_UID_LENGTH, the continue at line 123 abandons the entire source. A partner whose first UID is oversized (a probe / debug value) silently loses every following valid UID under that source.
Fix: filter on size at the same time so the loop falls through to the next valid UID.
let Some(uid) = eid.uids.iter()
.find(|u| !u.id.is_empty() && !eid_id_exceeds_size_limit(&u.id))
else { continue; };And drop the now-redundant size check at lines 117-124.
| let api_key_hash = hash_api_key(partner.api_token.expose()); | ||
|
|
||
| if by_api_key_hash.contains_key(&api_key_hash) { | ||
| return Err(Report::new(TrustedServerError::Configuration { | ||
| message: format!( | ||
| "ec.partners: partner '{}' has an API token that collides \ | ||
| with another partner's token hash", | ||
| partner.id | ||
| ), | ||
| })); | ||
| } |
There was a problem hiding this comment.
🔧 wrench — EcPartner.api_token accepts empty / arbitrarily short values.
Redacted::new("") flows through serde and lands here as the SHA-256 of empty string indexed in by_api_key_hash. Likewise the integration-test fixtures in trusted-server.toml use 18-byte tokens ("inttest-api-key-1"), well below the EC passphrase floor that this PR just raised to 32 bytes. This is the Bearer credential that gates both /_ts/api/v1/batch-sync and /_ts/api/v1/identify; minimum-length and non-empty validation should match the threat model used for the master EC passphrase.
Fix: add validation alongside validate_rate_limits (called at line 114):
fn validate_api_token(partner: &EcPartner) -> Result<(), Report<TrustedServerError>> {
let token = partner.api_token.expose();
if token.trim().is_empty() {
return Err(Report::new(TrustedServerError::Configuration {
message: format!("ec.partners: '{}' has an empty api_token", partner.id),
}));
}
const MIN_API_TOKEN_LENGTH: usize = 32;
if token.len() < MIN_API_TOKEN_LENGTH {
return Err(Report::new(TrustedServerError::Configuration {
message: format!("ec.partners: '{}' api_token must be at least {MIN_API_TOKEN_LENGTH} bytes", partner.id),
}));
}
Ok(())
}Then rotate the inttest* fixtures to ≥ 32-byte tokens.
| let mut response = json_response_with_origin(StatusCode::OK, &body, allowed_origin.as_deref())?; | ||
| response.set_header(HEADER_X_TS_EC, ec_id); | ||
|
|
||
| Ok(response) | ||
| } |
There was a problem hiding this comment.
🔧 wrench — /identify responses lack Cache-Control: no-store and have an incomplete Vary.
The 200 path returns per-partner PII (uid, eid, cluster_size) with Access-Control-Allow-Credentials: true and Vary: Origin. Authorization-bearing responses are not cached by well-behaved CDNs, but identity material warrants explicit headers as defense-in-depth — and the 401 / 403 / 204 paths set neither.
Fix: set unconditionally on every status path, both inside apply_cors_headers and on the no-CORS branches.
response.set_header("cache-control", "no-store, private");
response.set_header(header::VARY, "Origin, Authorization, Cookie");Add a test asserting the headers are present on each of the 200/204/401/403 paths.
| // Best-effort CAS write-back — update the network field. | ||
| let mut updated_entry = entry.clone(); | ||
| updated_entry.network = Some(KvNetwork { | ||
| cluster_size: Some(cluster_size), | ||
| }); |
There was a problem hiding this comment.
🤔 thinking — evaluate_cluster clones the entry and replaces network wholesale with Some(KvNetwork { cluster_size: Some(cluster_size) }). If KvNetwork gains additional fields later, this CAS write silently drops them. Either preserve existing fields when extending the type, or add a comment that network is owned solely by this function.
| let eids = match parse_prebid_eids_cookie(cookie_value) { | ||
| Ok(eids) => eids, | ||
| Err(err) => { | ||
| log::debug!("Prebid EIDs: failed to decode ts-eids cookie: {err}"); |
There was a problem hiding this comment.
⛏ nitpick — format!("...failed to decode ts-eids cookie: {err}") echoes parts of the cookie content via serde_json::Error's Display impl ("expected …, got … at column N"). Drop to trace! and use a coarse category — keeps debug logs free of attacker-controlled cookie fragments.
| #[serde(skip)] | ||
| pub consent: Option<crate::consent::ConsentContext>, | ||
| /// Consent-gated Extended User IDs resolved from the KV identity graph. | ||
| /// | ||
| /// Populated by the auction handler from partner data when the user has | ||
| /// a valid EC and consent permits EID transmission. `None` when no EIDs | ||
| /// are available (no EC, consent denied, or KV read failure). | ||
| #[serde(skip)] | ||
| pub eids: Option<Vec<crate::openrtb::Eid>>, |
There was a problem hiding this comment.
⛏ nitpick — consent and eids carry #[serde(skip)] on a struct that derives Deserialize. If anything ever round-trips this type through JSON (caching layer, debug snapshot, future trust boundary), the consent gating and EID set both silently evaporate. Either split into AuctionRequestWire (Deserialize) + AuctionRequest (constructed in-memory only), or document the constraint loudly.
| pub fn ec_finalize_response( | ||
| settings: &Settings, | ||
| ec_context: &EcContext, | ||
| kv: Option<&KvIdentityGraph>, | ||
| registry: &PartnerRegistry, | ||
| eids_cookie: Option<&str>, | ||
| sharedid_cookie: Option<&str>, | ||
| response: &mut Response, | ||
| ) { |
There was a problem hiding this comment.
⛏ nitpick — ec_finalize_response has 7 parameters — at the project's hard cap (CLAUDE.md: "never more than 7 arguments"). Any future addition forces a struct, so consider introducing EcFinalizeContext { settings, ec_context, kv, registry, eids_cookie, sharedid_cookie } now to reduce churn at the call sites.
| [[handlers]] | ||
| path = "^/admin" | ||
| path = "^/_ts/admin" | ||
| username = "admin" | ||
| password = "changeme" |
There was a problem hiding this comment.
🏕 camp site — Pre-existing password = "changeme", but this PR moves the path from ^/admin to ^/_ts/admin and the EC PR's admin endpoints (/_ts/admin/keys/*) now ride the same handler. Ec::PASSPHRASE_PLACEHOLDERS rejects placeholder passphrases at startup; add an analogous startup check that rejects password="changeme"|"password"|"admin" for handlers matching ^/_ts/admin* so a deploy with the default committed file fails closed instead of exposing key rotation behind a guessable Bearer.
Summary
Changes
crates/trusted-server-core/src/ec/(new module)mod.rs(lifecycle orchestration),generation.rs(HMAC-SHA256),cookies.rs(cookie read/write),consent.rs(consent gating),device.rs(device signal derivation, bot gate),kv.rs+kv_types.rs(KV identity graph with CAS),partner.rs(partner registry + admin),sync_pixel.rs(pixel sync),batch_sync.rs(S2S batch sync),pull_sync.rs(background pull sync),identify.rs(identity lookup with CORS),eids.rs(EID encoding),finalize.rs(response finalization middleware),admin.rs(admin endpoints)crates/integration-tests/tests/common/ec.rsandtests/frameworks/scenarios.rs; updatedintegration.rstest runner and Viceroy config fixturescrates/trusted-server-adapter-fastly/src/main.rs/_ts/api/v1/and/_ts/admin/;send_to_client()pattern with background pull sync dispatchcrates/trusted-server-core/src/auction/endpoints.rsandformats.rsupdated to decorate bid requests with partner EIDs from the KV identity graph (user.id,user.eids,user.consent)crates/trusted-server-core/src/integrations/prebid.rscrates/js/lib/src/integrations/prebid/index.tscrates/trusted-server-core/src/synthetic.rscrates/trusted-server-core/src/cookies.rsec/cookies.rscrates/trusted-server-core/src/settings.rs+settings_data.rscrates/trusted-server-core/src/consent/crates/trusted-server-core/src/http_util.rscrates/trusted-server-core/src/proxy.rscrates/trusted-server-core/src/publisher.rscrates/trusted-server-core/src/constants.rsdocs/guide/ec-setup-guide.md(new)docs/guide/edge-cookies.md(new)docs/guide/api-reference.mddocs/guide/configuration.mddocs/guide/synthetic-ids.mddocs/(various)docs/internal/superpowers/specs/2026-03-24-ssc-technical-spec-design.mdfastly.tomltrusted-server.tomlCloses
Closes #532
Closes #533
Closes #534
Closes #535
Closes #536
Closes #537
Closes #538
Closes #539
Closes #540
Closes #541
Closes #542
Closes #543
Closes #544
Closes #611
Closes #612
Follow-up
Test plan
cargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatChecklist
unwrap()in production code — useexpect("should ...")tracingmacros (notprintln!)