NagaYu/capconnect

🔒 CapConnect

A privacy-first, zero-knowledge data connector for the cookieless era.

Clean → Hash (SHA-256) → Forward. First-party customer data is normalized and irreversibly hashed in memory before it is relayed to Meta Conversions API and Google Ads Enhanced Conversions. No raw PII ever touches a disk, a log, or a database.


Why CapConnect exists

Third-party cookies are disappearing: Safari and Firefox already block them by default. To keep measuring conversions, advertisers must send first-party data server-to-server. Doing that safely is the hard part: the moment raw emails and phone numbers leave your perimeter, your legal and security teams own a new liability.

CapConnect is the safe middle layer. It accepts your raw events, normalizes and SHA-256 hashes every personal field the instant it arrives, and forwards only those one-way digests to the ad platforms, exactly the way Meta and Google specify. Because the plaintext exists only transiently in process memory and is never persisted, there is nothing to leak from CapConnect's storage. There is no storage.

For CTOs & Legal: CapConnect is a stateless relay. It holds no datastore, writes no PII to disk, and emits no PII in logs. The only personal data it transmits is already a non-reversible SHA-256 hash, sent over TLS to Meta/Google. This is the same hashing the platforms' own SDKs perform client-side.


✨ Features

  • Zero-knowledge by construction: plaintext PII lives only in process memory for the duration of a single request; it is GC-eligible the moment the response is returned.
  • Spec-accurate normalization: email lower/trim, phone → E.164 digits (auto-prepends the country code, e.g. 81 for Japan), names/city/state/zip/DOB/gender all formatted per the official Meta & Google guidelines.
  • SHA-256 hashing with idempotent pass-through for upstream-pre-hashed inputs.
  • Meta Conversions API with deduplication event_id support to pair with the browser Pixel.
  • Google Ads Enhanced Conversions via uploadClickConversions with userIdentifiers.
  • Concurrent, fault-isolated dispatch: one platform failing never blocks the other.
  • Automatic retries with exponential backoff + jitter on transient errors (network / 429 / 5xx).
  • Automatic Google OAuth refresh (optional): refresh-token exchange, in-memory caching, concurrent-refresh coalescing; nothing persisted.
  • Single & batch webhook endpoints (/v1/collect, /v1/collect/batch).
  • Hardened Express server: shared-secret auth (constant-time compare), body-size caps, x-powered-by disabled, graceful shutdown.
  • Strict TypeScript end to end (exactOptionalPropertyTypes, noUncheckedIndexedAccess, …).
  • 80 unit/integration tests (Vitest + Supertest), ESLint type-checked rules, multi-stage Docker image, and GitHub Actions CI.

🛑 The Zero-Knowledge Guarantee

flowchart LR
    A["Upstream source<br/>(Webhook / CSV parse)"] -->|HTTPS + shared secret| B["CapConnect<br/>/v1/collect"]

    subgraph MEM["⏱ In-memory only: never persisted, never logged"]
        direction TB
        C["Validate contract"] --> D["Normalize<br/>email · phone · name · address"]
        D --> E["SHA-256 hash<br/>(irreversible, one-way)"]
        E --> F["Build provider payloads<br/>(hashed digests only)"]
    end

    B --> C
    F -->|TLS| G["Meta Conversions API<br/>(event_id dedup)"]
    F -->|TLS| H["Google Ads API<br/>Enhanced Conversions"]

    G --> I["Structured, PII-free<br/>dispatch report"]
    H --> I
    I -->|JSON response| A

    classDef mem fill:#0b3d2e,stroke:#10b981,color:#e6fffa;
    class MEM,C,D,E,F mem;

What crosses each boundary:

| Boundary | Data in transit | Form |
| --- | --- | --- |
| Upstream → CapConnect | Raw identifiers + event metadata | Plaintext over TLS (you control this hop) |
| Inside CapConnect | Identifiers | Normalized then SHA-256 hashed, in memory only |
| CapConnect → Meta/Google | Identifiers | SHA-256 hex digests + un-hashed match-support fields (fbc/fbp/IP/UA/gclid) per platform spec |
| CapConnect → Upstream | Dispatch outcome | No PII: only event_id, status, provider trace ids |
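
As a rough illustration of the CapConnect → Meta/Google boundary, the sketch below assembles one Meta-style event from already-hashed identifiers. The function and interface names are hypothetical and not taken from src/services/capi.ts; the field names (event_name, event_time, event_id, action_source, user_data, custom_data) follow Meta's published Conversions API event format as I understand it, so check them against the current Meta documentation before relying on them.

```typescript
// Hypothetical input shapes; the real project types live in src/types/index.ts.
interface HashedIds {
  em?: string; // SHA-256 hex digest of the normalized email
  ph?: string; // SHA-256 hex digest of the normalized phone
}

interface EventInfo {
  eventName: string;
  eventId: string;
  eventTimeSec: number; // Unix time in seconds
  actionSource: string;
  fbc?: string; // click id, forwarded un-hashed
  value?: number;
  currency?: string;
}

interface MetaEvent {
  event_name: string;
  event_time: number;
  event_id: string;
  action_source: string;
  user_data: Record<string, string | string[]>;
  custom_data?: { value: number; currency: string };
}

// Builds a single event object containing only digests and match-support fields.
function buildMetaEvent(ids: HashedIds, ev: EventInfo): MetaEvent {
  const user_data: Record<string, string | string[]> = {};
  if (ids.em !== undefined) user_data["em"] = [ids.em];
  if (ids.ph !== undefined) user_data["ph"] = [ids.ph];
  if (ev.fbc !== undefined) user_data["fbc"] = ev.fbc;

  const event: MetaEvent = {
    event_name: ev.eventName,
    event_time: ev.eventTimeSec,
    event_id: ev.eventId, // lets Meta deduplicate against the browser Pixel
    action_source: ev.actionSource,
    user_data,
  };
  if (ev.value !== undefined && ev.currency !== undefined) {
    event.custom_data = { value: ev.value, currency: ev.currency };
  }
  return event;
}
```

Nothing in the resulting object is a raw identifier: the only personal fields are digests, and the un-hashed fields are the match-support values listed in the table.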

πŸ— Architecture

flowchart TD
    subgraph SRC["src/"]
        SV["server.ts<br/>Express app · auth · validation · routing"]
        NM["utils/normalizer.ts<br/>normalize + SHA-256 (pure functions)"]
        CP["services/capi.ts<br/>Meta + Google payload build & POST"]
        TY["types/index.ts<br/>strict shared types"]
    end

    SV -->|"normalizeAndHash()"| NM
    SV -->|"dispatchToAllProviders()"| CP
    NM --- TY
    CP --- TY
    SV --- TY

| File | Responsibility |
| --- | --- |
| src/types/index.ts | Single source of truth for every shape: raw input, hashed IR, provider payloads, responses, config. |
| src/utils/normalizer.ts | Field-by-field normalization (Meta/Google spec) + SHA-256. Pure, no I/O, no logging. |
| src/utils/retry.ts | Exponential-backoff retry helper; retries transient errors only. |
| src/services/capi.ts | Builds & POSTs Meta CAPI and Google Ads payloads concurrently; PII-free error reporting. |
| src/services/googleAuth.ts | Google OAuth token providers (static + auto-refreshing). |
| src/server.ts | Hardened Express server, config loading/validation, webhook endpoints, graceful shutdown. |
| tests/ | Vitest unit + Supertest integration suites (80 tests). |

🚀 Quick start

# 1. Install
npm install

# 2. Configure
cp .env.example .env
#   …then edit .env with your Pixel id, access tokens, etc.
#   Generate a webhook secret:  openssl rand -hex 32

# 3. Develop (hot reload)
npm run dev

# 4. Production build & run
npm run build
npm start

Health check

curl -s http://localhost:3000/healthz | jq
{ "status": "ok", "service": "capconnect", "providers": { "meta": true, "google": true } }

📑 API

POST /v1/collect

Authenticated with the x-capconnect-token header (must equal WEBHOOK_SECRET).
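
A shared-secret check like this is usually done with a constant-time comparison. The sketch below shows one common way to do that with Node's built-in crypto module; `tokenMatches` is a hypothetical helper name, and this is not necessarily how src/server.ts implements it.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: compares the presented header value with the configured
// secret without an early exit on the first differing byte.
export function tokenMatches(presented: string | undefined, secret: string): boolean {
  if (presented === undefined || secret.length === 0) return false;
  // timingSafeEqual requires equal-length buffers; hashing both sides
  // guarantees that regardless of what the caller sent.
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(secret, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

In an Express handler, the presented value would come from the x-capconnect-token request header and the secret from WEBHOOK_SECRET.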

Request

curl -s -X POST http://localhost:3000/v1/collect \
  -H "Content-Type: application/json" \
  -H "x-capconnect-token: $WEBHOOK_SECRET" \
  -d '{
    "customer": {
      "email": "  John.Doe@Example.COM ",
      "phone": "090-1234-5678",
      "firstName": "John",
      "lastName": "Doe",
      "country": "JP",
      "city": "Tokyo",
      "zip": "100-0001",
      "externalId": "user_42"
    },
    "event": {
      "eventName": "Purchase",
      "eventId": "order-2025-0001",
      "value": 4980,
      "currency": "JPY",
      "orderId": "order-2025-0001",
      "actionSource": "website",
      "eventSourceUrl": "https://shop.example.com/thank-you",
      "fbc": "fb.1.1700000000000.IwAR...",
      "gclid": "Cj0KCQ..."
    }
  }'

The phone above is normalized to 819012345678 (leading 0 dropped, 81 prepended) and then SHA-256 hashed before it ever leaves the process.
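
The phone transformation described above can be sketched as follows. This is a minimal illustration, not the code in src/utils/normalizer.ts: the function names are hypothetical, and treating a leading "+" as the only marker of an international number is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: digits only, drop the trunk 0, prepend the default
// calling code (DEFAULT_COUNTRY_CODE, 81 by default) to non-international numbers.
export function normalizePhone(raw: string, defaultCountryCode = "81"): string | null {
  const trimmed = raw.trim();
  const international = trimmed.startsWith("+"); // assumption: "+" marks E.164 input
  const digits = trimmed.replace(/\D/g, "");
  if (digits.length === 0) return null; // unusable values are dropped, not hashed
  if (international) return digits;
  const national = digits.replace(/^0+/, "");
  return national.length === 0 ? null : defaultCountryCode + national;
}

// SHA-256 as lowercase hex, the digest form forwarded to the ad platforms.
export function sha256Hex(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex");
}

const normalized = normalizePhone("090-1234-5678"); // "819012345678"
const digest = normalized === null ? undefined : sha256Hex(normalized);
console.log(normalized, digest?.length); // the digest is 64 hex characters
```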

Response (200 if at least one provider accepted; 422 if none did)

{
  "eventId": "order-2025-0001",
  "accepted": true,
  "results": [
    { "provider": "meta",   "status": "sent", "httpStatus": 200, "detail": "events_received=1 fbtrace_id=Aa...", "reference": "order-2025-0001" },
    { "provider": "google", "status": "sent", "httpStatus": 200, "detail": "results=1", "reference": "order-2025-0001" }
  ]
}
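
For client code, the response above might be typed roughly as below, together with the "200 if at least one provider accepted, 422 if none did" rule. The type and function names are hypothetical, and "failed" as the non-success status value is an assumption; only "sent" appears in this README.

```typescript
type Provider = "meta" | "google";

interface ProviderResult {
  provider: Provider;
  status: string; // "sent" on success; other values are not documented here
  httpStatus?: number;
  detail?: string;
  reference?: string;
}

interface CollectResponse {
  eventId: string;
  accepted: boolean;
  results: ProviderResult[];
}

// Derives the HTTP status and body from the per-provider results.
function buildCollectResponse(
  eventId: string,
  results: ProviderResult[]
): { httpStatus: 200 | 422; body: CollectResponse } {
  const accepted = results.some((r) => r.status === "sent");
  return { httpStatus: accepted ? 200 : 422, body: { eventId, accepted, results } };
}
```

A caller can treat 422 as "no provider took the event" and inspect each entry in results to see which side failed.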

POST /v1/collect/batch

Same auth. Body is a JSON array of { customer, event } objects (max 1000). Each item is processed independently; invalid items are reported in validationErrors without failing the rest.

curl -s -X POST http://localhost:3000/v1/collect/batch \
  -H "Content-Type: application/json" \
  -H "x-capconnect-token: $WEBHOOK_SECRET" \
  -d '[ { "customer": { "email": "a@example.com" }, "event": { "eventName": "Lead" } } ]'
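
The per-item behaviour can be pictured as a partition step that runs before dispatch. The sketch below is an assumption about the shape of that step, not the project's validator: the error object layout, the messages, and the rule that eventName is required are all hypothetical.

```typescript
const MAX_BATCH_SIZE = 1000;

interface BatchItem {
  customer: Record<string, unknown>;
  event: Record<string, unknown>;
}

interface ItemError {
  index: number; // position of the offending item in the request array
  message: string; // deliberately generic so no PII is echoed back
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Splits a batch body into dispatchable items and per-item validation errors.
function partitionBatch(body: unknown): { valid: BatchItem[]; validationErrors: ItemError[] } {
  if (!Array.isArray(body)) throw new Error("batch body must be a JSON array");
  if (body.length > MAX_BATCH_SIZE) throw new Error(`batch exceeds ${MAX_BATCH_SIZE} items`);

  const valid: BatchItem[] = [];
  const validationErrors: ItemError[] = [];
  body.forEach((item: unknown, index: number) => {
    if (!isRecord(item)) {
      validationErrors.push({ index, message: "item must be an object" });
      return;
    }
    const customer = item["customer"];
    const event = item["event"];
    if (!isRecord(customer) || !isRecord(event)) {
      validationErrors.push({ index, message: "customer and event must be objects" });
      return;
    }
    if (typeof event["eventName"] !== "string" || event["eventName"] === "") {
      validationErrors.push({ index, message: "event.eventName is required" });
      return;
    }
    valid.push({ customer, event });
  });
  return { valid, validationErrors };
}
```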

GET /healthz

Unauthenticated liveness/readiness probe. Returns no PII.


βš™οΈ Configuration

All configuration is via environment variables (see .env.example). The process fails fast at boot if an enabled provider is missing required secrets.

Server

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| PORT | no | 3000 | HTTP listen port. |
| WEBHOOK_SECRET | yes | - | Shared secret required in the x-capconnect-token header. |
| DEFAULT_COUNTRY_CODE | no | 81 | Digits-only calling code auto-prepended to non-international phones. |
| DEFAULT_CURRENCY | no | JPY | ISO-4217 currency applied when an event omits one. |
| HTTP_TIMEOUT_MS | no | 10000 | Outbound request timeout for both providers. |
| RETRY_MAX_ATTEMPTS | no | 3 | Max attempts per provider call (incl. the first). |
| RETRY_BASE_DELAY_MS | no | 300 | Base backoff delay; doubles each retry, full-jittered. |
| RETRY_MAX_DELAY_MS | no | 5000 | Hard cap on any single backoff delay. |
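
To make the three RETRY_* settings concrete, here is a minimal sketch of exponential backoff with full jitter. It is not the code in src/utils/retry.ts; the function names, the injectable `random` and `sleep` parameters, and the `isTransient` callback are assumptions made so the example is self-contained.

```typescript
interface RetryOptions {
  maxAttempts: number; // RETRY_MAX_ATTEMPTS, including the first attempt
  baseDelayMs: number; // RETRY_BASE_DELAY_MS
  maxDelayMs: number; // RETRY_MAX_DELAY_MS
}

// Full jitter: pick a uniform delay in [0, min(cap, base * 2^retryIndex)).
function backoffDelayMs(
  retryIndex: number,
  opts: RetryOptions,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** retryIndex);
  return Math.floor(random() * ceiling);
}

// Retries fn on transient errors only, up to opts.maxAttempts total attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.maxAttempts || !isTransient(err)) throw err;
      await sleep(backoffDelayMs(attempt - 1, opts));
    }
  }
}
```

With the defaults (3 attempts, 300 ms base, 5000 ms cap), the first retry waits somewhere below 300 ms and the second somewhere below 600 ms.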

Meta Conversions API

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| META_ENABLED | no | true | Toggle Meta dispatch. |
| META_PIXEL_ID | if enabled | - | Your Pixel / dataset id. |
| META_ACCESS_TOKEN | if enabled | - | System-user access token. |
| META_API_VERSION | no | v20.0 | Graph API version. |
| META_TEST_EVENT_CODE | no | - | Enables Meta's Test Events tool when set. |
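
The fail-fast behaviour for an enabled provider could look something like the sketch below. It reads the Meta variables documented in this section, but the function names, the returned shape, and the rule that only the literal string "false" disables the provider are assumptions, not the actual loader in src/server.ts.

```typescript
interface MetaConfig {
  pixelId: string;
  accessToken: string;
  apiVersion: string;
}

type Env = Record<string, string | undefined>;

// Throws immediately when a required variable is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Returns null when Meta is disabled; otherwise validates its secrets at boot.
function loadMetaConfig(env: Env): MetaConfig | null {
  const enabled = (env["META_ENABLED"] ?? "true").toLowerCase() !== "false"; // assumed parsing rule
  if (!enabled) return null;
  return {
    pixelId: requireEnv(env, "META_PIXEL_ID"),
    accessToken: requireEnv(env, "META_ACCESS_TOKEN"),
    apiVersion: env["META_API_VERSION"] ?? "v20.0",
  };
}
```

Calling a loader like this once at startup, for example with process.env, surfaces a missing secret before the server accepts any traffic.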

Google Ads API (Enhanced Conversions)

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| GOOGLE_ENABLED | no | true | Toggle Google dispatch. |
| GOOGLE_CUSTOMER_ID | if enabled | - | Account id owning the conversion action (digits, no dashes). |
| GOOGLE_CONVERSION_ACTION_ID | if enabled | - | Numeric conversion action id. |
| GOOGLE_DEVELOPER_TOKEN | if enabled | - | Google Ads API developer token. |
| GOOGLE_OAUTH_ACCESS_TOKEN | auth A | - | Pre-issued OAuth2 Bearer token (you refresh). |
| GOOGLE_OAUTH_CLIENT_ID | auth B | - | OAuth client id for automatic refresh. |
| GOOGLE_OAUTH_CLIENT_SECRET | auth B | - | OAuth client secret for automatic refresh. |
| GOOGLE_OAUTH_REFRESH_TOKEN | auth B | - | OAuth refresh token (held in memory only). |
| GOOGLE_OAUTH_TOKEN_URI | no | https://oauth2.googleapis.com/token | Token endpoint. |
| GOOGLE_LOGIN_CUSTOMER_ID | no | - | MCC / manager id for login-customer-id. |
| GOOGLE_API_VERSION | no | v17 | Google Ads API version. |
| GOOGLE_VALIDATE_ONLY | no | false | Validate without recording a conversion. |

OAuth: two strategies (provide exactly one). (A) Inject a short-lived GOOGLE_OAUTH_ACCESS_TOKEN from your own secret manager and handle refresh externally. (B) Provide GOOGLE_OAUTH_CLIENT_ID + GOOGLE_OAUTH_CLIENT_SECRET + GOOGLE_OAUTH_REFRESH_TOKEN, and CapConnect refreshes access tokens automatically: it caches them in memory, coalesces concurrent refreshes, and renews ~60s before expiry. The refresh token is never persisted or logged. Boot fails fast if neither strategy is fully configured while Google is enabled.
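
Strategy B's caching and coalescing can be illustrated with a small in-memory provider. This is a sketch under assumptions, not the code in src/services/googleAuth.ts: the class name, the `fetchToken` callback (which would wrap the refresh-token exchange against the token endpoint), and the injectable clock are all hypothetical.

```typescript
interface AccessToken {
  value: string;
  expiresAtMs: number; // absolute expiry time in epoch milliseconds
}

class CachedTokenProvider {
  private cached: AccessToken | undefined;
  private inFlight: Promise<AccessToken> | undefined;
  private readonly fetchToken: () => Promise<AccessToken>;
  private readonly skewMs: number;
  private readonly now: () => number;

  constructor(
    fetchToken: () => Promise<AccessToken>,
    skewMs = 60_000, // renew roughly 60s before expiry
    now: () => number = Date.now
  ) {
    this.fetchToken = fetchToken;
    this.skewMs = skewMs;
    this.now = now;
  }

  async getToken(): Promise<string> {
    const cached = this.cached;
    if (cached !== undefined && this.now() < cached.expiresAtMs - this.skewMs) {
      return cached.value; // still fresh: served from memory
    }
    if (this.inFlight === undefined) {
      // First caller starts the refresh; concurrent callers await the same promise.
      this.inFlight = this.fetchToken()
        .then((token) => {
          this.cached = token;
          return token;
        })
        .finally(() => {
          this.inFlight = undefined;
        });
    }
    return (await this.inFlight).value;
  }
}
```

Because the token lives only in the instance fields, nothing is written to disk, and a failed refresh rejects only the callers waiting on that attempt.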

Resilience

  • Automatic retries with exponential backoff + full jitter on transient failures only (network errors, HTTP 429, HTTP 5xx). Deterministic 4xx errors are never retried. Tune via RETRY_*.
  • Fault isolation: a Meta failure never blocks Google (and vice versa); a Google token-refresh failure is reported as a Google-only auth: error while Meta still dispatches.
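
One way to get this kind of isolation is to wrap each provider call so that it always resolves to an outcome instead of rejecting. The sketch below shows the idea; `dispatchIsolated`, the outcome shape, and the "failed" status value are hypothetical and not taken from src/services/capi.ts.

```typescript
type Provider = "meta" | "google";

interface DispatchOutcome {
  provider: Provider;
  status: "sent" | "failed";
  detail: string;
}

interface ProviderTask {
  provider: Provider;
  send: () => Promise<string>; // resolves to a PII-free detail string
}

// Runs all providers concurrently; one rejection never affects the others.
async function dispatchIsolated(tasks: ProviderTask[]): Promise<DispatchOutcome[]> {
  return Promise.all(
    tasks.map(async ({ provider, send }): Promise<DispatchOutcome> => {
      try {
        return { provider, status: "sent", detail: await send() };
      } catch (err) {
        // Report only the error class name, never the payload that was being sent.
        const detail = err instanceof Error ? err.name : "UnknownError";
        return { provider, status: "failed", detail };
      }
    })
  );
}
```

Each `send` would typically be the retry-wrapped HTTP call for that provider, so retries on one side do not delay reporting for the other beyond the overall wait.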

πŸ” Normalization rules (Meta/Google compliant)

| Field | Rule | Example → normalized |
| --- | --- | --- |
| Email (em) | trim, lowercase (no dot-stripping) | A.B@Ex.COM → a.b@ex.com |
| Phone (ph) | digits only → E.164 (drop trunk 0, prepend country code) | 090-1234-5678 → 819012345678 |
| First/Last name (fn/ln) | lowercase, strip punctuation, collapse spaces | O'Brien → obrien |
| City (ct) | lowercase, remove all whitespace & punctuation | New York → newyork |
| State (st) | lowercase, letters only | CA → ca |
| Zip (zp) | lowercase, strip spaces, drop +4, US → first 5 | 100-0001 → 100 |
| Country (country) | ISO alpha-2, lowercase | Japan → first 2 letters |
| DOB (db) | parse → YYYYMMDD | 1990/01/02 → 19900102 |
| Gender (ge) | → m / f | Male → m |

Every normalized value is then SHA-256 hashed to lowercase hex before dispatch. Empty/unusable values are dropped (never hashed to the all-empty digest).
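
A few of the rules above, written out as a sketch. These functions are illustrative and are not the ones in src/utils/normalizer.ts: the names are hypothetical, the Unicode-aware regular expressions are one possible reading of "strip punctuation", and detecting pre-hashed input by its 64-hex-character shape is an assumption about how the pass-through works.

```typescript
import { createHash } from "node:crypto";

const SHA256_HEX = /^[0-9a-f]{64}$/i;

function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// Lowercase, drop anything that is not a letter, digit, or space, collapse spaces.
function normalizeName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Lowercase and remove all whitespace and punctuation.
function normalizeCity(raw: string): string {
  return raw.toLowerCase().replace(/[^\p{L}\p{N}]/gu, "");
}

// Hashes a normalized value; empty input is dropped and an existing digest is kept as-is.
function hashField(normalized: string): string | undefined {
  if (normalized.length === 0) return undefined;
  if (SHA256_HEX.test(normalized)) return normalized.toLowerCase(); // assumed pass-through rule
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

console.log(normalizeEmail("A.B@Ex.COM")); // a.b@ex.com
console.log(normalizeName("O'Brien")); // obrien
console.log(normalizeCity("New York")); // newyork
```

Returning undefined for empty values keeps the digest of the empty string out of the payload, which matches the rule stated above.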


🧪 Local verification

npm run typecheck     # strict TS, no emit
npm run lint          # ESLint (type-checked rules)
npm run build         # compile to dist/
npm test              # run the Vitest suite (80 tests)
npm run test:coverage # tests + coverage report
npm run ci            # typecheck + lint + build + test (what CI runs)

You can exercise the pipeline safely against Meta's Test Events tool by setting META_TEST_EVENT_CODE, and against Google by setting GOOGLE_VALIDATE_ONLY=true.

🐳 Docker

# Build & run with compose (reads your .env)
cp .env.example .env   # then fill in secrets
docker compose up --build

# …or build the image directly
docker build -t capconnect:latest .
docker run --rm -p 3000:3000 --env-file .env capconnect:latest

The image is a multi-stage build that ships only production dependencies and the compiled dist/, runs as the non-root node user, exposes a HEALTHCHECK against /healthz, and (via compose) runs read-only with dropped capabilities and no-new-privileges.

πŸ” CI

.github/workflows/ci.yml runs on every push/PR: typecheck → lint → build → test across Node 18/20/22, a coverage job, and a Docker build job.


🤝 Operational guidance

  • Terminate TLS in front of CapConnect (reverse proxy / load balancer) and keep the upstream hop encrypted.
  • Rotate WEBHOOK_SECRET and provider tokens via your secret manager; never bake them into images.
  • Do not enable request-body logging at the proxy for /v1/collect*; that would defeat the zero-knowledge property outside the app.
  • Scale horizontally β€” the service is stateless, so run as many replicas as you need behind a load balancer.

📄 License

CapConnect is distributed under the Business Source License 1.1 (BSL-1.1).

  • You may use, copy, modify, and self-host CapConnect freely, including internally at your company.
  • The Additional Use Grant permits production use except offering CapConnect (or a substantially similar service) to third parties as a hosted/managed commercial "conversion-relay" product.
  • On the Change Date (four years after each version's release), that version automatically converts to the Apache License 2.0.

Licensed under the Business Source License 1.1 (the "License");
you may not use this file except in compliance with the License.
Change License: Apache License, Version 2.0
Change Date: four (4) years from the date of each release.
Additional Use Grant: You may use the Licensed Work in production, except to
provide it to third parties as a hosted or managed commercial conversion-relay
service that competes with the Licensor's offering.

THE LICENSED WORK IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.

Prefer a copyleft model instead? CapConnect is also available under the GNU AGPL-3.0 on request; contact the maintainers. Choose BSL-1.1 for permissive self-hosting with a commercial-SaaS carve-out, or AGPL-3.0 if you want network-use copyleft obligations.


Built for the cookieless era. Your customers' data stays your customers'.
