DOC-2262: Add OAuth authentication to the docs MCP server (Redpanda Cloud IdP) by JakeSCahill · Pull Request #181 · redpanda-data/docs-site

JakeSCahill · 2026-06-15T10:50:31Z

Goal

Add authentication to the docs MCP server (docs.redpanda.com/mcp) so AI tools (ChatGPT, Claude, Cursor, VS Code) have users sign in with their Redpanda Cloud account, letting us capture verified work emails and attribute docs usage to organizations.

Architecture (decided with Cloud Identity)

The docs service runs its own OAuth 2.1 Authorization Server (AS), with the Cloud IdP (Auth0) as the upstream identity provider. AI tools register and authenticate against our AS; we federate the human login to Auth0 and issue our own tokens.

Why this shape:

ChatGPT only supports spec OAuth (no static tokens), so OAuth is required.
The Cloud Auth0 tenant has DCR and CIMD disabled (tested — see comments), so AI tools can't register with it directly. Brokering means the Cloud IdP only ever sees one client (ours), and client self-registration happens on our side where we control it.
We get the user's verified email/org from Auth0 to capture + attribute.

Division of responsibilities

Cloud / Identity: one Auth0 public client (client_id, Authorization Code + PKCE, no secret), our /callback redirect URIs allow-listed, ID token returns email/email_verified + org. One app covers MCP now and docs-site login later.
Us: the AS — /authorize, /callback, /token (+ refresh w/ rotation), client registration (DCR + CIMD), JWKS, consent/login UI; federate login to Auth0; issue/validate our own tokens. One-time-use state lives in Neon Postgres; clients/rate-limit counters stay on Netlify Blobs (see below).
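AI clients find those endpoints through RFC 8414 authorization-server metadata. Below is a minimal sketch of that document for an AS that only serves public clients. The issuer and every path are placeholders, not the routes this PR registers. `client_id_metadata_document_supported` is the same CIMD flag that the Cloud IdP's discovery document advertises.

```javascript
// Hypothetical helper: RFC 8414 metadata for a public-client-only AS.
// All endpoint paths are illustrative placeholders.
function buildAuthServerMetadata(issuer) {
  const url = (path) => new URL(path, issuer).toString();
  return {
    issuer,
    authorization_endpoint: url('/oauth/authorize'),
    token_endpoint: url('/oauth/token'),
    registration_endpoint: url('/oauth/register'), // DCR (RFC 7591)
    jwks_uri: url('/oauth/jwks'),
    response_types_supported: ['code'],
    grant_types_supported: ['authorization_code', 'refresh_token'],
    code_challenge_methods_supported: ['S256'], // PKCE, S256 only
    token_endpoint_auth_methods_supported: ['none'], // public clients, no secret
    client_id_metadata_document_supported: true, // CIMD
  };
}

const metadata = buildAuthServerMetadata('https://docs.redpanda.com');
console.log(JSON.stringify(metadata, null, 2));
```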

Also in this PR: OAuth state on Neon Postgres

The AS's one-time-use / transactional state (auth requests, auth codes, refresh tokens + families) is backed by Neon Postgres (Netlify DB), behind a STORE_BACKEND flag (blobs default, neon to switch).

Why: Netlify Blobs has no compare-and-swap, so consuming a one-time-use item meant a read followed by a separate delete (or mark-as-used). Two concurrent refreshes could both read the same token as unused and both rotate it, without tripping reuse detection (the theft signal). Postgres makes each consume a single atomic UPDATE … WHERE used=false RETURNING *, so exactly one caller wins and the loser is treated as reuse.
Schema auto-applies on deploy (netlify/database/migrations/), including to per-preview DB branches. A daily scheduled function GCs expired rows.
DCR clients stay on Blobs (plain persistence, no atomicity benefit).
Rollout: unset = Blobs (no change); STORE_BACKEND=neon on a context to switch; roll back by resetting the var. Cutover note: flipping blobs→neon doesn't migrate rows, so live refresh tokens won't carry over — users re-authenticate once.
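The race that the Neon backend closes can be modelled in memory. This is a sketch, not the PR's `consumeRefresh`. `consume()` stands in for the single atomic `UPDATE … WHERE used = false RETURNING *`. The store shape and the names `issueRefresh`, `consume`, and `rotate` are hypothetical. Because this check-and-set runs synchronously in JavaScript, it is atomic here in the same way the single SQL statement is atomic in Postgres.

```javascript
import { createHash, randomBytes } from 'node:crypto';

// Tokens are stored hashed at rest; lookups use the hash only.
const hash = (token) => createHash('sha256').update(token).digest('hex');

// In-memory stand-in for the refresh-token table, keyed by token hash.
const rows = new Map();
const revokedFamilies = new Set();

function issueRefresh(familyId) {
  const token = randomBytes(32).toString('base64url');
  rows.set(hash(token), { familyId, used: false });
  return token;
}

// Models `UPDATE ... SET used = true WHERE hash = $1 AND used = false RETURNING *`:
// returns the row if this caller consumed it, or null if it was already used.
function consume(tokenHash) {
  const row = rows.get(tokenHash);
  if (!row || row.used) return null;
  row.used = true;
  return row;
}

// Rotate a refresh token. Presenting a token that exists but was already
// consumed is treated as reuse (a theft signal): the whole family is revoked.
function rotate(token) {
  const tokenHash = hash(token);
  const row = consume(tokenHash);
  if (row === null) {
    const existing = rows.get(tokenHash);
    if (existing) revokedFamilies.add(existing.familyId);
    return { error: 'invalid_grant' };
  }
  if (revokedFamilies.has(row.familyId)) return { error: 'invalid_grant' };
  return { refreshToken: issueRefresh(row.familyId) };
}

// Two "concurrent" refreshes presenting the same token: exactly one wins.
const original = issueRefresh('family-1');
const first = rotate(original);
const second = rotate(original);
console.log(first.refreshToken ? 'first rotated' : first.error); // first rotated
console.log(second.error); // invalid_grant
console.log(revokedFamilies.has('family-1')); // true
```

Revoking the family also invalidates the token that the winning refresh just issued. This is intentional: after reuse, the legitimate client and the attacker cannot be told apart.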

Also in this PR: documentation feedback tool

An MCP tool, submit_documentation_feedback, lets AI clients forward user feedback (bugs, doc gaps, incorrect/missing info, feature requests) straight to the docs/DX team. The tool description tells the agent to ask the user before submitting and include the relevant page/context.

Lands in the existing api-feedback Netlify form (MCP-only); fields trimmed to what matters: feedback, category, page-path, user-email, user-domain (+ honeypot).
When signed in, the user's email + domain are attached so the team can follow up; anonymous otherwise. Logs only category/domain/authed — never the raw email or feedback text.
Posts to a non-redirecting page (the site root 301s, which silently dropped submissions) and uses redirect: 'error' so a redirect can't masquerade as success.
Surfaced in the server card + server.json descriptions.
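Here is a minimal sketch of that submission, assuming a standard Netlify Forms URL-encoded POST. The form name, field names, `/home/` target, and `redirect: 'error'` come from the description above. The following are assumptions:

- the function name
- the injectable `fetchImpl` (which lets the sketch run offline)
- the default `log` sink
- the example category value

```javascript
// Hypothetical sketch of the feedback POST to the api-feedback form.
async function submitFeedback({ feedback, category, pagePath, auth }, {
  siteUrl,
  formPath = '/home/', // a non-redirecting page; the site root 301s
  fetchImpl = fetch,
  log = console.log,
}) {
  const body = new URLSearchParams({
    'form-name': 'api-feedback',
    'bot-field': '', // honeypot: real submitters leave it empty
    feedback,
    category,
    'page-path': pagePath ?? '',
    'user-email': auth?.email ?? '', // attached only when signed in
    'user-domain': auth?.domain ?? '',
  });
  // redirect: 'error' makes fetch reject on any 3xx, so a redirect that
  // drops the POST body can't be reported as a successful submission.
  const res = await fetchImpl(new URL(formPath, siteUrl), {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body.toString(),
    redirect: 'error',
  });
  if (!res.ok) throw new Error(`Form submission failed: HTTP ${res.status}`);
  // Log only non-sensitive fields: never the raw email or feedback text.
  log({ event: 'feedback', category, domain: auth?.domain ?? null, authed: Boolean(auth) });
  return { ok: true };
}
```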

Login page

The interstitial shown before the Cloud redirect is styled to match docs.redpanda.com (Inter, the Redpanda logo, brand palette — all same-origin so it renders on prod and previews), discloses what we collect with a Privacy Policy link, and links "Create a free account" (new tab) for prospects without a Cloud account.

Future (phase 2)

The same Auth0 federation core also powers human login to the docs site — it just sets a browser session instead of issuing tokens. One Auth0 app, two consumers; MCP ships first. The Neon schema is intentionally compatible with future identity/saved-conversation tables.

Testing


The deploy preview is wired to the integration Auth0 tenant and runs the Neon backend (STORE_BACKEND=neon), so you can exercise the full flow there.

1. Add the preview server to Claude Code:

claude mcp add --scope local --transport http redpanda-preview https://deploy-preview-181--redpanda-documentation.netlify.app/mcp

2. Authenticate and use it:

Run /mcp, select redpanda-preview, choose Authenticate
A browser opens → "Continue with Redpanda Cloud" → sign in on the integration tenant → pick an org → it connects (auth code + refresh token are written to Neon)
Ask something that hits a tool, e.g. "Using redpanda-preview, list the Redpanda API reference pages."
Try the feedback tool, e.g. "Send feedback to the Redpanda team that the quickstart is missing a step." → check Netlify → Forms → api-feedback
Clean up with claude mcp remove redpanda-preview

Notes:

You need an account on the integration tenant (integration-cloudv2.us.auth0.com), not prod.
Unit tests: npm run test:mcp plus the tests/mcp-oauth-*.test.ts suites. The Neon atomicity tests (tests/mcp-oauth-neon.test.ts) run against a real Postgres when TEST_NEON_URL is set (skipped otherwise).

Kapa now has your conversation along with your email and company (derived from your email domain)

History

Explored a pure OAuth resource server pointing at the Cloud IdP (blocked: DCR/CIMD disabled on that tenant), landing on the AS-broker design above after the call with Cloud. The Neon state backend was developed as PR #184 and merged into this branch.

netlify · 2026-06-15T10:50:38Z

✅ Deploy Preview for redpanda-documentation ready!

Name	Link
🔨 Latest commit	`7e47e01`
🔍 Latest deploy log	https://app.netlify.com/projects/redpanda-documentation/deploys/6a398a84e9c139000895197b
😎 Deploy Preview	https://deploy-preview-181--redpanda-documentation.netlify.app
Lighthouse	1 path audited. Performance: 69 (🟢 up 6 from production), Accessibility: 92 (🔴 down 2 from production), Best Practices: 92 (no change from production), SEO: 83 (no change from production), PWA: -


Add a lightweight email->token authentication gate to the docs MCP server to capture users' work email addresses for lead capture and usage attribution.

- New /mcp/register endpoint: users submit a work email and the bearer token is delivered ONLY by email (never in the HTTP response), so possession of a working token proves the address is real and owned.
- Mandatory 4-layer validation: format, work-domain filter (reject free/disposable providers), MX-record check, email delivery.
- Tokens stored hashed in Netlify Blobs; auth middleware in mcp.mjs threads the authenticated email/domain to Kapa via _meta.user for attribution.
- Bearer header and ?token= query fallback (for clients that can't set headers).
- Gated behind REQUIRE_AUTH (grace period -> enforce); per-token rate limiting.
- Captured emails -> Netlify Blobs + logs + optional CRM_WEBHOOK_URL forward.
- Docs: registration + per-client setup + privacy/consent note; server-card and server.json advertise the token requirement.
- 17 unit tests (tests/mcp-auth.test.ts).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Netlify Functions don't reliably set NODE_ENV=production at runtime, so the previous NODE_ENV-based dev bypass could fire in deployed environments — silently logging tokens instead of emailing them and not failing when RESEND_API_KEY is missing. Gate the bypass on NETLIFY_DEV (set only by `netlify dev`/`functions:serve`) so any deployed env without a key errors loudly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace the custom email->token gate with a standard MCP OAuth 2.1 resource server delegating to the Redpanda Cloud IdP (auth.prd.cloud.redpanda.com). This is required so ChatGPT can authenticate (ChatGPT only supports spec OAuth, not static tokens), while still capturing users' verified work emails. Verified the Cloud IdP supports everything needed (open Dynamic Client Registration, CIMD, PKCE S256, public clients, email scope, userinfo).

- /.well-known/oauth-protected-resource (RFC 9728) edge function advertises the Cloud IdP as the authorization server; clients self-register via DCR/CIMD.
- mcp.mjs auth middleware validates the bearer token against the IdP /userinfo endpoint, extracts the verified email/org, captures it (Blobs + log + optional CRM_WEBHOOK_URL), and threads it to Kapa via _meta.user.
- Optional work-email enforcement (REQUIRE_WORK_EMAIL, default on) returns 403 for personal providers; REQUIRE_AUTH keeps the grace->enforce rollout.
- Remove the email->token registration endpoint and email-sending module.
- Docs updated: clients prompt for Redpanda Cloud sign-in (no token to paste).
- Unit tests rewritten for the OAuth logic (16 tests).

Production hardening (needs identity team): register an Auth0 API for the MCP resource so tokens are audience-bound JWTs, and add email as an access-token claim. Until then we validate via /userinfo (no audience binding).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


JakeSCahill · 2026-06-15T12:50:56Z

⛔ Blocked: identity team — Dynamic Client Registration is disabled on the Cloud IdP

The OAuth resource server, discovery, and token validation are implemented and verified, but end-to-end auth cannot work yet because MCP clients can't register with the Redpanda Cloud IdP.

What works (verified against prd `auth.prd.cloud.redpanda.com`)

Protected-resource metadata (/.well-known/oauth-protected-resource) + authorization-server discovery
PKCE (S256), email / email_verified scopes
Token validation via /userinfo
Our server's 401 → WWW-Authenticate → metadata chain (confirmed on the preview)
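The first link in that chain is the resource server's 401 challenge. Below is a minimal sketch of it. The `resource_metadata` parameter on the Bearer challenge is defined by RFC 9728, and the no-store/noindex headers match what the review later observed on the preview. The function name and the JSON error body are assumptions.

```javascript
// Hypothetical sketch of the unauthenticated /mcp response.
function unauthorizedResponse(requestUrl) {
  const { origin } = new URL(requestUrl);
  const metadataUrl = `${origin}/.well-known/oauth-protected-resource`;
  return new Response(JSON.stringify({ error: 'unauthorized' }), {
    status: 401,
    headers: {
      'Content-Type': 'application/json',
      // Tells the client where to discover which authorization server to use.
      'WWW-Authenticate': `Bearer resource_metadata="${metadataUrl}"`,
      'Cache-Control': 'no-store',
      'X-Robots-Tag': 'noindex',
    },
  });
}

const res = unauthorizedResponse('https://docs.redpanda.com/mcp');
console.log(res.status, res.headers.get('www-authenticate'));
```

The client then fetches the metadata URL, reads `authorization_servers`, and runs authorization-server discovery against the first entry.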

The blocker

MCP clients (ChatGPT, Claude, Cursor, VS Code) have no pre-registered client_id — they self-register at runtime via Dynamic Client Registration. The Cloud IdP rejects this:

POST https://auth.prd.cloud.redpanda.com/oidc/register
→ 400 {"error":"Bad Request","message":"dynamic client registration is disabled"}

Confirmed with a valid registration body. Reproduced in Claude Code, which fails at exactly this step:

No client info found
SDK auth error: L7H
Error during auth completion: SDK auth failed

(Failure happens before the browser opens — i.e. at client registration, not at user login.)
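For context, the registration these clients attempt is a plain RFC 7591 request. Below is a sketch of a public-client body. The client name and the loopback redirect URI are made up; the field names are the standard RFC 7591 client metadata. `token_endpoint_auth_method: 'none'` is what marks the client as public.

```javascript
// Illustrative RFC 7591 Dynamic Client Registration body for a public MCP
// client: no client secret; PKCE protects the authorization-code exchange.
function buildRegistrationRequest(clientName, redirectUri) {
  return {
    client_name: clientName,
    redirect_uris: [redirectUri],
    grant_types: ['authorization_code', 'refresh_token'],
    response_types: ['code'],
    token_endpoint_auth_method: 'none',
  };
}

// The client would POST this as JSON to the IdP's registration endpoint.
const registration = buildRegistrationRequest('Example MCP client', 'http://127.0.0.1:33418/callback');
console.log(JSON.stringify(registration));
```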

Needed from the identity team (DOC-2262)

Enable Dynamic Client Registration for public clients (authorization_code + PKCE, token_endpoint_auth_method: none, client-supplied loopback/https redirect URIs) — and/or confirm CIMD (Client ID Metadata Documents) is actually enabled. The discovery doc advertises client_id_metadata_document_supported: true, but DCR was advertised too and is disabled, so advertised ≠ enabled. CIMD is what ChatGPT prefers.
(Hardening) Register an Auth0 API/resource with audience https://docs.redpanda.com/mcp and add email, email_verified, and org (org_id/org_name) as access-token claims → lets us validate audience-bound JWTs instead of /userinfo.
Confirm whether to enable on prd or a staging tenant first.

Status

Code is complete and verified up to the registration step; keeping this PR in draft until the IdP is configured.
Tracking the identity-team request in DOC-2262.
Recommend leaving REQUIRE_AUTH in grace (unenforced) on deployed environments until DCR/CIMD is enabled, so the server isn't gated for clients that can't yet authenticate.

Drop the unused OAUTH_ISSUER export from idp.mjs and de-export the FREE_EMAIL_DOMAINS / DISPOSABLE_DOMAINS sets (used only internally). No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


The probe confirmed CIMD is not enabled on the Cloud IdP (a valid client metadata document used as client_id still returns 'Unknown client'). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

JakeSCahill · 2026-06-16T13:04:53Z

Update — call with Cloud Identity (Santi): implementation direction decided

Architecture: We unblock MCP auth by having the docs service run its own OAuth 2.1 Authorization Server (AS), with the Cloud IdP (Auth0) as the upstream identity provider. This sidesteps the earlier blocker (the Cloud Auth0 tenant has both DCR and CIMD disabled — see prior comments): the AI tools register and authenticate against our AS, not the Cloud IdP directly. The Cloud IdP only ever sees one client — ours.

Division of responsibilities

Cloud / Identity (Santi): provide one Auth0 public client (client_id, Authorization Code + PKCE/S256, token_endpoint_auth_method=none, no secret), with our /callback redirect URIs allow-listed, and the ID token returning email, email_verified, and org (org_id/org_name). One app covers MCP now and docs-site login later.
Docs (us): implement the OAuth 2.1 AS — /authorize, /callback, /token (+ refresh with rotation), client registration (DCR + CIMD) for the AI tools, JWKS, and the consent/login UI; federate the human login to Auth0; issue our own tokens to MCP clients. State in Netlify DB (Neon Postgres).

Flow: AI client → our AS (/authorize, PKCE) → redirect to Auth0 login → our /callback (exchange code, read verified email/org) → we mint our own token → client calls /mcp with it → we validate + attribute usage to the user/org.
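Both legs of that flow use PKCE with S256: MCP client to our AS, and our AS to Auth0. Below is a minimal sketch using Node's `node:crypto`, checked against the RFC 7636 Appendix B test vector. The helper names are hypothetical, not the PR's `pkce.mjs` exports.

```javascript
import { createHash, randomBytes, timingSafeEqual } from 'node:crypto';

// 32 random bytes -> 43-char base64url verifier (RFC 7636 allows 43..128).
function createVerifier() {
  return randomBytes(32).toString('base64url');
}

// code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))
function s256Challenge(verifier) {
  return createHash('sha256').update(verifier, 'ascii').digest().toString('base64url');
}

// At /token: recompute from the presented verifier and compare, in constant
// time, with the challenge stored when /authorize was called.
function verifyPkce(verifier, storedChallenge) {
  const a = Buffer.from(s256Challenge(verifier));
  const b = Buffer.from(storedChallenge);
  return a.length === b.length && timingSafeEqual(a, b);
}

// RFC 7636 Appendix B test vector.
const verifier = 'dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk';
console.log(s256Challenge(verifier)); // E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM
```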

Open asks to Santi (in flight):

Confirm public client + exact /callback allow-listing + email/org claims.
Which tenant for dev vs prod (auth.prd.cloud.redpanda.com is prod) + a test user or two.

Future (phase 2): the same Auth0 federation core also powers human login to the docs site — it just sets a browser session instead of issuing tokens. One Auth0 app, two consumers; MCP ships first.

Next steps:

Spike the AS slice (/authorize → Auth0 → /callback → issue + validate a JWT) to confirm Netlify DB + serverless fit and that the Auth0 client works end-to-end.
Phased build: AS core → client registration (DCR/CIMD) → refresh + hardening → swap the resource-server to validate our own tokens → docs + rollout (REQUIRE_AUTH grace → enforce).
Rescope PR #181 ("DOC-2262: Add OAuth authentication to the docs MCP server (Redpanda Cloud IdP)") from "resource server pointing at the Cloud IdP" to "we run the AS, Auth0 upstream."

… (M1) Replace the superseded resource-server-pointing-at-Cloud approach with the agreed broker architecture: our service is the OAuth 2.1 Authorization Server, federating the human login upstream to Auth0 and issuing/validating its own tokens. Ports the validated spike to production shape.

Added (Milestone 1, AS core):
- lib/oauth/keys.mjs: jose RS256 sign/verify + JWKS; key from env (MCP_OAUTH_SIGNING_JWK) or dev-generated + persisted in Blobs (the spike proved an in-memory key breaks the flow)
- lib/oauth/store.mjs: auth requests + auth codes on Netlify Blobs (interface is the seam for a Netlify DB/Neon backend when relational queries are needed)
- lib/oauth/pkce.mjs, config.mjs, upstream.mjs (Auth0 + dev mock federation, id_token validated against Auth0 JWKS)
- mcp-oauth.mjs: AS endpoints for discovery (RFC 8414), JWKS, /authorize, /mcp/callback, and /token (authorization_code + PKCE)

Changed:
- mcp.mjs resource server now validates OUR OWN access tokens (jose) instead of calling the upstream /userinfo
- protected-resource metadata + server card point authorization_servers at us
- removed lib/idp.mjs (superseded /userinfo validation)

Deferred (clearly marked): DCR/CIMD client registration (M2), refresh_token grant + rotation (M3), consent UI, revocation. Neon backend is a documented swap behind the store interface (needs Netlify DB provisioning). Auth0 mode needs Santi's client_id; defaults to a dev mock until then.

Tests: 22 pass (PKCE incl. RFC 7636 vector; JWT issue/verify; JWKS leaks no private key; wrong-audience/tampered rejected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

JakeSCahill · 2026-06-16T15:25:23Z

Milestone 1 landed — production AS scaffold (jose + storage), federating to Auth0

Pivoted the branch from the superseded resource-server-pointing-at-Cloud approach to the agreed broker architecture: our service is the OAuth 2.1 Authorization Server; it federates the human login upstream to Auth0 and issues/validates its own tokens. The validated spike is now ported to production shape.

In this milestone

lib/oauth/keys.mjs — jose RS256 sign/verify + JWKS. Key from env (MCP_OAUTH_SIGNING_JWK) or dev-generated + persisted in Blobs (the spike proved an in-memory key breaks the flow).
lib/oauth/store.mjs — auth-requests + auth-codes on Netlify Blobs; the interface is the seam for a Netlify DB / Neon backend when we want relational queries (swap behind the interface; needs DB provisioning).
lib/oauth/{pkce,config,upstream}.mjs — PKCE (S256), config, and Auth0 federation (id_token validated against Auth0 JWKS) with a dev-mock fallback.
mcp-oauth.mjs — AS endpoints: discovery (RFC 8414), JWKS, /authorize, /mcp/callback, /token (authorization_code + PKCE).
mcp.mjs resource server now validates our own access tokens (jose) instead of calling the upstream /userinfo; protected-resource metadata + server card point authorization_servers at us; removed lib/idp.mjs.
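The PR uses `jose` for this. To keep the example dependency-free, the sketch below swaps in Node's built-in `node:crypto` to show the same RS256 mint/verify shape:

- `alg`, `iss`, `aud`, and `exp` are pinned on verify.
- The JWKS exposes only public key members.

The issuer, audience, `kid`, and function names are assumptions.

```javascript
import { generateKeyPairSync, sign, verify } from 'node:crypto';

// Assumed values for illustration; the PR's real issuer/audience may differ.
const ISSUER = 'https://docs.redpanda.com';
const AUDIENCE = 'https://docs.redpanda.com/mcp';
const KID = 'mcp-signing-1';

// Dev-style key; in production the key would come from MCP_OAUTH_SIGNING_JWK.
const { publicKey, privateKey } = generateKeyPairSync('rsa', { modulusLength: 2048 });

const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const decode = (part) => JSON.parse(Buffer.from(part, 'base64url').toString('utf8'));

function mintAccessToken(claims, { ttlSeconds = 3600, audience = AUDIENCE } = {}) {
  const now = Math.floor(Date.now() / 1000);
  const header = { alg: 'RS256', typ: 'JWT', kid: KID };
  const payload = { ...claims, iss: ISSUER, aud: audience, iat: now, exp: now + ttlSeconds };
  const signingInput = `${encode(header)}.${encode(payload)}`;
  // RS256 = RSASSA-PKCS1-v1_5 with SHA-256 (Node's default padding for RSA keys).
  const signature = sign('sha256', Buffer.from(signingInput), privateKey);
  return `${signingInput}.${signature.toString('base64url')}`;
}

// Returns the claims if the token is ours and still valid, otherwise null.
function verifyAccessToken(token) {
  try {
    const [h, p, s] = token.split('.');
    if (!h || !p || !s) return null;
    if (decode(h).alg !== 'RS256') return null; // pin the algorithm
    const valid = verify('sha256', Buffer.from(`${h}.${p}`), publicKey, Buffer.from(s, 'base64url'));
    if (!valid) return null;
    const claims = decode(p);
    const now = Math.floor(Date.now() / 1000);
    if (claims.iss !== ISSUER || claims.aud !== AUDIENCE || !(claims.exp > now)) return null;
    return claims;
  } catch {
    return null; // malformed token
  }
}

// Published JWKS: only the public members (kty, n, e), never the private `d`.
const jwks = {
  keys: [{ ...publicKey.export({ format: 'jwk' }), kid: KID, alg: 'RS256', use: 'sig' }],
};
```

For example, `verifyAccessToken(mintAccessToken({ sub: 'auth0|123' }))` returns the claims. A token minted for another audience, or one with an edited payload, returns `null`.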

Tests: 22 pass (PKCE incl. the RFC 7636 vector; JWT issue/verify; JWKS leaks no private key; wrong-audience/tampered rejected).

Deferred (clearly marked in code): DCR/CIMD client registration (M2), refresh-token grant + rotation (M3), consent UI, revocation.

Still gated on:

Santi: the Auth0 client_id (public client + /mcp/callback allow-listed + email/org claims). Until then upstream defaults to a dev mock; flip with REDPANDA_OAUTH_CLIENT_ID + MCP_OAUTH_UPSTREAM=auth0.
Netlify DB provisioning if/when we move auth state off Blobs.

The flow itself was already validated end-to-end on Netlify Functions in the spike branch (spike/mcp-oauth-as).

The dev mock issues canned identities, so it must never be reachable by accident in a deployed environment. Resolve the upstream mode fail-closed: mock is only allowed under an explicit dev signal (NETLIFY_DEV or MCP_OAUTH_ALLOW_MOCK=true). Anything that would otherwise silently fall back to mock (e.g. a prod deploy missing REDPANDA_OAUTH_CLIENT_ID) resolves to null, and the AS returns 503 on the flow endpoints instead of handing out mock tokens. Discovery + JWKS stay up.

- config.mjs: resolveUpstreamMode() (pure, tested) + UPSTREAM_MISCONFIGURED
- upstream.mjs: throw if neither auth0 nor mock is active
- mcp-oauth.mjs: 503 on /authorize, /callback, /token, mock-idp when misconfigured
- tests: 6 cases covering the resolution matrix (28 total pass)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
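Below is a minimal sketch of that fail-closed resolution as a pure function of the environment. The precedence rules are an assumption inferred from the commit message, not a copy of `config.mjs`:

- Explicit `auth0` needs a client ID.
- Mock needs an explicit dev signal.
- Everything else resolves to `null`.

```javascript
// Hypothetical fail-closed resolver: returns 'auth0', 'mock', or null.
// null means "misconfigured": flow endpoints should answer 503.
function resolveUpstreamMode(env) {
  const devSignal = Boolean(env.NETLIFY_DEV) || env.MCP_OAUTH_ALLOW_MOCK === 'true';
  const requested = env.MCP_OAUTH_UPSTREAM ?? 'mock';

  if (requested === 'auth0') {
    // Auth0 without a client ID must not silently degrade to the mock.
    return env.REDPANDA_OAUTH_CLIENT_ID ? 'auth0' : null;
  }
  if (requested === 'mock') {
    return devSignal ? 'mock' : null; // mock only under an explicit dev signal
  }
  return null; // unknown value: fail closed
}
```

A flow handler would then return 503 whenever `resolveUpstreamMode(process.env)` is `null`, while discovery and JWKS keep serving.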

Netlify statically analyzes config.path at bundle time, so it can't be an array of imported constants (PATHS.*) — that failed bundling (and the PR preview build) with 'path: Must be a string or array of strings'. Use literal paths. Verified the full M1 flow live (functions:serve, mock upstream): authorize -> mock-idp -> /mcp/callback -> /token -> AS-issued JWT, then /mcp accepts that token (200) and rejects no-token / garbage (401). Confirms cross-function token validation via the Blobs-shared signing key. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
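The sketch below shows that constraint in a Netlify Functions (v2) module. `config.path` has to be written as string literals so the bundler can read it statically. The paths and handler body are placeholders, not the PR's routes or logic.

```javascript
// Rejected at bundle time: the bundler can't evaluate imported constants.
//   import { PATHS } from './lib/oauth/config.mjs';
//   export const config = { path: [PATHS.authorize, PATHS.token] };

// Accepted: literal strings only.
export const config = {
  path: ['/mcp/authorize', '/mcp/callback', '/mcp/token'],
};

export default async function handler(req) {
  const { pathname } = new URL(req.url);
  return new Response(`handled ${pathname}`, { status: 200 });
}
```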

Netlify Blobs defaults to eventual consistency (deletes/updates propagate up to 60s). For one-time-use auth codes and refresh-token rotation/reuse-detection that window would let a consumed code/token be replayed, so the auth store now uses { consistency: 'strong' }. The dev signing-key store does too, so the resource server reads the key the AS just wrote rather than regenerating. Verified live (functions:serve): full flow issues a token, /mcp accepts it (200), and replaying a consumed auth code is rejected (400). Note: Blobs still has no atomic CAS, so a sub-second concurrent replay remains theoretically possible — negligible at our volume; a relational DB is the only full fix (documented as the future swap behind the store interface). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a 'Staying signed in' subsection explaining that the MCP client handles token renewal automatically: sign in once, the client refreshes in the background, active users stay signed in, and 30 days idle triggers a fresh (usually silent) sign-in. No manual token handling.
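One way to get "active users stay signed in, 30 days idle triggers a fresh sign-in" is a sliding expiry on refresh tokens. The sketch assumes the 30-day window restarts on every successful refresh. The constant and helper names are hypothetical.

```javascript
const REFRESH_IDLE_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days (assumed window)

// Each successful refresh issues a token expiring 30 days after that refresh,
// so only an idle gap longer than the window forces a new sign-in.
function nextRefreshExpiry(now) {
  return new Date(now.getTime() + REFRESH_IDLE_TTL_MS);
}

function isRefreshUsable(expiresAt, now) {
  return now.getTime() < expiresAt.getTime();
}

const issued = new Date('2026-06-01T00:00:00Z');
let expiresAt = nextRefreshExpiry(issued); // 2026-07-01

// Active user: refreshes on day 20, which slides expiry to 2026-07-21.
const day20 = new Date('2026-06-21T00:00:00Z');
console.log(isRefreshUsable(expiresAt, day20)); // true
expiresAt = nextRefreshExpiry(day20);

// Idle user: next attempt 31 days after the last refresh.
const day51 = new Date('2026-07-22T00:00:00Z');
console.log(isRefreshUsable(expiresAt, day51)); // false
```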

JakeSCahill · 2026-06-20T07:57:54Z

After thinking through the storage options for the OAuth layer, I’m leaning toward using Neon Postgres (via Netlify’s integration) as the system of record for auth state, rather than Netlify Blobs.
Even though this MCP is currently lightweight (public docs + agent access), the direction we’re heading in includes a shared authentication system for both the docs site and a chatbot with saved conversations. That introduces more durable identity state (sessions, refresh tokens, conversation links), which benefits from strong consistency and transactional guarantees.

Postgres gives us:

safe single-use semantics for authorization codes (transactional “check + consume”)
clean modeling for users, sessions, orgs, and conversations
a natural path to unify docs + chatbot identity under one system

Netlify Blobs feels fine for simple metadata or low-risk storage, but it doesn’t provide strong enough guarantees for OAuth flows where race conditions or replay could become an issue.
Given our current scale (~1k users/week), Neon is more than sufficient and keeps the system simple while leaving room for future expansion into persistent user identity and chat history.

Add an MCP tool that lets AI clients forward user feedback (bugs, doc gaps, frustrations, feature requests) straight to the Redpanda team. The tool description tells agents to ask the user before submitting and to include the relevant page/context. Feedback goes to the existing api-feedback Netlify form (the same store our docs feedback uses); the hidden form is extended with category, source, and user identity fields. When the user is signed in we attach their email + domain so the team can follow up; anonymous otherwise. We log only category/domain/authed, never the raw email or feedback text. Also bumps the server version and documents the capability for users.

Mention the feedback tool in the MCP server-card and server.json descriptions, and bump the server-card version to 1.3.0 to match.

The feedback tool POSTed to the site root, which 301-redirects to /home/. fetch followed the redirect (POST -> GET, body dropped), so Netlify Forms never recorded the submission — but the final 200 made the tool report success. Verified: POST to / => 301; POST to /home/ => 200 and the submission lands. POST to /home/ (configurable via MCP_FEEDBACK_FORM_PATH) and set redirect: 'error' so a redirect surfaces as a failure instead of a false success.

Use the docs site's real brand assets (served same-origin, so they load on prod and previews): Inter font, the Redpanda logo, and the exact brand palette (brand-600 #e24328 / brand-700 hover, body/faint text, neutral borders). Cleaner card layout, favicon, and a divider before the privacy note. Also nudge prospects: the signup link now reads 'Create a free account' instead of 'Sign up at cloud.redpanda.com'.

…BACKEND flag (#184)

* Refactor OAuth store into a backend selector (Blobs default)

Split the OAuth state store into pluggable backends behind store.mjs:
- db/blobs.mjs: the current Netlify Blobs implementation (extracted, no behavior change), default backend.
- db/neon.mjs: a Neon Postgres backend whose one-time-use consumes are atomic single statements (UPDATE/DELETE ... RETURNING), closing the read-then-delete race Blobs can't (no compare-and-swap).
- store.mjs: thin selector by STORE_BACKEND (default blobs); DCR clients stay on Blobs (plain persistence, no atomicity benefit).

Replaces the non-atomic markRefreshUsed with consumeRefresh: on Neon only one of two concurrent refreshes wins the row; the loser is treated as reuse and the family is revoked, restoring theft detection under races. Neon driver is imported lazily so the default path needs no DB or dep. No caller behavior changes on the default backend; all 56 tests pass.

* Add Neon schema, scheduled cleanup, and atomicity tests

- Migration SQL (db/migrations/0001_oauth_state.sql) for the four one-time-use/transactional tables, with expires_at indexes.
- cleanupExpired() + a daily scheduled function (oauth-cleanup.mjs) that deletes expired requests/codes and past-expiry refresh tokens, then sweeps empty families. No-ops unless STORE_BACKEND=neon. Bounds growth.
- @neondatabase/serverless dependency (HTTP driver; no Drizzle, since the atomic ops are single hand-written statements).
- Real-Postgres concurrency tests (tests/mcp-oauth-neon.test.ts), skipped unless TEST_NEON_URL is set: prove two concurrent auth-code consumes / refresh rotations yield exactly one winner, and cleanup removes expired rows. A fake can't prove atomicity, so these require a real DB.

56 tests pass; 3 Neon tests skip without a DB URL.

* Align Neon store with Netlify DB (db init) conventions

The database is provisioned and attached to the redpanda-documentation site. Wire the code to Netlify's managed flow:
- Use @netlify/database (the package db init installed) instead of the raw @neondatabase/serverless driver: neon.mjs now connects via getDatabase().httpClient (zero-config, reads NETLIFY_DATABASE_URL, fail-closed if absent).
- Move the schema into Netlify's auto-applied migrations directory (netlify/database/migrations/), so it's applied on deploy, including to per-preview DB branches. Removes the hand-rolled migrations path.
- Update the atomicity test to the new path + @netlify/database client.

56 tests pass; 3 Neon tests skip without TEST_NEON_URL.

* chore: trigger deploy-preview build (pick up STORE_BACKEND=neon)

Co-authored-by: Jake Cahill <45230295+JakeSCahill@users.noreply.github.com>

The api-feedback form is MCP-only, so several fields were dead weight: drop referer (duplicated page-path), source (constant 'mcp'), user-agent (constant; Netlify auto-captures the real UA), timestamp (Netlify records created_at), and user-sub (opaque id; email is the actionable identity). Kept: feedback, category, page-path, user-email, user-domain, plus the bot-field honeypot.

Add an identify hook to the MCPcat track() call so usage shows per-user/per-org instead of anonymous sessions. Identity comes from our verified OAuth context (extra.authInfo), not a tool argument; returns null when unauthenticated so grace-period sessions stay anonymous. Forwards sub + email + domain (email included for now, pending legal sign-off before launch). Update the login interstitial and docs notice to disclose sharing with service providers / analytics systems.
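The identify logic can be sketched as a pure function of the verified auth context. This is not MCPcat's documented API. Two things are assumed:

- The verified token claims sit on `authInfo.extra` as `sub` and `email`.
- The returned field names mirror the description (userId=sub, userName=email, domain).

```javascript
// Hypothetical identify hook. Identity is derived only from the verified
// OAuth context (authInfo.extra is assumed to hold our token claims);
// tool arguments are never consulted.
function identifyFromAuth(authInfo) {
  const claims = authInfo?.extra;
  if (typeof claims?.sub !== 'string') return null; // unauthenticated: stay anonymous
  const email = typeof claims.email === 'string' ? claims.email : undefined;
  const at = email ? email.lastIndexOf('@') : -1;
  return {
    userId: claims.sub,
    userName: email, // email included for now, pending legal sign-off
    userData: { domain: at > 0 ? email.slice(at + 1).toLowerCase() : undefined },
  };
}
```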

MCPcat wraps the SDK's tools/list and tools/call handlers, which the MCP SDK only creates lazily on the first registerTool(). track() was being called before any tool was registered, so it found no handlers to wrap and silently no-op'd the tool-call-context injection — every event showed 'no user intent provided'. Move the track() call to after all tool registrations so the context parameter is injected and agent intent is captured. (identify hook unchanged.)

Steer the agent-intent prompt: ask for a concise third-person summary of what the user is trying to accomplish, and explicitly exclude credentials, tokens, personal data, and verbatim secrets from that free-text field.

# Conflicts:
#	netlify/functions/mcp.mjs

micheleRP

Review — security + functional, with live preview testing

Reviewed the full OAuth core, storage layer, resource-server integration, and the feedback tool (skipped the lockfile), and tested the deploy preview end-to-end. This is a high-quality, security-conscious implementation — I found no auth-bypass, token-forgery, or SQL-injection issues. Testing confirmed the auth gate, the feedback tool, and Kapa attribution all work, and surfaced one gap to resolve before relying on form-based attribution (finding #1). Everything below is non-blocking; not an approval — just findings + open questions.

Tested on the deploy preview

✅ Auth enforced (REQUIRE_AUTH=true): unauthenticated POST /mcp (initialize, tools/list) → spec-correct 401 with WWW-Authenticate: Bearer … resource_metadata=…, no-store/noindex.
✅ Feedback routing fix: GET / → 301 → /home/, GET /home/ → 200 — confirms the body-dropping bug and the non-redirecting target.
✅ Token endpoint rejects a bogus authorization code → invalid_grant.
✅ Feedback tool end-to-end: signed in via the Cloud integration tenant, called submit_documentation_feedback, and the submission landed in api-feedback with the correct feedback text + page-path.
✅ Kapa attribution confirmed: the conversation surfaced in Kapa's Users view tied to a real email + company (vs anonymous UUIDs for pre-auth users) — email + domain attach correctly at runtime.
⚠️ But the attribution fields didn't land in the Netlify form: the recorded submission shows only page-path + feedback — user-email, user-domain, and category are missing, despite being signed in. (See finding #1.)

⚠️ Finding #1 — attribution fields aren't captured by the `api-feedback` form yet (caught in testing)

The branch declares user-email/user-domain/category in api-feedback-registration.html:15–20 and the code POSTs them, so this isn't a code defect — the existing prod api-feedback form's field schema predates these fields, and Netlify drops unrecognized fields on deploy-preview submissions. Since lead attribution is the core goal of this PR, please confirm after merge that the new fields are captured once the updated registration deploys to production (Netlify may need the form re-detected; worst case, recreated).

Note: the same authenticated identity did reach Kapa (email + company both surfaced there), so the tool is attaching identity correctly — the missing fields on the Netlify form are confirmed to be the form-schema/detection issue alone, not a runtime/auth problem.

Other findings (non-blocking)

2. Consent screen deferred → authorization-code phishing can disclose a victim's verified email/org (medium). Open DCR + no consent screen (the interstitial doesn't name the requesting client/scopes) lets an attacker register a client with their own redirect_uri, phish a victim into /authorize, and exchange the resulting code for a token carrying the victim's email + org. Docs-scoped and needs social engineering, but it's the exact PII this system protects. Recommend landing consent before REQUIRE_AUTH=true/announcement, or interim-restricting DCR to an allow-list.

3. Reuse-detection only holds on Neon. Default STORE_BACKEND=blobs is non-atomic (documented). Ensure prod runs neon before relying on theft detection — worth a rollout-checklist line. (Preview already runs neon.)

4. PII at rest (for the pending legal/privacy review). Verified emails stored plaintext in the mcp-users Blobs store + Neon user_data, and forwarded to MCPcat, Kapa, and the CRM webhook. Kapa additionally retains email + domain + full conversation history tied to identity (visible in its Users view). No code defect — confirming the data footprint.

Minor: submitFeedback returns internal error text to the agent (detail: msg); DCR redirect_uris aren't scheme-restricted; stale auth.mjs comment references a non-existent idp.mjs; feedback page_url isn't validated (informational only).
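For the redirect_uri minor, a scheme restriction could look like the sketch below. It accepts:

- https redirect URIs;
- plain http only on loopback hosts (the pattern native and CLI clients use, per RFC 8252);
- private-use schemes only when explicitly allowed, since some desktop clients register those.

The function name is hypothetical. This narrows what DCR accepts, but on its own it does not address the consent-screen finding.

```javascript
const LOOPBACK_HOSTS = new Set(['127.0.0.1', '[::1]', 'localhost']);

// Hypothetical DCR redirect_uri check: https anywhere, http only on loopback,
// private-use schemes (e.g. "myapp") only if explicitly allowed.
function isAllowedRedirectUri(value, allowedCustomSchemes = []) {
  let url;
  try {
    url = new URL(value);
  } catch {
    return false; // not a parseable absolute URI
  }
  if (url.hash) return false; // redirect URIs must not carry a fragment
  if (url.protocol === 'https:') return true;
  if (url.protocol === 'http:') return LOOPBACK_HOSTS.has(url.hostname);
  return allowedCustomSchemes.includes(url.protocol.slice(0, -1));
}
```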

Verified in code

RS256 tokens with iss/aud/alg pinned on verify (validates our token, not the upstream); Auth0 ID token verified against JWKS with iss/aud/RS256 pinned; PKCE S256 on both legs; redirect_uri validated before any redirect; thorough CIMD SSRF guard (https-only, private-IP blocklist v4+v6, no redirect-following, byte cap, timeout); refresh rotation + reuse detection + hashed-at-rest tokens + atomic Neon consume; fully parameterized SQL; c.set('auth') → extra.authInfo wiring matches @hono/mcp@0.1.5.

Observability / attribution — where identity flows

Five sinks: mcp-users Blobs (email/domain/requestCount), MCPcat (userId=sub, userName=email, domain), Kapa (_meta.user={email, company_name: domain}), optional CRM webhook (email/domain/sub), and function logs (domain/sub only, no raw email). A short "what we collect and where it goes" summary in the description would help the privacy review. Two notes vs the "capture … org" framing: names are never captured (despite profile scope, exchangeCode never reads name), and org_id/org_name are dropped before every sink — everything attributes on the email domain, not the Auth0 org (confirmed in Kapa: company shows redpanda.com).
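To make the "what we collect and where it goes" summary concrete, here is a hypothetical sketch of the per-sink payloads as the review describes them. The field names (`userId`, `userName`, `company_name`, `_meta.user`) come from the review; `emailDomain`, `sinkPayloads`, and the lowercasing of the domain are illustrative assumptions, and the real code may shape these differently.

```javascript
// Hypothetical per-sink identity payloads, following the review's description.
function emailDomain(email) {
  const at = typeof email === "string" ? email.lastIndexOf("@") : -1;
  return at > 0 ? email.slice(at + 1).toLowerCase() : null;
}

// claims: verified token claims. org_id / org_name may be present but are
// deliberately not read, so every sink attributes on the email domain.
function sinkPayloads(claims) {
  const { sub, email } = claims;
  const domain = emailDomain(email);
  return {
    usersStore: { email, domain }, // mcp-users Blobs (requestCount kept separately)
    mcpcat: { userId: sub, userName: email, domain },
    kapa: { _meta: { user: { email, company_name: domain } } },
    crmWebhook: { email, domain, sub }, // optional sink
    log: { domain, sub }, // no raw email in function logs
  };
}
```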

Open questions

Attribution fields: can you confirm user-email/user-domain get captured once the form schema updates on production deploy? (Finding #1 — currently dropped on the preview.)
Consent before enforcement: will the consent screen land before REQUIRE_AUTH=true / public launch?
Org vs domain (confirmed, flagging only): attribution is domain-based — Kapa shows company redpanda.com, and the stores/CRM/form use domain, never the Auth0 org_name/org_id (captured into the token but dropped before every sink). If domain is the intended key, no action; flagging in case org-level was expected.
Prod backend: will prod ship with STORE_BACKEND=neon from cutover (reuse detection depends on it)?
Account-requirement scope (product/rollout): once enforced, the MCP gates all docs — Self-Managed, Connect, ADP — behind a Redpanda Cloud account, no per-product exemption. Intended end state (especially for the open-source Connect community), or keep the grace period open for anonymous reads with sign-in required only for higher limits / the feedback tool?

Test evidence (screenshots below): the api-feedback Netlify submission (only page-path + feedback captured), and the Kapa Users view showing the authenticated conversation attributed to email + company.

micheleRP · 2026-06-22T19:30:33Z

very nice! Please see Claude's comments, but looks great!

JakeSCahill force-pushed the feature/mcp-email-auth branch from 9c6f267 to 3a89e42 on June 15, 2026 10:56

chore: bump server.json version to 2026.06.15+pr181-8d11b9b (PR #181)

14e57d0

JakeSCahill changed the title ~~Add email-capture auth to the docs MCP server~~ DOC-2262: Add email-capture auth to the docs MCP server Jun 15, 2026

github-actions Bot and others added 5 commits June 15, 2026 11:06

chore: bump server.json version to 2026.06.15+pr181-ff0868b (PR #181)

64dff55

chore: bump server.json version to 2026.06.15+pr181-75928f7 (PR #181)

e71d339

chore: bump server.json version to 2026.06.15+pr181-852c563 (PR #181)

a669b2c

JakeSCahill changed the title ~~DOC-2262: Add email-capture auth to the docs MCP server~~ DOC-2262: Add OAuth authentication to the docs MCP server (Redpanda Cloud IdP) Jun 15, 2026

JakeSCahill and others added 3 commits June 15, 2026 13:28

chore: trigger preview redeploy to pick up REQUIRE_AUTH

3e0d7ca

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore: trigger preview redeploy to pick up REQUIRE_AUTH

dc655b1

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore: bump server.json version to 2026.06.15+pr181-a59f207 (PR #181)

e751010

JakeSCahill commented Jun 15, 2026

Comment thread data-platform/modules/ROOT/pages/how-to-use-these-docs.adoc Outdated

JakeSCahill and others added 8 commits June 15, 2026 14:20

Apply suggestion from @JakeSCahill

c0b9280

chore: bump server.json version to 2026.06.15+pr181-27a3294 (PR #181)

70623c0

Remove unused exports from MCP auth modules

e8aac4c

Drop the unused OAUTH_ISSUER export from idp.mjs and de-export the FREE_EMAIL_DOMAINS / DISPOSABLE_DOMAINS sets (used only internally). No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore: bump server.json version to 2026.06.15+pr181-20381aa (PR #181)

a5cf8b9

TEMP: add CIMD probe client metadata document (to be removed)

59be719

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore: bump server.json version to 2026.06.15+pr181-00193c2 (PR #181)

b3927cd

Remove temporary CIMD probe client doc

b2def9e

The probe confirmed CIMD is not enabled on the Cloud IdP (a valid client metadata document used as client_id still returns 'Unknown client').

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore: bump server.json version to 2026.06.15+pr181-1fe3d6c (PR #181)

f3212e0

JakeSCahill and others added 3 commits June 16, 2026 17:18

JakeSCahill and others added 2 commits June 18, 2026 20:26

chore: bump server.json version to 2026.06.18+pr181-4b6920d (PR #181)

82cac4f

JakeSCahill and others added 4 commits June 22, 2026 10:55

chore: bump server.json version to 2026.06.22+pr181-ac52b6c (PR #181)

7becb83

Surface feedback capability in server card and server.json

7bb60d2

Mention the feedback tool in the MCP server-card and server.json descriptions, and bump the server-card version to 1.3.0 to match.

chore: bump server.json version to 2026.06.22+pr181-748cb39 (PR #181)

95d50fd

This was referenced Jun 22, 2026

Move OAuth state to Neon Postgres (atomic one-time-use) behind STORE_BACKEND flag #184

Merged

Fix serverless crashes (socket hang up) and 60s idle invocations in MCP function #185

Merged

JakeSCahill and others added 6 commits June 22, 2026 12:17

chore: bump server.json version to 2026.06.22+pr181-30a95eb (PR #181)

8115fe4

chore: bump server.json version to 2026.06.22+pr181-7df642a (PR #181)

0eed4e9

chore: bump server.json version to 2026.06.22+pr181-d76fb89 (PR #181)

f9b49fb

JakeSCahill commented Jun 22, 2026

Comment thread home/modules/ROOT/pages/how-to-use-these-docs.adoc Outdated

JakeSCahill commented Jun 22, 2026

Comment thread home/modules/ROOT/pages/how-to-use-these-docs.adoc Outdated

JakeSCahill and others added 10 commits June 22, 2026 12:44

Apply suggestions from code review

174ec6e

Co-authored-by: Jake Cahill <45230295+JakeSCahill@users.noreply.github.com>

chore: bump server.json version to 2026.06.22+pr181-15a4023 (PR #181)

00c2108

chore: bump server.json version to 2026.06.22+pr181-279799e (PR #181)

4cafeb1

chore: bump server.json version to 2026.06.22+pr181-fb8ddf9 (PR #181)

5f59c9d

Add customContextDescription to MCPcat tool-call context

5c07ee8

Steer the agent-intent prompt: ask for a concise third-person summary of what the user is trying to accomplish, and explicitly exclude credentials, tokens, personal data, and verbatim secrets from that free-text field.

chore: bump server.json version to 2026.06.22+pr181-b375857 (PR #181)

9836c7d

Merge remote-tracking branch 'origin/main' into feature/mcp-email-auth

7e47e01

# Conflicts:
# netlify/functions/mcp.mjs

micheleRP reviewed Jun 22, 2026

micheleRP approved these changes Jun 22, 2026

JakeSCahill wants to merge 63 commits into main from feature/mcp-email-auth


