This file teaches an AI (or developer) the full concept, architecture, and step-by-step execution for any stack. The Cloudflare Pages middleware is the constant. The backend and database are swappable.
Problem: SPAs return empty HTML to bots. Bots can't run JavaScript, so they see nothing.
Solution: Intercept bot requests at the CDN edge (Cloudflare), serve them pre-built HTML from a cache. Humans get the normal SPA. The cache is rebuilt on a schedule.
Three components:
- Bot Detector (Cloudflare Pages Middleware) — always the same
- Cache Store (any database) — stores pre-built HTML keyed by URL path
- Cache Builder (any backend) — generates HTML and upserts into cache
Browser Request
│
▼
┌─────────────────────────────────┐
│ Cloudflare Pages Middleware │
│ (functions/_middleware.ts) │
│ │
│ 1. Static asset? → pass thru │
│ 2. Sitemap URL? → proxy to API │
│ 3. Fetch scripts → inject tags │
│ 4. Bot UA? → prerendered HTML │
│ 5. Human? → SPA + injected │
└─────────────────────────────────┘
│ │
▼ ▼
SPA (Vite) Your Backend API
├── GET /prerender?path=... (cache lookup)
├── POST /generate-cache (cache builder)
├── GET /sitemap?name=... (sitemap)
├── GET /script-service (3rd party scripts)
└── POST /manage-cron (scheduler)
The middleware is the only Cloudflare-specific part. Your backend API can be:
- Supabase Edge Functions (Deno)
- Node.js / Express
- Python / FastAPI / Django
- Go / Gin
- AWS Lambda / API Gateway
- Vercel Serverless Functions
- Netlify Functions
- Firebase Cloud Functions
- Any HTTP server
Your backend must expose these endpoints. The middleware calls them.
Purpose: Return cached HTML for a given URL path.
Request: GET /prerender?path=/markets/bitcoin-price
Response (cache hit):
{ "html": "<!DOCTYPE html><html>...</html>", "title": "Bitcoin Price", "cached": true }Response (cache miss):
{ "error": "not_found" }Status: 404
Logic:
- Look up
pathin your cache store - If found, increment
hit_count, return HTML - If not found, return 404
- Optionally: check
expires_atand returnX-Cache: staleheader if expired
Purpose: Build HTML for all pages and store in cache.
Request: POST /generate-cache with optional { "force": true }
Logic:
- Query your database for all content that needs pages (products, articles, markets, etc.)
- For each item, generate a full HTML document with:
<title>and<meta description>- Open Graph tags (
og:title,og:description,og:image) - Twitter Card tags
- JSON-LD structured data
- Visible body content (headings, paragraphs, lists)
- Internal links
- Upsert each page into your cache store keyed by
path - Return a summary:
{ "pages_cached": 150, "errors": 0 }
Purpose: Return XML sitemaps for search engines.
Purpose: Schedule/unschedule automated cache rebuilds.
Your cache needs to store these fields per page. Adapt to your database:
CREATE TABLE prerendered_pages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- or AUTO_INCREMENT for MySQL
path TEXT UNIQUE NOT NULL, -- URL path, e.g. '/markets/bitcoin'
html TEXT NOT NULL, -- Full HTML document
title TEXT, -- Page title (for debugging)
description TEXT, -- Meta description
og_image TEXT, -- OG image URL
source_table TEXT, -- Which table generated this page
source_id TEXT, -- ID of the source record
content_type TEXT, -- 'market', 'article', 'product', etc.
hit_count INTEGER DEFAULT 0, -- How many times bots requested this
expires_at TIMESTAMPTZ, -- When cache becomes stale
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_prerendered_pages_path ON prerendered_pages(path);// Collection: prerendered_pages
{
path: "/markets/bitcoin", // Unique index on this field
html: "<!DOCTYPE html>...",
title: "Bitcoin Price Market",
description: "...",
og_image: "https://...",
source_collection: "markets",
source_id: "abc123",
content_type: "market",
hit_count: 0,
expires_at: ISODate("2026-03-01T00:00:00Z"),
created_at: ISODate("2026-02-24T00:00:00Z"),
updated_at: ISODate("2026-02-24T00:00:00Z")
}
// Create unique index:
db.prerendered_pages.createIndex({ path: 1 }, { unique: true })Key: prerender:/markets/bitcoin
Value: { full HTML string }
TTL: 3600 (1 hour auto-expiry)
// Collection: prerendered_pages
// Document ID: path encoded (e.g., "markets--bitcoin")
{
path: "/markets/bitcoin",
html: "<!DOCTYPE html>...",
title: "Bitcoin Price Market",
hit_count: 0,
expires_at: Timestamp,
updated_at: Timestamp
}CREATE TABLE static_sitemaps (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
filename TEXT UNIQUE NOT NULL,
content TEXT NOT NULL,
url_count INTEGER DEFAULT 0,
generated_at TIMESTAMPTZ DEFAULT now(),
expires_at TIMESTAMPTZ
);This is the same for every stack. It lives at functions/_middleware.ts in your project root.
What it does:
- Checks if the request is for a static asset → pass through
- Checks if the URL matches a sitemap pattern → proxy to your sitemap API
- Checks the User-Agent against 100+ known bot strings → fetch prerendered HTML from your API
- Otherwise → pass through to SPA
Key configuration points:
BACKEND_API_URL— your API base URL (set as Cloudflare env var)BACKEND_API_KEY— auth key for your API (set as Cloudflare env var)BOT_AGENTS— array of bot user-agent substrings
See middleware.ts for the full implementation. The only things to change:
- The URL it fetches prerendered HTML from (line that calls your
/prerenderendpoint) - The auth header format (Bearer token, API key header, etc.)
Follow these phases in order. Each depends on the previous one.
Decision: Pick your backend and database.
| If you use... | Cache Store | Backend API | Scheduler |
|---|---|---|---|
| Supabase | PostgreSQL (built-in) | Supabase Edge Functions (Deno) | pg_cron |
| MongoDB + Node.js | MongoDB collection | Express/Fastify API | node-cron or external |
| Firebase | Firestore collection | Firebase Cloud Functions | Firebase scheduled functions |
| AWS | DynamoDB or RDS | Lambda + API Gateway | EventBridge |
| Vercel | Any DB (Postgres, Mongo) | Vercel Serverless Functions | Vercel Cron |
| Railway / Render | PostgreSQL (managed) | Node.js service | Built-in cron |
| Raw VPS | PostgreSQL / MySQL | Express / FastAPI | system cron |
The Cloudflare Pages middleware stays the same regardless of choice.
Goal: Set up storage for pre-built HTML pages.
Steps:
- Create the
prerendered_pagestable/collection in your database (see schemas above) - Create a unique index on
path - Optionally create
static_sitemapstable/collection - Verify you can insert and query by path
Success criteria: You can insert { path: "/test", html: "<h1>Test</h1>" } and retrieve it by path.
Goal: Create GET /prerender?path=... that returns cached HTML.
Pseudocode (works in any language):
function handlePrerender(request):
path = request.query.path
if not path: return 400 "Missing path"
page = database.findOne({ path: path })
if not page: return 404 "Not found"
// Optional: increment hit counter
database.update({ path: path }, { $inc: { hit_count: 1 } })
// Optional: check staleness
is_stale = page.expires_at < now()
return HTML response:
status: 200
headers:
Content-Type: text/html
X-Prerender: true
X-Cache: is_stale ? "stale" : "hit"
body: page.html
Node.js/Express example:
app.get('/prerender', async (req, res) => {
const { path } = req.query;
if (!path) return res.status(400).json({ error: 'Missing path' });
const page = await db.collection('prerendered_pages').findOne({ path });
if (!page) return res.status(404).json({ error: 'not_found' });
await db.collection('prerendered_pages').updateOne({ path }, { $inc: { hit_count: 1 } });
res.set('Content-Type', 'text/html');
res.set('X-Prerender', 'true');
res.send(page.html);
});Supabase Edge Function: See edge-functions/prerender.ts for the Deno/Supabase version.
Success criteria: curl "https://your-api.com/prerender?path=/test" returns HTML.
Goal: Create POST /generate-cache that builds HTML for all pages and stores them.
This is the most project-specific part. You need to:
- Define your static pages — home, about, pricing, FAQ, etc.
- Query your dynamic content — products, articles, markets, users, etc.
- Generate full HTML for each — with SEO tags, JSON-LD, body content
HTML template structure (universal):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>${title}</title>
<meta name="description" content="${description}">
<link rel="canonical" href="${siteUrl}${path}">
<!-- Open Graph -->
<meta property="og:title" content="${title}">
<meta property="og:description" content="${description}">
<meta property="og:image" content="${ogImage}">
<meta property="og:url" content="${siteUrl}${path}">
<meta property="og:type" content="website">
<!-- Twitter -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="${title}">
<meta name="twitter:description" content="${description}">
<meta name="twitter:image" content="${ogImage}">
<!-- JSON-LD (customize per page type) -->
<script type="application/ld+json">${jsonLd}</script>
</head>
<body>
<h1>${heading}</h1>
<p>${bodyContent}</p>
<!-- Add real content: lists, tables, links — anything you want indexed -->
</body>
</html>Pseudocode:
function generateCache():
pages = []
// Static pages
pages.push({ path: "/", title: "Home", html: buildHomePage() })
pages.push({ path: "/pricing", title: "Pricing", html: buildPricingPage() })
// Dynamic pages from database
products = database.query("SELECT * FROM products WHERE active = true")
for product in products:
pages.push({
path: "/products/" + product.slug,
title: product.name,
html: buildProductPage(product)
})
// Upsert all pages into cache
for page in pages:
database.upsert("prerendered_pages", { path: page.path }, {
path: page.path,
html: page.html,
title: page.title,
updated_at: now(),
expires_at: now() + 1 hour
})
return { pages_cached: pages.length }
Supabase reference: See edge-functions/generate-prerender-cache.ts
Success criteria: After running, prerendered_pages has rows for every route with non-empty HTML.
Goal: Add bot detection at the Cloudflare edge.
Steps:
- Create
functions/_middleware.tsat your project root (NOT insidesrc/) - Create
functions/tsconfig.jsonfor Cloudflare Workers types - Configure environment variables in Cloudflare Pages dashboard:
| Variable | Value | Description |
|---|---|---|
BACKEND_API_URL |
https://your-api.com |
Your prerender API base URL |
BACKEND_API_KEY |
your-secret-key |
Auth for your API (if needed) |
For Supabase specifically:
SUPABASE_URL=https://YOUR_PROJECT.supabase.coSUPABASE_ANON_KEY= your anon key
For Node.js / Express:
BACKEND_API_URL=https://your-server.com/apiBACKEND_API_KEY= whatever auth your API expects
For Firebase:
BACKEND_API_URL=https://us-central1-YOUR_PROJECT.cloudfunctions.net
- In
middleware.ts, update the fetch URL to match your API:
// Supabase:
const prerenderUrl = `${env.SUPABASE_URL}/functions/v1/prerender?path=${encodeURIComponent(pathname)}`;
// Node.js:
const prerenderUrl = `${env.BACKEND_API_URL}/prerender?path=${encodeURIComponent(pathname)}`;
// Firebase:
const prerenderUrl = `${env.BACKEND_API_URL}/prerender?path=${encodeURIComponent(pathname)}`;- Important file structure:
project-root/
├── functions/
│ ├── _middleware.ts ← Bot detection (Cloudflare)
│ └── tsconfig.json ← Workers types
├── src/ ← Your React app (unchanged)
└── ...
- Deploy: push to GitHub (Cloudflare auto-deploys) or
wrangler pages deploy
Success criteria: functions/ directory at project root, env vars set in Cloudflare dashboard.
Goal: Populate cache and verify end-to-end.
Steps:
- Trigger the cache generator:
# Supabase:
curl -X POST "https://YOUR_PROJECT.supabase.co/functions/v1/generate-prerender-cache" \
-H "Authorization: Bearer YOUR_SERVICE_ROLE_KEY" -H "Content-Type: application/json" \
-d '{"force": true}'
# Node.js:
curl -X POST "https://your-server.com/api/generate-cache" \
-H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" \
-d '{"force": true}'
# Firebase:
curl -X POST "https://us-central1-PROJECT.cloudfunctions.net/generateCache" \
-H "Content-Type: application/json" -d '{"force": true}'- Verify cache has data:
-- PostgreSQL / Supabase:
SELECT path, title, length(html) as html_size FROM prerendered_pages LIMIT 10;// MongoDB:
db.prerendered_pages.find({}, { path: 1, title: 1 }).limit(10)- Test bot detection:
# Human request — should return normal SPA
curl -s https://your-site.pages.dev/ | head -5
# Bot request — should return prerendered HTML
curl -s -H "User-Agent: Googlebot" https://your-site.pages.dev/ | head -50
# Debug endpoint
curl https://your-site.pages.dev/__debug
# Bot debug
curl -H "User-Agent: Googlebot" https://your-site.pages.dev/__debugSuccess criteria: Bots get rich HTML. Humans get SPA.
Goal: Schedule automatic rebuilds so content stays fresh.
Choose your scheduler:
pg_cron (Supabase / PostgreSQL):
curl -X POST "https://YOUR_PROJECT.supabase.co/functions/v1/manage-cron-job" \
-H "Authorization: Bearer YOUR_ANON_KEY" -H "Content-Type: application/json" \
-d '{
"action": "schedule",
"jobName": "prerender-cache-refresh",
"targetFunction": "generate-prerender-cache",
"schedule": "0 * * * *",
"body": {"force": true}
}'node-cron (Node.js):
const cron = require('node-cron');
cron.schedule('0 * * * *', async () => {
await generateCache({ force: true });
console.log('Cache refreshed');
});Firebase Scheduled Functions:
exports.refreshCache = functions.pubsub.schedule('every 1 hours').onRun(async () => {
await generateCache({ force: true });
});GitHub Actions (any backend):
# .github/workflows/refresh-cache.yml
on:
schedule:
- cron: '0 * * * *'
jobs:
refresh:
runs-on: ubuntu-latest
steps:
- run: curl -X POST "${{ secrets.CACHE_API_URL }}/generate-cache" -H "Authorization: Bearer ${{ secrets.API_KEY }}"Vercel Cron:
// vercel.json
{ "crons": [{ "path": "/api/generate-cache", "schedule": "0 * * * *" }] }Success criteria: Cache rebuilds automatically. updated_at timestamps refresh on schedule.
If using Supabase with pg_cron, you also need:
CREATE TABLE cron_job_runs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_name TEXT NOT NULL,
status TEXT DEFAULT 'running',
started_at TIMESTAMPTZ DEFAULT now(),
completed_at TIMESTAMPTZ,
error_message TEXT,
markets_synced INTEGER DEFAULT 0,
markets_failed INTEGER DEFAULT 0,
details JSONB,
created_at TIMESTAMPTZ DEFAULT now()
);
-- PostgreSQL functions for cron management
CREATE OR REPLACE FUNCTION schedule_cron_job(...) ...
CREATE OR REPLACE FUNCTION unschedule_cron_job(...) ...
CREATE OR REPLACE FUNCTION get_cron_job_status(...) ...See database-schema.sql for full SQL and edge-functions/manage-cron-job.ts for the Supabase implementation.
React SPAs cannot natively inject 3rd-party scripts without modifying source code. This is a critical limitation because:
- AI-built apps (Lovable, Bolt, Cursor) don't expose
index.html— you can't manually add<script>tags - Every AI rebuild can overwrite hardcoded scripts
- Analytics, tracking pixels, consent managers all require
<script>tags in<head>or<body> - No server layer exists in a pure SPA to dynamically inject tags
The Script Service is a backend function that returns a JSON object containing all scripts to inject:
{
"head": [
"<script async src=\"https://www.googletagmanager.com/gtag/js?id=G-XXXXX\"></script>",
"<script>/* consent manager */</script>"
],
"body": [
"<script>/* chat widget */</script>"
]
}The Cloudflare middleware fetches this registry (cached for 5 minutes) and injects the scripts into every HTML response — both for bots and humans. Scripts run in a first-party context (same domain), bypassing ad-blocker and reverse-proxy limitations.
Purpose: Return all 3rd-party scripts to inject into HTML responses.
Response:
{
"head": ["<script>...</script>", "<script src=\"...\"></script>"],
"body": ["<script>...</script>"]
}Pseudocode (any language):
function handleScriptService(request):
return JSON response:
status: 200
headers:
Content-Type: application/json
Cache-Control: public, max-age=300
body: {
head: [
// Google Analytics
"<script async src='https://www.googletagmanager.com/gtag/js?id=G-XXXXX'></script>",
// Consent manager
"<script src='https://consent-tool.com/banner.js'></script>"
],
body: [
// Chat widget, etc.
]
}
Node.js/Express example:
app.get('/script-service', (req, res) => {
res.json({
head: [
`<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXX"></script>
<script>window.dataLayer=window.dataLayer||[];function gtag(){dataLayer.push(arguments)}gtag('js',new Date());gtag('config','G-XXXXX');</script>`,
],
body: [],
});
});Supabase Edge Function: See edge-functions/script-service.ts for the Deno/Supabase version.
The middleware already handles script injection. The key functions:
// Fetch scripts (cached in-memory for 5 minutes)
async function getScripts(env): Promise<{ head: string; body: string }> {
// Fetch from your script-service endpoint
// Cache the result in-memory for 5 min
// Return joined strings for head and body
}
// Inject into HTML response
function injectScripts(html: string, scripts: { head: string; body: string }): string {
if (scripts.head) html = html.replace('</head>', `${scripts.head}\n</head>`);
if (scripts.body) html = html.replace('</body>', `${scripts.body}\n</body>`);
return html;
}To add a new script: edit your script-service function and redeploy. No changes to your React app.
To remove a script: delete it from the registry and redeploy. No changes to your React app.
The 5-minute edge cache ensures changes propagate quickly without hammering your backend.
- Add a content generator function in your cache builder
- Query your database for the new content type
- Build HTML with appropriate SEO tags and JSON-LD schema
- Upsert into your cache store
Edit the BOT_AGENTS array in functions/_middleware.ts. The included list covers 100+ bots including Googlebot, Bingbot, social crawlers, AI crawlers, and SEO tools.
- Set
expires_atwhen inserting cache entries - The prerender endpoint can serve stale content with
X-Cache: staleheader - The scheduler rebuilds cache periodically to keep it fresh
- Include real text content — not just metadata. Google indexes body content.
- Add internal
<a href>links so crawlers discover more pages - Use semantic HTML:
<h1>,<h2>,<p>,<ul>,<table> - Include JSON-LD for rich results: FAQ, Product, Article, Organization
| Issue | Solution |
|---|---|
| Bot gets SPA instead of HTML | Check __debug endpoint — verify isBot: true for your UA |
| Cache miss (404 from prerender) | Run your cache generator to populate |
| Stale content | Check expires_at, re-run cache generator |
| Middleware not running | Ensure functions/ is at project root, not inside src/ |
| Sitemaps 404 | Ensure your sitemap endpoint is deployed and middleware proxies correctly |
| Scheduler not firing | Check your scheduler logs (pg_cron, node-cron, etc.) |
| API auth failing | Verify Cloudflare env vars match your API auth requirements |
| HTML too small / empty | Check your content generator queries — ensure they return data |
| Scripts not injecting | Check __debug endpoint for scriptCacheAge, verify script-service endpoint returns valid JSON |
| Scripts blocked by ad-blockers | Scripts injected at edge run in first-party context — this is the solution |