
RFC: Partial Prerendering (PPR) — static shells from edge + streaming dynamic holes #65


Summary

Add Partial Prerendering (PPR) to merjs — serve pre-rendered static shells from Cloudflare KV edge (~1ms) while streaming dynamic content via WASM resolves. Inspired by Next.js 14 PPR, but adapted to merjs's Zig/WASM architecture.

This RFC debates the approach, identifies problems, and proposes a novel solution that plays to merjs's strengths.


Historical Context — jQuery to PPR

The placeholder/resolve pattern has been reinvented every ~5 years since the 90s:

| Era | Technology | How it works |
|---|---|---|
| 1993 | SSI | `<!--#include virtual="/header.html" -->` — server blocks, assembles fragments |
| 2001 | ESI (Akamai) | `<esi:include src="/api/nav" />` — same, but at CDN edge with per-fragment TTLs |
| 2006 | jQuery `.load()` | `$('#cart').load('/partials/cart')` — client pulls HTML into placeholder divs |
| 2010 | Facebook BigPipe | Server pushes `<script>BigPipe.onPageletArrive({id:"cart",content:"..."})</script>` via chunked HTML |
| 2014 | Marko `<await>` | Declarative out-of-order streaming with client-reorder |
| 2022 | React 18 Suspense | `<Suspense fallback={<Skeleton/>}>` — same BigPipe pattern, React-managed |
| 2023 | Next.js PPR | Static shell from CDN + streamed Suspense holes |
| now | merjs | ? |

merjs already implements the BigPipe pattern exactly — look at StreamWriter.resolve() in mer.zig:42-53:

```zig
pub fn resolve(self: *StreamWriter, id: []const u8, content: []const u8) void {
    self.write("<div hidden id=\"S:");
    self.write(id);
    self.write("\">");
    self.write(content);
    self.write("</div><script>");
    self.write("(function(){var p=document.getElementById('P:");
    self.write(id);
    self.write("'),s=document.getElementById('S:");
    self.write(id);
    self.write("');if(p&&s){p.outerHTML=s.innerHTML;s.remove()}}())");
    self.write("</script>");
}
```

This is BigPipe.onPageletArrive(), minus the jQuery. The question is: how do we split this across time — shell at build/deploy, resolves at request?
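To make the swap mechanics concrete, here is a plain-JavaScript mirror of the payload those `write()` calls emit — a sketch for illustration only; `resolveChunk` is a hypothetical helper, not part of the merjs API:

```javascript
// Hypothetical mirror of StreamWriter.resolve(): builds the hidden content
// div plus the inline swap script that replaces the skeleton placeholder.
function resolveChunk(id, content) {
  return (
    `<div hidden id="S:${id}">${content}</div>` +
    `<script>(function(){` +
    `var p=document.getElementById('P:${id}'),` +
    `s=document.getElementById('S:${id}');` +
    `if(p&&s){p.outerHTML=s.innerHTML;s.remove()}}())<\/script>`
  );
}

const chunk = resolveChunk("cart", "<ul><li>2 items</li></ul>");
// In the browser, this swaps <div id="P:cart"> (the skeleton) for the content.
```

Because the swap script ships inline with each chunk, no client runtime has to be loaded first — the browser executes it the moment the chunk arrives.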


What merjs Already Has

Three rendering modes that form a natural progression:

| Mode | How it works | Where |
|---|---|---|
| SSG (`prerender = true`) | Build-time render → `dist/*.html` | `src/prerender.zig` |
| Shell-first streaming | Layout head flushed, body blocks, tail follows | `src/server.zig:321-349` |
| True streaming (`renderStream`) | Placeholder/resolve — skeletons swap to real content | `src/server.zig:275-318` |

PPR would sit between SSG (fully static) and streaming (fully dynamic): static shell + dynamic holes.


The Debate: What's Good

1. The core insight is real

Most pages are 80% static, 20% dynamic. KV edge read is ~1ms, WASM render is ~5-15ms, external fetch is ~100-500ms. Serving the 80% from edge while overlapping the 20% computation is genuinely faster.
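The overlap claim is easy to check with back-of-envelope arithmetic, using the midpoints of the numbers above (assumed figures, not benchmarks):

```javascript
// Assumed midpoints: KV read 1ms, WASM render 8ms, external fetch 300ms.
const kvRead = 1, wasmRender = 8, externalFetch = 300;

// Fully dynamic: nothing reaches the browser until fetch + render finish.
const ttfbDynamic = externalFetch + wasmRender; // 308ms to first byte

// Static shell + streamed holes: first byte after the KV read; the dynamic
// 20% overlaps the external fetch and streams in when it resolves.
const ttfbShell = kvRead; // 1ms to first byte
const dynamicDone = Math.max(kvRead, externalFetch) + wasmRender; // 308ms

// Total work is the same; what changes is when the user first sees content.
```

The total time to a complete page is unchanged — the win is entirely in time-to-first-byte and perceived progress.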

2. merjs is uniquely positioned

Unlike Next.js, which retrofitted PPR onto React's hydration model, merjs has no hydration. The resolve scripts are self-contained inline `<script>` tags — no JS bundle hash dependencies, no React runtime version coupling. The static shell is truly static.

3. The placeholder()/resolve() split already exists

The boundary between "shell" and "dynamic" is already expressed in user code. No compiler analysis needed.

4. Cloudflare KV is a natural fit

KV is designed for read-heavy, write-on-deploy workloads — exactly the access pattern for pre-rendered shells. Two-tier cache (Cache API L1 per-colo + KV L2 global) gives per-colo speed with global durability.
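The two-tier lookup can be sketched as a worker helper — a minimal sketch, assuming a `SHELL_KV` namespace binding and a `readShell` helper, both hypothetical names:

```javascript
// Two-tier shell read: L1 = per-colo Cache API, L2 = globally replicated KV.
async function readShell(request, env, ctx) {
  const cache = caches.default;

  // L1: per-colo Cache API — fastest, but scoped to this data center.
  const l1 = await cache.match(request);
  if (l1) return l1;

  // L2: KV — globally durable, survives colo eviction.
  const body = await env.SHELL_KV.get(`shell:${new URL(request.url).pathname}`);
  if (body === null) return null;

  const response = new Response(body, {
    headers: { "Content-Type": "text/html", "Cache-Control": "s-maxage=60" },
  });
  // Backfill L1 without blocking the response.
  ctx.waitUntil(cache.put(request, response.clone()));
  return response;
}
```

The L1 backfill runs via `ctx.waitUntil` so a KV-served response is never delayed by its own cache write.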


The Debate: What's Bad

1. Shell extraction is harder than it looks

renderStream runs linearly — writes shell, then placeholders, then fetches, then resolves. To extract just the shell at build time, you need to run renderStream and stop before resolves. But how does the framework know where the shell ends?

  • Convention-based ("everything before first resolve()"): fragile if someone writes shell HTML after a resolve
  • Two-pass render: runs renderStream twice, page code might have side effects
  • Separate functions (renderShell() + renderDynamic()): doubles the API surface

Next.js avoids this by using React's component tree — static components are identified by the absence of headers()/cookies() calls. merjs doesn't have a component tree — it's imperative stream.write() calls. The boundary is implicit in control flow, not explicit in structure.

2. Cache invalidation is the actual hard problem

PPR shifts complexity from rendering to caching:

  • Version-aware cache keys (deploy v3 shouldn't serve v2 shells)
  • KV propagation takes up to 60 seconds globally
  • Stale-while-revalidate during propagation windows
  • Per-route TTLs (blog changes hourly, about page changes quarterly)

The current prerender.zig is beautifully simple: render → write file → done. KV cache management adds operational surface area disproportionate to the perf gain for most sites.
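The version-aware key problem above has a standard shape — a sketch, where the `DEPLOY_ID` value and `shell:` key layout are assumptions, not an existing merjs convention:

```javascript
// Baking the deploy version into the cache key means a v3 worker can never
// read a v2 shell; stale entries simply age out via TTL instead of needing
// an explicit purge.
const DEPLOY_ID = "v3"; // e.g. injected into the worker at build time

function shellKey(pathname) {
  return `shell:${DEPLOY_ID}:${pathname}`;
}

shellKey("/about"); // → "shell:v3:/about"
```

This sidesteps KV's ~60s propagation window for overwrites, at the cost of carrying dead v2 entries until their TTLs expire.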

3. The WASM/Worker architecture fights you

The current WASM API returns a single buffer (handle() → pointer to complete response). There's no streaming from WASM to JS. To stream resolves progressively, you'd need:

  • A streaming WASM API (chunks written to shared memory + signals to JS) — significant complexity
  • Or buffer all resolves in WASM, append to shell in JS — loses progressive streaming benefit

4. Is it actually faster for merjs?

Current WASM renders full pages in ~5-15ms. Adding KV lookup (~1ms hit, ~50ms cold miss) + shell parsing + stitching might not beat "just render the whole thing in WASM." PPR shines when SSR is slow (React: 50-200ms). merjs is already fast enough that the delta might be 5-7ms.

5. It breaks the simplicity contract

merjs's pitch: write Zig, get a web app, zero npm, zero complexity. PPR adds:

  • New pub const ppr = true flag
  • Authors must reason about static-vs-dynamic splits
  • KV namespace config in wrangler.toml
  • Cache invalidation on deploy
  • Debug complexity ("why stale content?" → KV propagation delay)

Alternative Approaches Considered

Option A: Transparent PPR (auto-detect)

Run every renderStream page in collect mode at build time. Capture write() + placeholder() output as the shell. Everything after first resolve() is dynamic.

Pro: Zero API changes. Con: Relies on convention that writes precede resolves.

Option B: ISR-style (cache at request time)

Skip build-time extraction. First request renders full page, extracts shell, caches in KV. Subsequent requests serve cached shell + run resolves.

Pro: No build changes, works with dynamic routes. Con: First visitor gets no benefit.

Option C: HTMLRewriter composition

Store full static page (with skeleton placeholders) in KV. At request time, stream through Cloudflare's HTMLRewriter — intercept <div id="P:..."> and replace with WASM output.

```js
return new HTMLRewriter()
  .on('div[id^="P:"]', new DynamicSlotHandler(wasm))
  .transform(new Response(cachedShell));
```

Pro: True edge streaming, no WASM streaming API needed. Con: Parsing HTML you just generated (+1-2ms).
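The handler referenced above is not defined in this RFC; a minimal sketch might look like this, with `renderSlot` standing in for the WASM call:

```javascript
// Hypothetical HTMLRewriter element handler: replaces each skeleton
// <div id="P:..."> with rendered HTML for that slot.
class DynamicSlotHandler {
  constructor(renderSlot) {
    this.renderSlot = renderSlot; // (slotId) => HTML string
  }
  // HTMLRewriter invokes element() for every node matching the selector.
  element(el) {
    const slotId = el.getAttribute("id").slice("P:".length); // strip "P:"
    // Replace the placeholder with rendered markup, parsed as HTML.
    el.replace(this.renderSlot(slotId), { html: true });
  }
}
```

Because `HTMLRewriter` streams, replacements flush to the client as the cached shell is parsed — no buffering of the full page.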

Option D: Full-page caching with TTL

Pages declare pub const cache_ttl = 300;. Worker caches full rendered response in KV. No PPR, just HTTP caching done right.

Pro: 20 lines of JS, zero Zig changes. Con: Doesn't help mixed static/dynamic pages.


Proposed Solution: "Incremental Edge Composition" (IEC)

None of the above are quite right for merjs. Here's a novel approach that plays to merjs's actual strengths:

The Key Insight

merjs's WASM render is already so fast (~8ms) that the bottleneck is never rendering — it's external data fetches (100-500ms). The existing two-phase worker protocol (collect_fetch_urls → parallel fetch → handle) already solves this for full renders. PPR should optimize the same bottleneck, not the render.
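The two-phase protocol described above can be sketched as follows — the export names `collectFetchUrls`/`provideFetchResult`/`handle` are stand-ins for the real WASM API, whose exact signatures live in the worker:

```javascript
// Phase 1: ask WASM which URLs the page needs; overlap all fetches;
// Phase 2: render once the data is in place.
async function renderWithOverlap(wasm, input) {
  const urls = wasm.collectFetchUrls(input);

  // All external fetches run in parallel, hiding the 100-500ms network
  // cost behind a single round trip.
  await Promise.all(
    urls.map(async (url) => {
      const res = await fetch(url);
      wasm.provideFetchResult(url, await res.text());
    })
  );

  // The render itself is cheap (~8ms) relative to the fetches.
  return wasm.handle(input);
}
```

IEC slots its KV check into the `urls.map` loop — the protocol shape is unchanged.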

IEC = cache the render output per-placeholder, not per-page.

How It Works

REQUEST FLOW:

1. Worker receives request for /product/42
2. Worker calls WASM collect_fetch_urls() — gets ["api.com/product/42", "api.com/reviews/42"]
3. For each URL, check KV cache:
   - KV hit  → use cached response (skip fetch)
   - KV miss → fetch from origin, store in KV with TTL
4. Provide results to WASM (existing provide_fetch_result API)
5. WASM renders full page (~8ms) — this is fast, don't optimize it
6. Cache the FULL rendered page in Cache API (L1, per-colo, short TTL)
7. Return response

What's Different From PPR

| | Next.js PPR | merjs IEC |
|---|---|---|
| What's cached | The static shell HTML | The data fetches + full rendered pages |
| What's dynamic | Suspense boundary content | Nothing — re-render is cheap |
| Cache granularity | Per-page shell | Per-fetch-URL data + per-page output |
| Invalidation unit | Deploy (new shell) | Per-URL TTL (data freshness) |
| Cold request | Shell instant + stream dynamic | Full render (~8ms) + cache populate |
| Warm request | Shell instant + stream dynamic | Full page from Cache API (~1ms) |

Why This Is Better For merjs

  1. No shell extraction problem. You don't need to split renderStream into static/dynamic parts. The entire page renders in WASM — it's fast enough.

  2. Data-level caching is more reusable. If /product/42 and /product/42/reviews both fetch api.com/product/42, the data cache is shared. Shell caching can't do this.

  3. No new page API. Zero changes to how authors write pages. No ppr = true, no thinking about static-vs-dynamic boundaries.

  4. Graceful degradation. Cache miss = full render in ~8ms + fetches. That's already fast. Cache hit = ~1ms. There's no "stale shell + fresh dynamic = visual mismatch" problem.

  5. Works with dynamic routes. /users/:id pages get cached per-ID automatically. No build-time enumeration needed.

Page Author API (unchanged!)

```zig
// app/product.zig — no changes needed
pub const meta: mer.Meta = .{ .title = "Product" };

pub fn renderStream(req: mer.Request, stream: *mer.StreamWriter) void {
    stream.write("<h1>Product</h1>");
    stream.placeholder("details", "<div class='skeleton'>...</div>");
    stream.placeholder("reviews", "<div class='skeleton'>...</div>");

    const results = mer.fetchAll(req.allocator, &.{
        .{ .url = "https://api.example.com/product/42" },
        .{ .url = "https://api.example.com/reviews/42" },
    });

    stream.resolve("details", formatProduct(results[0]));
    stream.resolve("reviews", formatReviews(results[1]));
}
```

Worker Changes (the only changes needed)

```js
// worker.js — add fetch-level caching + page-level caching

export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    const cache = caches.default;

    // L1: Check full-page cache (per-colo, ~0ms)
    const pageKey = new Request(url.href, { method: "GET" });
    const cached = await cache.match(pageKey);
    if (cached) return cached;

    // L2: Render with data-level caching
    const wasm = await getInstance();
    const input = `${request.method} ${url.pathname}`;
    // ... existing collect_fetch_urls() call ...

    // Fetch with per-URL KV caching
    await Promise.all(urls.map(async (fetchUrl) => {
      // Check KV for cached data (get() returns null on miss, so an
      // empty-string body still counts as a hit)
      const dataKey = `data:${fetchUrl}`;
      const cachedData = await env.DATA_CACHE.get(dataKey);

      let body;
      if (cachedData !== null) {
        body = cachedData; // KV hit — skip network
      } else {
        const res = await fetch(fetchUrl);
        body = await res.text();
        // Cache in KV with TTL (don't block response)
        ctx.waitUntil(
          env.DATA_CACHE.put(dataKey, body, { expirationTtl: 300 })
        );
      }
      // Provide to WASM (existing API)
      provideToWasm(wasm, fetchUrl, body);
    }));

    // Render full page in WASM (~8ms)
    const response = renderInWasm(wasm, input);

    // Cache full page in Cache API (don't block). Clone before reading the
    // body stream, and copy headers via Headers — spreading a Headers object
    // into a plain object silently drops every entry.
    if (response.status === 200) {
      const headers = new Headers(response.headers);
      headers.set("Cache-Control", "s-maxage=60");
      const cacheResponse = new Response(response.clone().body, {
        status: response.status,
        headers,
      });
      ctx.waitUntil(cache.put(pageKey, cacheResponse));
    }

    return response;
  }
};
```

wrangler.toml addition

```toml
[[kv_namespaces]]
binding = "DATA_CACHE"
id = "..."
```

Cache Invalidation

  • Data cache (KV): TTL-based. Product data expires in 5 min, weather in 1 min. No deploy-time purge needed.
  • Page cache (Cache API): Short TTL (30-60s). Per-colo, auto-evicts. Pages automatically refresh as data cache updates.
  • Deploy purge: Optional — wrangler kv:bulk delete to clear data cache. Page cache expires naturally via TTL.

Performance Characteristics

| Scenario | Latency | What happens |
|---|---|---|
| Full cache hit | ~1ms | Page from Cache API (per-colo) |
| Page miss, data hit | ~10ms | WASM render + data from KV |
| Page miss, data miss | ~100-500ms | WASM render + live fetch + cache populate |
| First request ever | ~100-500ms | Same as above, but populates both caches |

Compare to PPR:

| Scenario | Latency | What happens |
|---|---|---|
| PPR shell hit | ~1ms + streaming | Shell from KV + WASM resolve stream |
| PPR shell miss | ~50ms + render | KV cold read + full fallback render |

IEC's warm path (~1ms) matches PPR's warm path (~1ms shell + streaming overhead). IEC's cold path is simpler and has no "stale shell + fresh data" visual mismatch risk.


Implementation Plan

If we go with IEC, the changes are minimal:

  1. worker.js — Add data-level KV caching around the existing fetch loop + page-level Cache API caching (~50 lines)
  2. wrangler.toml — Add DATA_CACHE KV namespace binding
  3. Optional: pub const cache_ttl — Let pages declare data freshness hints that the worker reads from the WASM route metadata

No changes to: mer.zig, prerender.zig, dispatch.zig, router.zig, codegen.zig, build.zig.


Open Questions

  1. Should we support per-fetch-URL TTLs? e.g. mer.fetch(alloc, .{ .url = "...", .cache_ttl = 60 })
  2. Should the page-level Cache API TTL be configurable per-route, or global?
  3. Do we want a cache_ttl = 0 escape hatch for truly dynamic pages (e.g. reading cookies)?
  4. Should we add cache status headers (X-Cache: HIT/MISS) for debugging?
  5. Is there a future where we want true PPR (shell + streaming) on top of IEC for pages with very slow external APIs?

TL;DR

Don't cache the shell. Cache the data. merjs's WASM render is fast enough (~8ms) that the bottleneck is external fetches, not rendering. Cache fetch results in KV (data-level) + cache full pages in Cache API (page-level). Zero Zig changes, ~50 lines of worker.js, same performance as PPR on warm requests, simpler mental model, no static/dynamic boundary problem.
