
RFC: Partial Prerendering (PPR) — static shells from edge + streaming dynamic holes #65


Summary

Add Partial Prerendering (PPR) to merjs — serve pre-rendered static shells from Cloudflare KV edge (~1ms) while streaming dynamic content via WASM resolves. Inspired by Next.js 14 PPR, but adapted to merjs's Zig/WASM architecture.

This RFC debates the approach, identifies problems, and proposes a novel solution that plays to merjs's strengths.


Historical Context — jQuery to PPR

The placeholder/resolve pattern has been reinvented every ~5 years since the 90s:

| Era | Technology | How it works |
|---|---|---|
| 1993 | SSI | `<!--#include virtual="/header.html" -->` — server blocks, assembles fragments |
| 2001 | ESI (Akamai) | `<esi:include src="/api/nav" />` — same, but at CDN edge with per-fragment TTLs |
| 2006 | jQuery `.load()` | `$('#cart').load('/partials/cart')` — client pulls HTML into placeholder divs |
| 2010 | Facebook BigPipe | Server pushes `<script>BigPipe.onPageletArrive({id:"cart",content:"..."})</script>` via chunked HTML |
| 2014 | Marko `<await>` | Declarative out-of-order streaming with client-reorder |
| 2022 | React 18 Suspense | `<Suspense fallback={<Skeleton/>}>` — same BigPipe pattern, React-managed |
| 2023 | Next.js PPR | Static shell from CDN + streamed Suspense holes |
| now | merjs | ? |

merjs already implements the BigPipe pattern exactly — look at StreamWriter.resolve() in mer.zig:42-53:

```zig
pub fn resolve(self: *StreamWriter, id: []const u8, content: []const u8) void {
    self.write("<div hidden id=\"S:");
    self.write(id);
    self.write("\">");
    self.write(content);
    self.write("</div><script>");
    self.write("(function(){var p=document.getElementById('P:");
    self.write(id);
    self.write("'),s=document.getElementById('S:");
    self.write(id);
    self.write("');if(p&&s){p.outerHTML=s.innerHTML;s.remove()}}())");
    self.write("</script>");
}
```

This is BigPipe.onPageletArrive(), minus the jQuery. The question is: how do we split this across time — shell at build/deploy, resolves at request?
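To make the swap mechanics concrete, here is a plain-JavaScript mirror of the payload those `write()` calls emit — a sketch for illustration only; `resolveChunk` is a hypothetical helper, not part of the merjs API:

```javascript
// Hypothetical mirror of StreamWriter.resolve(): builds the hidden content
// div plus the inline swap script that replaces the skeleton placeholder.
function resolveChunk(id, content) {
  return (
    `<div hidden id="S:${id}">${content}</div>` +
    `<script>(function(){` +
    `var p=document.getElementById('P:${id}'),` +
    `s=document.getElementById('S:${id}');` +
    `if(p&&s){p.outerHTML=s.innerHTML;s.remove()}}())<\/script>`
  );
}

const chunk = resolveChunk("cart", "<ul><li>2 items</li></ul>");
// In the browser, this swaps <div id="P:cart"> (the skeleton) for the content.
```

Because the swap script ships inline with each chunk, no client runtime has to be loaded first — the browser executes it the moment the chunk arrives.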


What merjs Already Has

Three rendering modes that form a natural progression:

| Mode | How it works | Where |
|---|---|---|
| SSG (`prerender = true`) | Build-time render → `dist/*.html` | `src/prerender.zig` |
| Shell-first streaming | Layout head flushed, body blocks, tail follows | `src/server.zig:321-349` |
| True streaming (`renderStream`) | Placeholder/resolve — skeletons swap to real content | `src/server.zig:275-318` |

PPR would sit between SSG (fully static) and streaming (fully dynamic): static shell + dynamic holes.


The Debate: What's Good

1. The core insight is real

Most pages are 80% static, 20% dynamic. KV edge read is ~1ms, WASM render is ~5-15ms, external fetch is ~100-500ms. Serving the 80% from edge while overlapping the 20% computation is genuinely faster.
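The overlap claim is easy to check with back-of-envelope arithmetic, using the midpoints of the numbers above (assumed figures, not benchmarks):

```javascript
// Assumed midpoints: KV read 1ms, WASM render 8ms, external fetch 300ms.
const kvRead = 1, wasmRender = 8, externalFetch = 300;

// Fully dynamic: nothing reaches the browser until fetch + render finish.
const ttfbDynamic = externalFetch + wasmRender; // 308ms to first byte

// Static shell + streamed holes: first byte after the KV read; the dynamic
// 20% overlaps the external fetch and streams in when it resolves.
const ttfbShell = kvRead; // 1ms to first byte
const dynamicDone = Math.max(kvRead, externalFetch) + wasmRender; // 308ms

// Total work is the same; what changes is when the user first sees content.
```

The total time to a complete page is unchanged — the win is entirely in time-to-first-byte and perceived progress.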

2. merjs is uniquely positioned

Unlike Next.js, which retrofitted PPR onto React's hydration model, merjs has no hydration. The resolve scripts are self-contained inline `<script>` tags — no JS bundle hash dependencies, no React runtime version coupling. The static shell is truly static.

3. The placeholder()/resolve() split already exists

The boundary between "shell" and "dynamic" is already expressed in user code. No compiler analysis needed.

4. Cloudflare KV is a natural fit

KV is designed for read-heavy, write-on-deploy workloads — exactly the access pattern for pre-rendered shells. Two-tier cache (Cache API L1 per-colo + KV L2 global) gives per-colo speed with global durability.
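The two-tier lookup can be sketched as a worker helper — a minimal sketch, assuming a `SHELL_KV` namespace binding and a `readShell` helper, both hypothetical names:

```javascript
// Two-tier shell read: L1 = per-colo Cache API, L2 = globally replicated KV.
async function readShell(request, env, ctx) {
  const cache = caches.default;

  // L1: per-colo Cache API — fastest, but scoped to this data center.
  const l1 = await cache.match(request);
  if (l1) return l1;

  // L2: KV — globally durable, survives colo eviction.
  const body = await env.SHELL_KV.get(`shell:${new URL(request.url).pathname}`);
  if (body === null) return null;

  const response = new Response(body, {
    headers: { "Content-Type": "text/html", "Cache-Control": "s-maxage=60" },
  });
  // Backfill L1 without blocking the response.
  ctx.waitUntil(cache.put(request, response.clone()));
  return response;
}
```

The L1 backfill runs via `ctx.waitUntil` so a KV-served response is never delayed by its own cache write.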


The Debate: What's Bad

1. Shell extraction is harder than it looks

renderStream runs linearly — writes shell, then placeholders, then fetches, then resolves. To extract just the shell at build time, you need to run renderStream and stop before resolves. But how does the framework know where the shell ends?

  • Convention-based ("everything before first resolve()"): fragile if someone writes shell HTML after a resolve
  • Two-pass render: runs renderStream twice, page code might have side effects
  • Separate functions (renderShell() + renderDynamic()): doubles the API surface

Next.js avoids this by using React's component tree — static components are identified by the absence of headers()/cookies() calls. merjs doesn't have a component tree — it's imperative stream.write() calls. The boundary is implicit in control flow, not explicit in structure.

2. Cache invalidation is the actual hard problem

PPR shifts complexity from rendering to caching:

  • Version-aware cache keys (deploy v3 shouldn't serve v2 shells)
  • KV propagation takes up to 60 seconds globally
  • Stale-while-revalidate during propagation windows
  • Per-route TTLs (blog changes hourly, about page changes quarterly)

The current prerender.zig is beautifully simple: render → write file → done. KV cache management adds operational surface area disproportionate to the perf gain for most sites.
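The version-aware key problem above has a standard shape — a sketch, where the `DEPLOY_ID` value and `shell:` key layout are assumptions, not an existing merjs convention:

```javascript
// Baking the deploy version into the cache key means a v3 worker can never
// read a v2 shell; stale entries simply age out via TTL instead of needing
// an explicit purge.
const DEPLOY_ID = "v3"; // e.g. injected into the worker at build time

function shellKey(pathname) {
  return `shell:${DEPLOY_ID}:${pathname}`;
}

shellKey("/about"); // → "shell:v3:/about"
```

This sidesteps KV's ~60s propagation window for overwrites, at the cost of carrying dead v2 entries until their TTLs expire.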

3. The WASM/Worker architecture fights you

The current WASM API returns a single buffer (handle() → pointer to complete response). There's no streaming from WASM to JS. To stream resolves progressively, you'd need:

  • A streaming WASM API (chunks written to shared memory + signals to JS) — significant complexity
  • Or buffer all resolves in WASM, append to shell in JS — loses progressive streaming benefit

4. Is it actually faster for merjs?

Current WASM renders full pages in ~5-15ms. Adding KV lookup (~1ms hit, ~50ms cold miss) + shell parsing + stitching might not beat "just render the whole thing in WASM." PPR shines when SSR is slow (React: 50-200ms). merjs is already fast enough that the delta might be 5-7ms.

5. It breaks the simplicity contract

merjs's pitch: write Zig, get a web app, zero npm, zero complexity. PPR adds:

  • New pub const ppr = true flag
  • Authors must reason about static-vs-dynamic splits
  • KV namespace config in wrangler.toml
  • Cache invalidation on deploy
  • Debug complexity ("why stale content?" → KV propagation delay)

Alternative Approaches Considered

Option A: Transparent PPR (auto-detect)

Run every renderStream page in collect mode at build time. Capture write() + placeholder() output as the shell. Everything after first resolve() is dynamic.

Pro: Zero API changes. Con: Relies on convention that writes precede resolves.

Option B: ISR-style (cache at request time)

Skip build-time extraction. First request renders full page, extracts shell, caches in KV. Subsequent requests serve cached shell + run resolves.

Pro: No build changes, works with dynamic routes. Con: First visitor gets no benefit.

Option C: HTMLRewriter composition

Store full static page (with skeleton placeholders) in KV. At request time, stream through Cloudflare's HTMLRewriter — intercept <div id="P:..."> and replace with WASM output.

```js
return new HTMLRewriter()
  .on('div[id^="P:"]', new DynamicSlotHandler(wasm))
  .transform(new Response(cachedShell));
```

Pro: True edge streaming, no WASM streaming API needed. Con: Parsing HTML you just generated (+1-2ms).
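The handler referenced above is not defined in this RFC; a minimal sketch might look like this, with `renderSlot` standing in for the WASM call:

```javascript
// Hypothetical HTMLRewriter element handler: replaces each skeleton
// <div id="P:..."> with rendered HTML for that slot.
class DynamicSlotHandler {
  constructor(renderSlot) {
    this.renderSlot = renderSlot; // (slotId) => HTML string
  }
  // HTMLRewriter invokes element() for every node matching the selector.
  element(el) {
    const slotId = el.getAttribute("id").slice("P:".length); // strip "P:"
    // Replace the placeholder with rendered markup, parsed as HTML.
    el.replace(this.renderSlot(slotId), { html: true });
  }
}
```

Because `HTMLRewriter` streams, replacements flush to the client as the cached shell is parsed — no buffering of the full page.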

Option D: Full-page caching with TTL

Pages declare pub const cache_ttl = 300;. Worker caches full rendered response in KV. No PPR, just HTTP caching done right.

Pro: 20 lines of JS, zero Zig changes. Con: Doesn't help mixed static/dynamic pages.


Proposed Solution: "Incremental Edge Composition" (IEC)

None of the above are quite right for merjs. Here's a novel approach that plays to merjs's actual strengths:

The Key Insight

merjs's WASM render is already so fast (~8ms) that the bottleneck is never rendering — it's external data fetches (100-500ms). The existing two-phase worker protocol (collect_fetch_urls → parallel fetch → handle) already solves this for full renders. PPR should optimize the same bottleneck, not the render.
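The two-phase protocol described above can be sketched as follows — the export names `collectFetchUrls`/`provideFetchResult`/`handle` are stand-ins for the real WASM API, whose exact signatures live in the worker:

```javascript
// Phase 1: ask WASM which URLs the page needs; overlap all fetches;
// Phase 2: render once the data is in place.
async function renderWithOverlap(wasm, input) {
  const urls = wasm.collectFetchUrls(input);

  // All external fetches run in parallel, hiding the 100-500ms network
  // cost behind a single round trip.
  await Promise.all(
    urls.map(async (url) => {
      const res = await fetch(url);
      wasm.provideFetchResult(url, await res.text());
    })
  );

  // The render itself is cheap (~8ms) relative to the fetches.
  return wasm.handle(input);
}
```

IEC slots its KV check into the `urls.map` loop — the protocol shape is unchanged.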

IEC = cache the render output per-placeholder, not per-page.

How It Works

REQUEST FLOW:

1. Worker receives request for /product/42
2. Worker calls WASM collect_fetch_urls() — gets ["api.com/product/42", "api.com/reviews/42"]
3. For each URL, check KV cache:
   - KV hit  → use cached response (skip fetch)
   - KV miss → fetch from origin, store in KV with TTL
4. Provide results to WASM (existing provide_fetch_result API)
5. WASM renders full page (~8ms) — this is fast, don't optimize it
6. Cache the FULL rendered page in Cache API (L1, per-colo, short TTL)
7. Return response

What's Different From PPR

| | Next.js PPR | merjs IEC |
|---|---|---|
| What's cached | The static shell HTML | The data fetches + full rendered pages |
| What's dynamic | Suspense boundary content | Nothing — re-render is cheap |
| Cache granularity | Per-page shell | Per-fetch-URL data + per-page output |
| Invalidation unit | Deploy (new shell) | Per-URL TTL (data freshness) |
| Cold request | Shell instant + stream dynamic | Full render (~8ms) + cache populate |
| Warm request | Shell instant + stream dynamic | Full page from Cache API (~1ms) |

Why This Is Better For merjs

  1. No shell extraction problem. You don't need to split renderStream into static/dynamic parts. The entire page renders in WASM — it's fast enough.

  2. Data-level caching is more reusable. If /product/42 and /product/42/reviews both fetch api.com/product/42, the data cache is shared. Shell caching can't do this.

  3. No new page API. Zero changes to how authors write pages. No ppr = true, no thinking about static-vs-dynamic boundaries.

  4. Graceful degradation. Cache miss = full render in ~8ms + fetches. That's already fast. Cache hit = ~1ms. There's no "stale shell + fresh dynamic = visual mismatch" problem.

  5. Works with dynamic routes. /users/:id pages get cached per-ID automatically. No build-time enumeration needed.

Page Author API (unchanged!)

```zig
// app/product.zig — no changes needed
pub const meta: mer.Meta = .{ .title = "Product" };

pub fn renderStream(req: mer.Request, stream: *mer.StreamWriter) void {
    stream.write("<h1>Product</h1>");
    stream.placeholder("details", "<div class='skeleton'>...</div>");
    stream.placeholder("reviews", "<div class='skeleton'>...</div>");

    const results = mer.fetchAll(req.allocator, &.{
        .{ .url = "https://api.example.com/product/42" },
        .{ .url = "https://api.example.com/reviews/42" },
    });

    stream.resolve("details", formatProduct(results[0]));
    stream.resolve("reviews", formatReviews(results[1]));
}
```

Worker Changes (the only changes needed)

```js
// worker.js — add fetch-level caching + page-level caching

export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    const cache = caches.default;

    // L1: Check full-page cache (per-colo, ~0ms)
    const pageKey = new Request(url.href, { method: "GET" });
    const cached = await cache.match(pageKey);
    if (cached) return cached;

    // L2: Render with data-level caching
    const wasm = await getInstance();
    const input = `${request.method} ${url.pathname}`;
    // ... existing collect_fetch_urls() call ...

    // Fetch with per-URL KV caching
    await Promise.all(urls.map(async (fetchUrl) => {
      // Check KV for cached data (get() returns null on miss, so an
      // empty-string body still counts as a hit)
      const dataKey = `data:${fetchUrl}`;
      const cachedData = await env.DATA_CACHE.get(dataKey);

      let body;
      if (cachedData !== null) {
        body = cachedData; // KV hit — skip network
      } else {
        const res = await fetch(fetchUrl);
        body = await res.text();
        // Cache in KV with TTL (don't block response)
        ctx.waitUntil(
          env.DATA_CACHE.put(dataKey, body, { expirationTtl: 300 })
        );
      }
      // Provide to WASM (existing API)
      provideToWasm(wasm, fetchUrl, body);
    }));

    // Render full page in WASM (~8ms)
    const response = renderInWasm(wasm, input);

    // Cache full page in Cache API (don't block). Clone before reading the
    // body stream, and copy headers via Headers — spreading a Headers object
    // into a plain object silently drops every entry.
    if (response.status === 200) {
      const headers = new Headers(response.headers);
      headers.set("Cache-Control", "s-maxage=60");
      const cacheResponse = new Response(response.clone().body, {
        status: response.status,
        headers,
      });
      ctx.waitUntil(cache.put(pageKey, cacheResponse));
    }

    return response;
  }
};
```

wrangler.toml addition

```toml
[[kv_namespaces]]
binding = "DATA_CACHE"
id = "..."
```

Cache Invalidation

  • Data cache (KV): TTL-based. Product data expires in 5 min, weather in 1 min. No deploy-time purge needed.
  • Page cache (Cache API): Short TTL (30-60s). Per-colo, auto-evicts. Pages automatically refresh as data cache updates.
  • Deploy purge: Optional — wrangler kv:bulk delete to clear data cache. Page cache expires naturally via TTL.

Performance Characteristics

| Scenario | Latency | What happens |
|---|---|---|
| Full cache hit | ~1ms | Page from Cache API (per-colo) |
| Page miss, data hit | ~10ms | WASM render + data from KV |
| Page miss, data miss | ~100-500ms | WASM render + live fetch + cache populate |
| First request ever | ~100-500ms | Same as above, but populates both caches |

Compare to PPR:

| Scenario | Latency | What happens |
|---|---|---|
| PPR shell hit | ~1ms + streaming | Shell from KV + WASM resolve stream |
| PPR shell miss | ~50ms + render | KV cold read + full fallback render |

IEC's warm path (~1ms) matches PPR's warm path (~1ms shell + streaming overhead). IEC's cold path is simpler and has no "stale shell + fresh data" visual mismatch risk.


Implementation Plan

If we go with IEC, the changes are minimal:

  1. worker.js — Add data-level KV caching around the existing fetch loop + page-level Cache API caching (~50 lines)
  2. wrangler.toml — Add DATA_CACHE KV namespace binding
  3. Optional: pub const cache_ttl — Let pages declare data freshness hints that the worker reads from the WASM route metadata

No changes to: mer.zig, prerender.zig, dispatch.zig, router.zig, codegen.zig, build.zig.


Open Questions

  1. Should we support per-fetch-URL TTLs? e.g. mer.fetch(alloc, .{ .url = "...", .cache_ttl = 60 })
  2. Should the page-level Cache API TTL be configurable per-route, or global?
  3. Do we want a cache_ttl = 0 escape hatch for truly dynamic pages (e.g. reading cookies)?
  4. Should we add cache status headers (X-Cache: HIT/MISS) for debugging?
  5. Is there a future where we want true PPR (shell + streaming) on top of IEC for pages with very slow external APIs?

TL;DR

Don't cache the shell. Cache the data. merjs's WASM render is fast enough (~8ms) that the bottleneck is external fetches, not rendering. Cache fetch results in KV (data-level) + cache full pages in Cache API (page-level). Zero Zig changes, ~50 lines of worker.js, same performance as PPR on warm requests, simpler mental model, no static/dynamic boundary problem.
