
uffd: record warmup faults and prefetch them on later forks #218

Draft
sjmiller609 wants to merge 9 commits into hypeship/uffd-page-server from hypeship/uffd-prefetch-hotpages

Conversation

@sjmiller609
Collaborator

Stacked on: #216 (uffd page server). Review #213, #214, and #216 first.

Summary

  • Adds HotPage / HotPageList types with sort+dedup snapshot, atomic Save, and LoadHotPageList (binary varint format with a HPL1 magic).
  • New Config.RecordHotPages flag turns on per-fault recording in the page-fault loop.
  • New Server.Prefetch(forkID, list) issues UFFDIO_COPY for every entry in a hot-page list against the fork's userfaultfd before the guest unpauses.
  • The prefetcher is installed by the platform listener once the uffd has been received and registered; EEXIST/EAGAIN are tolerated to absorb first-touch races with vCPUs.

Why

Even with the shared mem-file + UFFD page server, a fresh fork still pays a fault round-trip on every page the guest needs to boot: tens of thousands of round-trips on the critical path. Recording the hot set during a template's first warmup fork and prefetching it on every later fork eliminates those round-trips for the recorded pages. Template.HotPagesPath (reserved in PR 2) finally has a producer/consumer.

Test plan

  • go test ./lib/uffd/... (covers HotPageList sort/dedup/save/load + bad-magic + truncation)
  • Manual: warm a template with RecordHotPages: true, save the list, fork without prefetch and time boot; fork with prefetch and time boot; confirm the second is faster
  • Manual: prefetch a corrupted/wrong-region list — confirm clean error rather than UB

🤖 Generated with Claude Code

sjmiller609 and others added 7 commits May 13, 2026 00:43
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When a Firecracker fork descends from a Template source, skip copying the
snapshot mem-file and hardlink it to the source's instead. Firecracker
mmaps the mem-file MAP_PRIVATE on restore, so all forks COW from the same
backing inode — no per-fork copy required.

Hardlink rather than symlink: firecracker's restore path temporarily
aliases the source data dir to the fork data dir while loading the
snapshot (withSnapshotSourceDirAlias). A symlink whose target traverses
the source dir would resolve back into the fork dir during that window
and trip ELOOP; a hardlink resolves by inode so the alias has no effect
on it. Hardlinks require both paths on the same filesystem, which holds
for our standard data-dir layout.

Gated to Firecracker only because other hypervisors (cloud-hypervisor,
qemu, vz) don't share MAP_PRIVATE semantics on their snapshot layouts.
Restricted to Template sources because they are explicitly promoted as
fork-only and can never be restored — sharing the mem-file with a
non-Template source would let a later RestoreInstance mutate the file
out from under live forks.

Stacked on hypeship/template-as-state so the Template state both gates
"this snapshot is safe to fan out from" and lets fork counts be derived
at read time.

Adds lib/uffd, a userfaultfd page server that backs many concurrent
fan-out forks against a single read-only template mem-file instead of
letting each fork mmap it privately. Firecracker connects to a per-fork
UDS, hands us its userfaultfd via SCM_RIGHTS along with a JSON
mappings handshake, and the server then services UFFD_EVENT_PAGEFAULT
events with UFFDIO_COPY reads from the template.

The Linux hot path lives behind a build tag; non-Linux builds return
ErrUnsupported so callers can fall back to MAP_PRIVATE. Cross-platform
tests cover the handshake parser and the server lifecycle.
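For illustration, the handshake parsing and the address translation the fault handler needs can be sketched as follows. The JSON field names (`base_host_virt_addr`, `size`, `offset`) are assumed from Firecracker's UFFD handshake documentation and should be checked against the Firecracker version in use; `parseMappings` and `fileOffset` are hypothetical names, not the lib/uffd API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// mapping mirrors one guest memory region from the handshake.
// Field names are an assumption; unknown fields are ignored.
type mapping struct {
	BaseHostVirtAddr uint64 `json:"base_host_virt_addr"`
	Size             uint64 `json:"size"`
	Offset           uint64 `json:"offset"`
}

// parseMappings decodes the JSON array sent alongside the userfaultfd.
func parseMappings(b []byte) ([]mapping, error) {
	var ms []mapping
	if err := json.Unmarshal(b, &ms); err != nil {
		return nil, fmt.Errorf("handshake: %w", err)
	}
	if len(ms) == 0 {
		return nil, errors.New("handshake: no mappings")
	}
	return ms, nil
}

// fileOffset translates a faulting host address into the page-aligned
// offset in the template mem-file. pageSize must be a power of two.
func fileOffset(ms []mapping, addr, pageSize uint64) (uint64, bool) {
	page := addr &^ (pageSize - 1)
	for _, m := range ms {
		if page >= m.BaseHostVirtAddr && page-m.BaseHostVirtAddr < m.Size {
			return m.Offset + (page - m.BaseHostVirtAddr), true
		}
	}
	return 0, false
}

func main() {
	raw := []byte(`[{"base_host_virt_addr":1048576,"size":8192,"offset":4096}]`)
	ms, err := parseMappings(raw)
	if err != nil {
		panic(err)
	}
	off, ok := fileOffset(ms, 1048576+4096+17, 4096)
	fmt.Println(off, ok)
}
```

A fault address outside every mapping returns `false`, which a server would presumably treat as an error rather than reading from an arbitrary template offset.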

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds a hot-page recorder + prefetch primitive on top of the userfaultfd
page server. During a template's first warmup fork the server can
record every served page (Config.RecordHotPages); the resulting
HotPageList is stable-sorted, deduplicated, and saved to disk in a
small binary format alongside the template. Later forks call
Server.Prefetch(forkID, list) to issue UFFDIO_COPY for every recorded
page against their userfaultfd before the guest unpauses, eliminating
the fault round-trips on those addresses.

The prefetcher is installed by the platform-specific listener once the
fork's uffd has been received and registered, so callers can race
Prefetch and the fault loop safely. EEXIST/EAGAIN are tolerated the
same way the fault handler does to absorb first-touch races with
vCPUs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds StateTemplate to the instance state machine. A Standby instance is
auto-promoted to Template the first time it's forked from a snapshot,
and ForkCount is bumped on each subsequent fork. Templates can't wake
while ForkCount > 0; un-promote (Template -> Standby) and delete
(Template -> Stopped) are both refused until forks drain.

Fork bookkeeping lives on StoredMetadata (IsTemplate, ForkCount,
ForkOfTemplate, plus a reserved HotPagesPath for the prefetch path).
Deleting a fork decrements the parent template's ForkCount under the
parent's lock; deletion of the fork's own data has already happened, so
worst case is refcount drift that a future reconciliation pass fixes.

The running-fork flow keeps skipping promotion: it restores the source
back to Running afterward, and a template can't wake.

Drops the persisted ForkCount field from StoredMetadata and the
decrement bookkeeping in DeleteInstance. Live forks of a template are
now counted by scanning metadata for ForkOfTemplate matches via a new
countTemplateForks helper. The fork-of-template field itself remains
the single source of truth, so there's no drift to reconcile.

Template promotion on fork only flips IsTemplate when not already set;
deletion of a template still refuses when forks exist, but the count
is computed from disk rather than read from a denormalized field.

Previously ForkInstance auto-promoted a Standby source to Template the
first time it was forked from a snapshot, and RestoreInstance auto-demoted
a Template before waking it. That implicit lifecycle blurred the rules: a
Standby and a "Standby that has been forked once" behaved differently,
and callers had to know that restoring a Template was a two-step
operation under the hood.

Replace it with explicit PromoteToTemplate / DemoteTemplate manager
methods (and matching POST /instances/{id}/promote-template and
/demote-template endpoints). Promotion is now Standby -> Template only;
demotion is Template -> Standby only and refuses while live forks
reference the template. ForkInstance only records the parent linkage if
the source is already a Template, and RestoreInstance no longer
auto-demotes — callers must demote first.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sjmiller609 sjmiller609 force-pushed the hypeship/uffd-page-server branch from 27082d3 to 005508c Compare May 13, 2026 18:14
@sjmiller609 sjmiller609 force-pushed the hypeship/uffd-prefetch-hotpages branch from 21bdfb5 to 579a72e Compare May 13, 2026 18:15
sjmiller609 and others added 2 commits May 13, 2026 15:30
Silently continuing past an unreadable metadata file could undercount
forks of a template, allowing DemoteTemplate or DeleteInstance to free
a template whose pages are still mapped by a live fork.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sjmiller609 sjmiller609 force-pushed the hypeship/uffd-page-server branch from 005508c to e5594f9 Compare May 13, 2026 20:16
@sjmiller609 sjmiller609 force-pushed the hypeship/uffd-prefetch-hotpages branch from 579a72e to df782fa Compare May 13, 2026 20:20
@sjmiller609 sjmiller609 force-pushed the hypeship/uffd-page-server branch from e5594f9 to 17831da Compare May 13, 2026 20:40
@sjmiller609 sjmiller609 force-pushed the hypeship/uffd-prefetch-hotpages branch from df782fa to f141f3c Compare May 13, 2026 20:40
@sjmiller609 sjmiller609 force-pushed the hypeship/uffd-page-server branch from 17831da to 12fcda0 Compare May 15, 2026 14:34