Skip to content

Drive pull-reprint with a single reprint pull#3768

Open
epeicher wants to merge 12 commits into
trunkfrom
cli/pull-reprint-delta-resync
Open

Drive pull-reprint with a single reprint pull#3768
epeicher wants to merge 12 commits into
trunkfrom
cli/pull-reprint-delta-resync

Conversation

@epeicher

@epeicher epeicher commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Related issues

How AI was used in this PR

Claude Code measured the single-fork memory profile against a live WP.com Atomic site (26k files / ~570 MB), implemented the migration and tests, and verified the full flow end-to-end (fresh pull plus delta re-pull: remote edit -> re-pull -> local content / auto-login checks). Each change was reviewed and re-tested live.

Proposed Changes

Re-running studio pull-reprint on a completed pull previously printed "pulled successfully" and exited without contacting the remote, so remote edits, inserts, and deletions never reached the local site. With this PR a re-run performs a delta re-pull: files re-sync incrementally and the database is refreshed so existing-row edits and deletions propagate.

Studio now drives reprint with a single reprint pull instead of orchestrating the pipeline itself. One call runs preflight -> files-pull -> db-pull -> db-apply -> flat-docroot -> apply-runtime in one PHP-WASM fork, and reprint resets its own state for a delta re-pull via prepare_repull(). This deletes the former Studio orchestration: the five per-sub-command functions, the nine-stage machine, and the --abort delta-reset wiring (about 340 lines net). The single fork's memory_limit is raised to 1024M because the file-index high-water-mark now persists across phases (measured peak ~510M on the test site).

Fixes that remain part of making re-pulls real:

  • Auto-login keeps working after a re-pull. The database refresh rebuilds the DB from the remote dump, wiping the local admin user and studio_admin_username option that /studio-auto-login depends on. The pull re-applies the stored credentials over the running site's admin API, and a connection failure on that request doubles as a health check that catches a stale daemon "online" state and falls back to starting the server.
  • Database URLs can't drift to a dead port. If the Studio site record was re-created with a different id/port, the re-pull adopts the live site record before the pull so the URL rewrite (--new-site-url) targets the right port.
  • Site starts no longer crash on stale runtime mounts. Transient reprint state files mounted for the remote-uploads proxy are dropped when loading the persisted start-options.json (previously ENOENT crashed the start).

Bundle the reprint build via REPRINT_VERSION in scripts/download-wp-server-files.ts; it must include WordPress/reprint#268.

Testing Instructions

  1. Build the reprint phar from a branch that includes pull: fix the flattened runtime root and delta re-pull after deferred files WordPress/reprint#268 (and Release 1.0.4 #264): run composer build:phar on the reprint repo.
  2. On this repo, place reprint.phar at wp-files/reprint/reprint.phar and run npm run cli:build.
  3. Pull a site: node apps/cli/dist/cli/main.mjs pull-reprint --verbose --url <site> and let it complete. It should invoke a single reprint pull (visible with --verbose) and start a working site.
  4. Edit an existing post on the remote site, then re-run the same command. It should report "Updating ... (delta sync)" and the local post should show the edit.
  5. Open the local site's WP Admin via the printed auto-login URL and confirm the edit from step 4.
  6. Stop and start the site (studio site stop / site start). No ENOENT mount errors.

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors?

Re-running `studio pull-reprint` on a completed pull previously
printed "pulled successfully" and exited without contacting the
remote, so remote edits never reached the local site. A re-run now
resets the stage machine and re-executes every phase; reprint makes
each phase incremental (files delta sync, full DB re-download +
re-apply with the idempotent dump).

Details:

- Persist a hasCompletedOnce flag in pull.json so the non-empty
  site-directory guard is skipped on re-pulls (the directory holds
  the previous pull's output) and drop the cached preflight so
  connectivity and the secret-rotation retry path are re-verified.
- Re-apply the stored admin credentials over the running site's
  admin API after the database refresh: db-apply rebuilds the DB
  from the remote dump, wiping the local admin user and the
  studio_admin_username option that /studio-auto-login depends on.
  The server only re-applies credentials at startup, and a re-pull
  deliberately skips restarting a running server. A connection
  failure on that request doubles as a health check — if the
  daemon's "online" view is stale, fall through to a server start.
- Reconcile pull.json with the live site record in ensurePort()
  before the DB stages: if the site record was re-created with a
  different id/port, db-apply would rewrite the database URLs to a
  port nothing serves.
- Drop /tmp/reprint/* mounts with missing host paths when loading
  the persisted start-options.json: reprint state files mounted for
  the remote-uploads proxy are transient, and mounting a missing
  path crashed server starts with ENOENT.
- Fix shouldRestartFilesSyncIndex() comparing against the legacy
  'files-sync' command name; reprint canonicalizes to 'files-pull'
  when saving state, so the check was dead code.

Requires the reprint-side refresh support
(WordPress/reprint#257) in the bundled reprint.phar.
@epeicher epeicher force-pushed the cli/pull-reprint-delta-resync branch from 05d8026 to 19388cd Compare June 11, 2026 15:22
@epeicher epeicher marked this pull request as ready for review June 15, 2026 09:37
@epeicher epeicher requested review from a team and fredrikekelund June 15, 2026 09:55
@wpmobilebot

wpmobilebot commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

📊 Performance Test Results

Comparing dc16064 vs trunk

app-size

Metric trunk dc16064 Diff Change
App Size (Mac) 2357.83 MB 2357.79 MB 0.04 MB ⚪ 0.0%

site-editor

Metric trunk dc16064 Diff Change
load 1077 ms 1726 ms +649 ms 🔴 60.3%

site-startup

Metric trunk dc16064 Diff Change
siteCreation 6493 ms 8532 ms +2039 ms 🔴 31.4%
siteStartup 6965 ms 3882 ms 3083 ms 🟢 -44.3%

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

epeicher added 2 commits June 18, 2026 11:47
A delta re-pull re-runs files-sync/db-sync/db-apply against a state dir
where the previous pull left them "complete". Re-running as-is either
throws (files-sync rejects a --filter change while a sync still looks in
progress) or silently no-ops, so the database never refreshes. Issue
`<command> --abort` before each on a re-pull to reset the sub-command
state so every phase re-runs: files-sync keeps the local index and deltas,
db-sync re-downloads, db-apply re-applies.

Gated on hasCompletedOnce (the persisted "this pull finished once" flag)
rather than the local isRepull, because the first re-pull attempt already
reset the stage to "initialized"; a retry must still abort. Each command
is aborted only while its stage is still pending, so a mid-pull resume
never wipes a phase that already re-ran. Runs after preflight, which
files-sync/db-sync --abort require.

This removes the dependency on reprint's sub-commands auto-refreshing, so
the re-pull works against a stock reprint build.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes the studio pull-reprint orchestration so re-running a previously completed pull performs a real delta re-pull (re-entering reprint phases, clearing “complete” subcommand state via --abort, and re-running DB refresh), rather than exiting early without syncing remote changes.

Changes:

  • Reset completed pull sessions back to initialized on re-run, clear cached preflight, and abort completed reprint subcommands so files re-sync incrementally and DB refresh re-applies.
  • Improve resilience around reprint state/version differences and stale runtime mounts (preventing ENOENT crashes on server start).
  • Add Vitest coverage for delta re-pull reset behavior, exporter re-enable on preflight retry, and admin-credentials re-apply behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
apps/cli/lib/wordpress-server-manager.ts Filters out stale /tmp/reprint/* mounts from persisted start options to avoid ENOENT crashes.
apps/cli/lib/pull/reprint-state.ts Accepts both files-sync and canonicalized files-pull when interpreting reprint state.
apps/cli/commands/pull-reprint.ts Implements delta re-pull flow (stage reset, preflight refresh, --abort state clearing), updates port adoption logic, and re-applies admin credentials when reusing a running server.
apps/cli/commands/tests/pull-reprint.test.ts Adds tests covering delta re-pull behavior and recovery paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread apps/cli/commands/pull-reprint.ts
On a re-pull where the daemon process is still up, the site-started
"already running" branch skipped updateSiteLatestCliPid and
updateSiteAutoStart; both only ran on the start branch. The running-status
check requires site.latestCliPid to equal the live process pid, so a stale
or missing pid (e.g. after ensurePort adopts a re-created site record) made
the site report as stopped. Mirror the start branch and `studio site
start`'s already-running path: refresh latestCliPid when online and keep
autoStart enabled.
const isRepull = studioMetadata.stage === 'completed';
if ( isRepull ) {
studioMetadata.stage = 'initialized';
studioMetadata.hasCompletedOnce = true;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful if reprint exposed this information? Such as php reprint.php import-metadata | jq '.hasCompletedOnce'? It's already stored there so that would be less state and less opportunity for one of these values to go out of sync.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would be better as reprint "knows better" about the pull, and reducing the state to the minimum seems a good option.


// Re-verify connectivity (and give the secret-rotation retry path
// a chance to run) instead of trusting the cached preflight from
// the original pull, which may be days old.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a cached preflight come up in your testing? In what way did it cause trouble?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had errors for sites that were pulled some time before that were "missing multipart boundary" due to HTML being returned. The way to reproduce that is to pull a site, then wait for 60 minutes (or reduce the sliding window), and then pull again

// Rotating the secret does not bump `reprint_exporter_enabled`, so
// re-open the gate explicitly; otherwise the retry hits the same
// closed window and ?reprint-api falls through to an HTML page.
await enableReprintExporter(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@adamziel

Copy link
Copy Markdown
Contributor

The code change mostly makes sense, thank you @epeicher! I'm only confused by one thing: If we're moving to reliance on pull, shouldn't we be able to remove all the manual tool calls, e.g. applyDownloadedDatabase()? pull covers all that ground in a way we don't need to manage in Studio.

@epeicher

Copy link
Copy Markdown
Contributor Author

The code change mostly makes sense, thank you @epeicher! I'm only confused by one thing: If we're moving to reliance on pull, shouldn't we be able to remove all the manual tool calls, e.g. applyDownloadedDatabase()? pull covers all that ground in a way we don't need to manage in Studio.

As part of this PR, I have not moved to relying on pull, but to relying on <sub-command> --abort to clear the completed state first. I could iterate to rely completely on pull (I know we discussed that, but I kept the changes minimal), let me try that and update the PR.

Replace the per-sub-command orchestration (downloadEssentialSiteFiles,
refreshFlattenedSiteDirectory, downloadRemoteDatabase,
applyDownloadedDatabase, generateRuntimeConfiguration) and the
clearCompletedSubcommandState/--abort delta-reset with one `reprint pull`
call. reprint owns the pipeline ordering (files-pull -> db-pull ->
db-apply -> flat-docroot -> apply-runtime) and resets its own state for a
delta re-pull via prepare_repull(), so the Studio-side --abort wiring and
per-phase stage gating go away.

- runFullPull() issues the single pull with the same sqlite geometry the
  old db-apply used (target sqlite under the raw content dir) plus
  --flatten-to, --runtime=playground-cli, --start-runtime=none and
  --output-dir, mounting the site + runtime dirs up front.
- ensurePort moves before the pull so --new-site-url is available.
- Collapse the stage machine from 9 stages to 5 (initialized -> pulled ->
  site-registered -> site-started -> completed).
- Bump the PHP-WASM memory_limit to 1024M: the single long-lived fork
  holds the file-index high-water-mark across phases.

Requires reprint's flatten_to->flat_document_root bridge and the re-pull
filter-guard fix (WordPress/reprint#268).
@epeicher epeicher changed the title Make pull-reprint re-runs perform a delta re-pull Drive pull-reprint with a single reprint pull Jun 18, 2026
Drop the historic 'replaces the former clearCompletedSubcommandState'
framing from the pull comment and the runFullPull docblock; describe what
the single-pull flow does now.
@epeicher

Copy link
Copy Markdown
Contributor Author

The latest changes 0b27bdf, use a composite pull command instead of using multiple sub-commands as suggested. After some testing, this is working fine and the code is greatly reduced 🙌

The changes require WordPress/reprint#268 to be applied and generate a reprint.phar from it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants