Drive pull-reprint with a single reprint pull#3768
Conversation
Re-running `studio pull-reprint` on a completed pull previously printed "pulled successfully" and exited without contacting the remote, so remote edits never reached the local site. A re-run now resets the stage machine and re-executes every phase; reprint makes each phase incremental (files delta sync, full DB re-download + re-apply with the idempotent dump). Details: - Persist a hasCompletedOnce flag in pull.json so the non-empty site-directory guard is skipped on re-pulls (the directory holds the previous pull's output) and drop the cached preflight so connectivity and the secret-rotation retry path are re-verified. - Re-apply the stored admin credentials over the running site's admin API after the database refresh: db-apply rebuilds the DB from the remote dump, wiping the local admin user and the studio_admin_username option that /studio-auto-login depends on. The server only re-applies credentials at startup, and a re-pull deliberately skips restarting a running server. A connection failure on that request doubles as a health check — if the daemon's "online" view is stale, fall through to a server start. - Reconcile pull.json with the live site record in ensurePort() before the DB stages: if the site record was re-created with a different id/port, db-apply would rewrite the database URLs to a port nothing serves. - Drop /tmp/reprint/* mounts with missing host paths when loading the persisted start-options.json: reprint state files mounted for the remote-uploads proxy are transient, and mounting a missing path crashed server starts with ENOENT. - Fix shouldRestartFilesSyncIndex() comparing against the legacy 'files-sync' command name; reprint canonicalizes to 'files-pull' when saving state, so the check was dead code. Requires the reprint-side refresh support (WordPress/reprint#257) in the bundled reprint.phar.
05d8026 to
19388cd
Compare
📊 Performance Test ResultsComparing dc16064 vs trunk app-size
site-editor
site-startup
Results are median values from multiple test runs. Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff) |
…ta-resync # Conflicts: # apps/cli/commands/pull-reprint.ts
…print-delta-resync
A delta re-pull re-runs files-sync/db-sync/db-apply against a state dir where the previous pull left them "complete". Re-running as-is either throws (files-sync rejects a --filter change while a sync still looks in progress) or silently no-ops, so the database never refreshes. Issue `<command> --abort` before each on a re-pull to reset the sub-command state so every phase re-runs: files-sync keeps the local index and deltas, db-sync re-downloads, db-apply re-applies. Gated on hasCompletedOnce (the persisted "this pull finished once" flag) rather than the local isRepull, because the first re-pull attempt already reset the stage to "initialized"; a retry must still abort. Each command is aborted only while its stage is still pending, so a mid-pull resume never wipes a phase that already re-ran. Runs after preflight, which files-sync/db-sync --abort require. This removes the dependency on reprint's sub-commands auto-refreshing, so the re-pull works against a stock reprint build.
There was a problem hiding this comment.
Pull request overview
This PR changes the studio pull-reprint orchestration so re-running a previously completed pull performs a real delta re-pull (re-entering reprint phases, clearing “complete” subcommand state via --abort, and re-running DB refresh), rather than exiting early without syncing remote changes.
Changes:
- Reset completed pull sessions back to
initializedon re-run, clear cached preflight, and abort completed reprint subcommands so files re-sync incrementally and DB refresh re-applies. - Improve resilience around reprint state/version differences and stale runtime mounts (preventing ENOENT crashes on server start).
- Add Vitest coverage for delta re-pull reset behavior, exporter re-enable on preflight retry, and admin-credentials re-apply behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| apps/cli/lib/wordpress-server-manager.ts | Filters out stale /tmp/reprint/* mounts from persisted start options to avoid ENOENT crashes. |
| apps/cli/lib/pull/reprint-state.ts | Accepts both files-sync and canonicalized files-pull when interpreting reprint state. |
| apps/cli/commands/pull-reprint.ts | Implements delta re-pull flow (stage reset, preflight refresh, --abort state clearing), updates port adoption logic, and re-applies admin credentials when reusing a running server. |
| apps/cli/commands/tests/pull-reprint.test.ts | Adds tests covering delta re-pull behavior and recovery paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
On a re-pull where the daemon process is still up, the site-started "already running" branch skipped updateSiteLatestCliPid and updateSiteAutoStart; both only ran on the start branch. The running-status check requires site.latestCliPid to equal the live process pid, so a stale or missing pid (e.g. after ensurePort adopts a re-created site record) made the site report as stopped. Mirror the start branch and `studio site start`'s already-running path: refresh latestCliPid when online and keep autoStart enabled.
| const isRepull = studioMetadata.stage === 'completed'; | ||
| if ( isRepull ) { | ||
| studioMetadata.stage = 'initialized'; | ||
| studioMetadata.hasCompletedOnce = true; |
There was a problem hiding this comment.
Would it be helpful if reprint exposed this information? Such as php reprint.php import-metadata | jq '.hasCompletedOnce'? It's already stored there so that would be less state and less opportunity for one of these values to go out of sync.
There was a problem hiding this comment.
Yes, I think that would be better as reprint "knows better" about the pull, and reducing the state to the minimum seems a good option.
|
|
||
| // Re-verify connectivity (and give the secret-rotation retry path | ||
| // a chance to run) instead of trusting the cached preflight from | ||
| // the original pull, which may be days old. |
There was a problem hiding this comment.
Did a cached preflight come up in your testing? In what way did it cause trouble?
There was a problem hiding this comment.
I had errors for sites that were pulled some time before that were "missing multipart boundary" due to HTML being returned. The way to reproduce that is to pull a site, then wait for 60 minutes (or reduce the sliding window), and then pull again
| // Rotating the secret does not bump `reprint_exporter_enabled`, so | ||
| // re-open the gate explicitly; otherwise the retry hits the same | ||
| // closed window and ?reprint-api falls through to an HTML page. | ||
| await enableReprintExporter( |
|
The code change mostly makes sense, thank you @epeicher! I'm only confused by one thing: If we're moving to reliance on |
As part of this PR, I have not moved to relying on |
Replace the per-sub-command orchestration (downloadEssentialSiteFiles, refreshFlattenedSiteDirectory, downloadRemoteDatabase, applyDownloadedDatabase, generateRuntimeConfiguration) and the clearCompletedSubcommandState/--abort delta-reset with one `reprint pull` call. reprint owns the pipeline ordering (files-pull -> db-pull -> db-apply -> flat-docroot -> apply-runtime) and resets its own state for a delta re-pull via prepare_repull(), so the Studio-side --abort wiring and per-phase stage gating go away. - runFullPull() issues the single pull with the same sqlite geometry the old db-apply used (target sqlite under the raw content dir) plus --flatten-to, --runtime=playground-cli, --start-runtime=none and --output-dir, mounting the site + runtime dirs up front. - ensurePort moves before the pull so --new-site-url is available. - Collapse the stage machine from 9 stages to 5 (initialized -> pulled -> site-registered -> site-started -> completed). - Bump the PHP-WASM memory_limit to 1024M: the single long-lived fork holds the file-index high-water-mark across phases. Requires reprint's flatten_to->flat_document_root bridge and the re-pull filter-guard fix (WordPress/reprint#268).
Drop the historic 'replaces the former clearCompletedSubcommandState' framing from the pull comment and the runFullPull docblock; describe what the single-pull flow does now.
|
The latest changes 0b27bdf, use a composite The changes require WordPress/reprint#268 to be applied and generate a |
Related issues
flatten_to->flat_document_rootbridge plus the re-pull filter-guard fix that this flow depends on).How AI was used in this PR
Claude Code measured the single-fork memory profile against a live WP.com Atomic site (26k files / ~570 MB), implemented the migration and tests, and verified the full flow end-to-end (fresh pull plus delta re-pull: remote edit -> re-pull -> local content / auto-login checks). Each change was reviewed and re-tested live.
Proposed Changes
Re-running
studio pull-reprinton a completed pull previously printed "pulled successfully" and exited without contacting the remote, so remote edits, inserts, and deletions never reached the local site. With this PR a re-run performs a delta re-pull: files re-sync incrementally and the database is refreshed so existing-row edits and deletions propagate.Studio now drives reprint with a single
reprint pullinstead of orchestrating the pipeline itself. One call runs preflight -> files-pull -> db-pull -> db-apply -> flat-docroot -> apply-runtime in one PHP-WASM fork, and reprint resets its own state for a delta re-pull viaprepare_repull(). This deletes the former Studio orchestration: the five per-sub-command functions, the nine-stage machine, and the--abortdelta-reset wiring (about 340 lines net). The single fork'smemory_limitis raised to 1024M because the file-index high-water-mark now persists across phases (measured peak ~510M on the test site).Fixes that remain part of making re-pulls real:
adminuser andstudio_admin_usernameoption that/studio-auto-logindepends on. The pull re-applies the stored credentials over the running site's admin API, and a connection failure on that request doubles as a health check that catches a stale daemon "online" state and falls back to starting the server.--new-site-url) targets the right port.start-options.json(previously ENOENT crashed the start).Bundle the reprint build via
REPRINT_VERSIONinscripts/download-wp-server-files.ts; it must include WordPress/reprint#268.Testing Instructions
composer build:pharon the reprint repo.reprint.pharatwp-files/reprint/reprint.pharand runnpm run cli:build.node apps/cli/dist/cli/main.mjs pull-reprint --verbose --url <site>and let it complete. It should invoke a singlereprint pull(visible with--verbose) and start a working site.studio site stop/site start). No ENOENT mount errors.Pre-merge Checklist