Refresh on re-run for completed pull sub-commands#260
Conversation
Re-invoking files-pull, db-pull, or db-apply on a completed state now performs a refresh instead of erroring or no-oping, matching the composite pull orchestrator's re-pull behavior: - files-pull: runs a delta sync — keeps the local index, re-indexes the remote, diffs, and fetches only changes. The stale remote index and download lists are cleared first (download_remote_index() appends when the file exists). - db-pull: full refresh via the new ImportClient::reset_db_pull_state() (also reused by Pull::prepare_repull()) — resets DB state, deletes db.sql, .import-domains.json, and the .sql-buffer crash-recovery file, then re-downloads the entire dump. - db-apply: re-applies the dump from statement 0. The dump is idempotent (DROP TABLE IF EXISTS + full INSERTs), so remote edits, inserts, and deletes all propagate. This unblocks callers that drive the sub-commands directly (notably Studio's pull-reprint) and could previously never re-sync a database: edits and deletions to existing rows stayed frozen at the first completed pull. Also fixed along the way: - --filter=skipped-earlier runs never marked status=complete, leaving the state stuck in_progress after a successful pull. The branch now marks completion, and a legacy-state repair in run() heals sessions written by older versions. - The skipped download list is now truncated instead of deleted everywhere. apply-runtime mounts that file into the generated runtime for the remote-uploads proxy, and PHP-WASM re-resolves mounts on every runtime rotation — deleting the file crashed server starts and WP-CLI with ENOENT. - db-apply strips _edit_lock postmeta after applying. The dump carries the remote's ephemeral editor locks, which render as phantom "X is currently editing" badges on the imported copy. - A stale .sql-buffer could be replayed into the target on refresh (latent prepare_repull bug, fixed by the shared reset helper). In-progress/partial states still resume from their cursors; --abort remains the way to clear state without running.
Three E2E tests asserted the old "re-run refuses / deletes" contract that this branch changes; update them to verify the new behavior: - import-02: re-running db-sync on a completed state now performs a full refresh (exit 0, fresh db.sql, audit log records the refresh) instead of erroring with a "--abort" message. - import-40: the skipped download list is truncated to empty rather than deleted (apply-runtime mounts it; deletion crashed the runtime with ENOENT). Assert it exists and is zero-length. - import-01: a completed files-sync re-run now performs a delta sync; rename the test and assert the completed status it ends on (it was passing only incidentally under the old "refuses" name). Verified in the Dockerized E2E environment: all three files pass (31 tests). The full suite's only remaining failures (import-07, import-19) are pre-existing local-Docker permission/fault-injection flakiness — they fail identically on trunk and pass on CI.
Pull pipeline performance —
|
| Stage | PR | trunk | Δ | Status | Details |
|---|---|---|---|---|---|
playground-sqlite-db-pull |
9.38 s | 9.84 s | ⚪ -455 ms (-4.6%) | ✓ | condition=db-pull in PHP.wasm runtime=php.wasm 8.3 wp_mysql_parser=enabled mode=lexer native_lexer=verified native_token_stream=WP_MySQL_Native_Token_Stream native_token_count=18 native_parser=selected trunk: condition=db-pull in PHP.wasm runtime=php.wasm 8.3 wp_mysql_parser=enabled mode=lexer native_lexer=verified native_token_stream=WP_MySQL_Native_Token_Stream native_token_count=18 native_parser=selected |
playground-sqlite-db-apply |
3.53 s | 3.51 s | ⚪ +20 ms (+0.6%) | ✓ | condition=db-apply to SQLite in PHP.wasm runtime=php.wasm 8.3 wp_mysql_parser=enabled mode=parser native_lexer=verified native_token_stream=WP_MySQL_Native_Token_Stream native_token_count=18 native_parser=verified native_ast=WP_MySQL_Native_Parser_Node sqlite_driver_parser=verified trunk: condition=db-apply to SQLite in PHP.wasm runtime=php.wasm 8.3 wp_mysql_parser=enabled mode=parser native_lexer=verified native_token_stream=WP_MySQL_Native_Token_Stream native_token_count=18 native_parser=verified native_ast=WP_MySQL_Native_Parser_Node sqlite_driver_parser=verified |
| Total | 12.92 s | 13.35 s | ⚪ -435 ms (-3.3%) |
Numbers carry runner noise; treat single-run deltas as directional, not authoritative.
📈 Trunk performance history — commit-by-commit timeline.
|
|
||
| $this->state = $this->load_state(); | ||
|
|
||
| // Legacy state repair: older versions never marked a |
There was a problem hiding this comment.
No need to document what older versions did inline. It only matters for reviewing this PR but will not matter once we merge it. Let's just document what this does now.
| !file_exists($this->skipped_download_list_file) | ||
| ) { | ||
| $this->audit_log( | ||
| "STATE REPAIR | skipped-earlier run finished but was never marked complete — marking complete", |
There was a problem hiding this comment.
Shouldn't we mark it as complete right before deleting the skipped_download_list_file? As this stands now, a hypothetical premature deletion of the skipped_download_list_file would keep reprint running where I'd expect it to stop and communicate the error.
| @@ -10188,6 +10270,68 @@ function ($ch, $dl_total, $dl_now, $ul_total, $ul_now) { | |||
| * Reset state to defaults while preserving cross-command data like | |||
| * preflight results, version, and follow_symlinks. | |||
| */ | |||
| /** | |||
There was a problem hiding this comment.
Note the "Reset state" comment got misplaced now. There's also a misplaced comment before it already, let's find its home or delete it.
| @unlink($path); | ||
| } | ||
| } | ||
| // Truncated, not deleted: the generated runtime may hold a mount |
There was a problem hiding this comment.
What does that mean? Is it for running in Studio? If so, let's clearly explain that scenario in the doc string. What's the setup? What's the problem? How is this solving the problem?
There was a problem hiding this comment.
Pull request overview
This PR changes the sub-command re-run behavior so that invoking files-pull, db-pull, or db-apply on a completed state now performs a refresh (delta sync / full re-dump / re-apply) instead of erroring with "use --abort." This aligns sub-command behavior with the composite pull orchestrator's re-pull semantics and fixes several related bugs along the way.
Changes:
- Sub-commands (
files-pull,db-pull,db-apply) now detect a completed state and re-run as a refresh (delta or full), with shared helpers (reset_db_pull_state(),clear_skipped_download_list()) extracted to avoid duplication. - Fixes for the
--filter=skipped-earliercompletion bug (never marked complete), truncation-vs-deletion for the skipped download list (prevents ENOENT in running runtimes), phantom_edit_lockbadges (stripped after apply), and stale.sql-bufferreplay. - Comprehensive test coverage added: new
DbResyncStateTest.phpwith round-trip, edit-lock stripping, db-pull reset, and legacy state repair tests; updatedFilesSyncStateTest.phpand e2e tests to validate the new refresh behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
packages/reprint-importer/src/import.php |
Core implementation: legacy state repair, delta/refresh paths for all three sub-commands, clear_skipped_download_list() and reset_db_pull_state() helpers, strip_ephemeral_edit_locks(), and CLI help updates |
packages/reprint-importer/src/lib/pull/class-pull.php |
Refactored prepare_repull() to use the shared reset_db_pull_state() helper |
tests/Import/DbResyncStateTest.php |
New test class covering db-apply re-run, edit-lock stripping, db-pull reset, and legacy state repair |
tests/Import/FilesSyncStateTest.php |
Updated test for completed files-pull re-run (delta instead of refusal) |
tests/e2e/tests/import-01-basic-file-sync.test.js |
Updated e2e test: re-run now expects successful delta sync |
tests/e2e/tests/import-02-sql-sync.test.js |
Updated e2e test: re-run expects successful refresh |
tests/e2e/tests/import-40-defer-uploads.test.js |
Updated e2e test: skipped list validated as truncated (not deleted) |
CLAUDE.md |
Documents the new re-run/refresh semantics |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * Empty the skipped-files download list by truncating it in place | ||
| * rather than deleting it. | ||
| * | ||
| * apply-runtime mounts this file into the generated runtime — the | ||
| * temporary remote-uploads proxy reads it to decide which paths are | ||
| * still remote-only. A running server re-resolves its mounts on every | ||
| * PHP runtime rotation (and on restart from a persisted config), so | ||
| * deleting the file breaks the mount with ENOENT. An empty file means | ||
| * "no remote-only files left", which is exactly what every caller | ||
| * wants — all has-skipped checks test for size > 0. | ||
| */ | ||
| public function clear_skipped_download_list(string $reason): void | ||
| { | ||
| if (!file_exists($this->skipped_download_list_file)) { | ||
| return; | ||
| } | ||
| file_put_contents($this->skipped_download_list_file, ""); | ||
| $this->audit_log( | ||
| "FILE TRUNCATE | {$this->skipped_download_list_file} | {$reason}", | ||
| ); | ||
| } |
| $this->save_state($this->state); | ||
| $this->run_files_sync_pipeline(); | ||
|
|
||
| // Pipeline returns early with partial status if interrupted. |
There was a problem hiding this comment.
How would we run into this? I'm a bit confused and the comment doesn't tell me what are we guarding against. What is an interruption in this comment?
There was a problem hiding this comment.
Aha, I think I get it now. I'll brush up these comments.
| // the fresh-start path below, where $is_delta picks up the local | ||
| // index and turns the run into re-index → diff → fetch changes. | ||
| $this->audit_log( | ||
| sprintf("files-pull already complete: %d files indexed%s", $index_size, $skipped_note), |
There was a problem hiding this comment.
this is valuable audit log information, let's keep it
|
With this PR, it's no longer safe to re-run completed commands I understand the problem you're solving here is refreshing the local database from the remote site? Wouldn't calling |
Agreed, good catch. You're right that I'll back out the implicit restart and keep the refresh behind |
|
Additionally, I can create follow-up PRs for the following identified issues:
Then the refresh PR (flag or |
|
Closing this in favor of the following PRs:
And the most important one is the changes in Studio: A couple of bugs found during this PR are not real issues so they have been closed: |
Problem
Re-pulling a site never propagated edits or deletions to existing database rows — the local database stayed frozen at the content captured by the first completed pull. The root cause is an asymmetry in the engine: the composite
pullorchestrator resets state for a delta re-pull (Pull::prepare_repull()), but callers that drive the sub-commands directly (notably Studio'spull-reprint) had no refresh path:files-pull--abort")db-pulldb-apply--abortis destructive (re-downloads everything), so there was no clean incremental re-sync.Change
Re-invoking a sub-command on a completed state now performs a refresh, matching the composite
pull's re-pull semantics:files-pull→ delta sync. Keeps the local index, re-indexes the remote, diffs, fetches only changes. Stale remote index / download lists are cleared first (download_remote_index()appends when the file exists).db-pull→ full refresh. NewImportClient::reset_db_pull_state()(now also used byPull::prepare_repull()) resets DB state and deletesdb.sql,.import-domains.json, and the.sql-buffercrash-recovery file, then re-downloads the entire dump.db-apply→ re-apply from statement 0. The dump is idempotent (DROP TABLE IF EXISTS+ full INSERTs), so edits, inserts, and deletes all land.In-progress/partial states still resume from their cursors;
--abortstill clears state without running.Bugs fixed along the way
--filter=skipped-earliernever marked completion — the branch returned before thestatus = "complete"marking, leaving every successful essential+skipped pull stuckin_progressforever. Fixed, plus a legacy-state repair inrun()for sessions written by older versions.apply-runtimemountsstate/.import-download-list-skipped.jsonlinto the generated runtime (remote-uploads proxy), and PHP-WASM re-resolves mounts on every runtime rotation. Deleting the file crashed server starts and WP-CLI with ENOENT. It is now truncated to empty instead (all has-skipped checks testsize > 0, so semantics are unchanged)._edit_lockpostmeta, which renders as an editor lock on the imported copy for 150s after the captured timestamp.db-applynow strips_edit_lockrows after applying (WordPress regenerates them locally; core's WXR exporter skips them too)..sql-bufferreplay — a leftover crash-recovery buffer could be replayed into the target on a refresh (latentprepare_repull()gap, fixed via the shared reset helper).Known limitation
A table dropped on the remote between pulls lingers in the local target (the new dump no longer mentions it). WP core tables always exist, so this is rare; recreating the SQLite file on refresh is a possible hardening follow-up.
Testing
tests/Import/DbResyncStateTest.php: round-trip via the sub-command path proving an edited, inserted, and deleted row all propagate on re-run; db-pull refresh resets state and deletes stale artifacts; the stuck skipped-earlier state is repaired;_edit_lockis stripped while other postmeta survives.tests/Import/FilesSyncStateTest.phpupdated: completedfiles-pullre-runs as a delta and clears the stale remote index.tests/Import/+tests/UrlRewriting/suites green (493 tests).Supersedes #257 — moved from a fork to an in-repo branch so CI runs with full token permissions (the Performance Report perf-comment step can post on this PR).