Skip to content

Refresh on re-run for completed pull sub-commands#257

Closed
epeicher wants to merge 3 commits into
WordPress:trunkfrom
epeicher:delta-resync-refresh
Closed

Refresh on re-run for completed pull sub-commands#257
epeicher wants to merge 3 commits into
WordPress:trunkfrom
epeicher:delta-resync-refresh

Conversation

@epeicher

@epeicher epeicher commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Problem

Re-pulling a site never propagated edits or deletions to existing database rows — the local database stayed frozen at the content captured by the first completed pull. The root cause is an asymmetry in the engine: the composite pull orchestrator resets state for a delta re-pull (Pull::prepare_repull()), but callers that drive the sub-commands directly (notably Studio's pull-reprint) had no refresh path:

Sub-command Re-run on completed state (before)
files-pull Silent no-op ("already complete… use --abort")
db-pull Throws "already completed … Use --abort flag to start over."
db-apply Throws "already completed. Use --abort flag to re-run."

--abort is destructive (re-downloads everything), so there was no clean incremental re-sync.

Change

Re-invoking a sub-command on a completed state now performs a refresh, matching the composite pull's re-pull semantics:

  • files-pull → delta sync. Keeps the local index, re-indexes the remote, diffs, fetches only changes. Stale remote index / download lists are cleared first (download_remote_index() appends when the file exists).
  • db-pull → full refresh. New ImportClient::reset_db_pull_state() (now also used by Pull::prepare_repull()) resets DB state and deletes db.sql, .import-domains.json, and the .sql-buffer crash-recovery file, then re-downloads the entire dump.
  • db-apply → re-apply from statement 0. The dump is idempotent (DROP TABLE IF EXISTS + full INSERTs), so edits, inserts, and deletes all land.

In-progress/partial states still resume from their cursors; --abort still clears state without running.

Bugs fixed along the way

  1. --filter=skipped-earlier never marked completion — the branch returned before the status = "complete" marking, leaving every successful essential+skipped pull stuck in_progress forever. Fixed, plus a legacy-state repair in run() for sessions written by older versions.
  2. Deleting the skipped download list broke running runtimesapply-runtime mounts state/.import-download-list-skipped.jsonl into the generated runtime (remote-uploads proxy), and PHP-WASM re-resolves mounts on every runtime rotation. Deleting the file crashed server starts and WP-CLI with ENOENT. It is now truncated to empty instead (all has-skipped checks test size > 0, so semantics are unchanged).
  3. Phantom "X is currently editing" badges — the dump carries the remote's ephemeral _edit_lock postmeta, which renders as an editor lock on the imported copy for 150s after the captured timestamp. db-apply now strips _edit_lock rows after applying (WordPress regenerates them locally; core's WXR exporter skips them too).
  4. Stale .sql-buffer replay — a leftover crash-recovery buffer could be replayed into the target on a refresh (latent prepare_repull() gap, fixed via the shared reset helper).

Known limitation

A table dropped on the remote between pulls lingers in the local target (the new dump no longer mentions it). WP core tables always exist, so this is rare; recreating the SQLite file on refresh is a possible hardening follow-up.

Testing

  • New tests/Import/DbResyncStateTest.php: round-trip via the sub-command path proving an edited, inserted, and deleted row all propagate on re-run; db-pull refresh resets state and deletes stale artifacts; the stuck skipped-earlier state is repaired; _edit_lock is stripped while other postmeta survives.
  • tests/Import/FilesSyncStateTest.php updated: completed files-pull re-runs as a delta and clears the stale remote index.
  • Full tests/Import/ + tests/UrlRewriting/ suites green (493 tests).
  • Verified live end-to-end against a WP.com Atomic site via Studio: remote post edits, inserts, and deletes propagate on re-pull; site start and WP-CLI work after sync cycles.

Re-invoking files-pull, db-pull, or db-apply on a completed state now
performs a refresh instead of erroring or no-oping, matching the
composite pull orchestrator's re-pull behavior:

- files-pull: runs a delta sync — keeps the local index, re-indexes the
  remote, diffs, and fetches only changes. The stale remote index and
  download lists are cleared first (download_remote_index() appends
  when the file exists).
- db-pull: full refresh via the new ImportClient::reset_db_pull_state()
  (also reused by Pull::prepare_repull()) — resets DB state, deletes
  db.sql, .import-domains.json, and the .sql-buffer crash-recovery
  file, then re-downloads the entire dump.
- db-apply: re-applies the dump from statement 0. The dump is
  idempotent (DROP TABLE IF EXISTS + full INSERTs), so remote edits,
  inserts, and deletes all propagate.

This unblocks callers that drive the sub-commands directly (notably
Studio's pull-reprint) and could previously never re-sync a database:
edits and deletions to existing rows stayed frozen at the first
completed pull.

Also fixed along the way:

- --filter=skipped-earlier runs never marked status=complete, leaving
  the state stuck in_progress after a successful pull. The branch now
  marks completion, and a legacy-state repair in run() heals sessions
  written by older versions.
- The skipped download list is now truncated instead of deleted
  everywhere. apply-runtime mounts that file into the generated
  runtime for the remote-uploads proxy, and PHP-WASM re-resolves
  mounts on every runtime rotation — deleting the file crashed server
  starts and WP-CLI with ENOENT.
- db-apply strips _edit_lock postmeta after applying. The dump carries
  the remote's ephemeral editor locks, which render as phantom
  "X is currently editing" badges on the imported copy.
- A stale .sql-buffer could be replayed into the target on refresh
  (latent prepare_repull bug, fixed by the shared reset helper).

In-progress/partial states still resume from their cursors; --abort
remains the way to clear state without running.
@epeicher epeicher changed the title Refresh on re-run for completed pull sub-commands [WIP] Refresh on re-run for completed pull sub-commands Jun 10, 2026
@epeicher epeicher marked this pull request as ready for review June 15, 2026 09:04
@epeicher epeicher changed the title [WIP] Refresh on re-run for completed pull sub-commands Refresh on re-run for completed pull sub-commands Jun 15, 2026
Three E2E tests asserted the old "re-run refuses / deletes" contract
that this branch changes; update them to verify the new behavior:

- import-02: re-running db-sync on a completed state now performs a
  full refresh (exit 0, fresh db.sql, audit log records the refresh)
  instead of erroring with a "--abort" message.
- import-40: the skipped download list is truncated to empty rather
  than deleted (apply-runtime mounts it; deletion crashed the runtime
  with ENOENT). Assert it exists and is zero-length.
- import-01: a completed files-sync re-run now performs a delta sync;
  rename the test and assert the completed status it ends on (it was
  passing only incidentally under the old "refuses" name).

Verified in the Dockerized E2E environment: all three files pass
(31 tests). The full suite's only remaining failures (import-07,
import-19) are pre-existing local-Docker permission/fault-injection
flakiness — they fail identically on trunk and pass on CI.
@epeicher

Copy link
Copy Markdown
Collaborator Author

Superseded by #260 — moved from the fork to an in-repo branch so CI runs with full token permissions (the Performance Report perf-comment step couldn't post from a fork PR). Same branch and commits.

@epeicher epeicher closed this Jun 15, 2026
@epeicher epeicher removed the request for review from adamziel June 15, 2026 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant