Skip to content

Truncate the skipped-download list instead of deleting it#264

Open
epeicher wants to merge 2 commits into
trunkfrom
fix/truncate-skipped-download-list
Open

Truncate the skipped-download list instead of deleting it#264
epeicher wants to merge 2 commits into
trunkfrom
fix/truncate-skipped-download-list

Conversation

@epeicher

@epeicher epeicher commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

What it does

A pulled site no longer crashes with ENOENT after the deferred uploads (--filter=skipped-earlier) tail finishes downloading. The importer now empties the skipped-download list by truncating it in place instead of deleting it.

Rationale

apply-runtime mounts the skipped-download list (.import-download-list-skipped.jsonl) into the generated runtime — the temporary remote-uploads proxy reads it to decide which wp-content/uploads paths are still remote-only. Several cleanup paths in the importer deleted that file with unlink().

PHP-WASM re-resolves its mounts on every runtime rotation (and on restart from a persisted config). Once the file was gone, the mount resolved to a missing path and the running site crashed with ENOENT — both the server start and WP-CLI failed. The crash surfaced right after the uploads tail completed, which is where the list was being deleted.

Implementation

Replaced every deletion of the list with a new ImportClient::clear_skipped_download_list() that truncates the file to empty in place:

public function clear_skipped_download_list(string $reason): void
{
    if (!file_exists($this->skipped_download_list_file)) {
        return;
    }
    file_put_contents($this->skipped_download_list_file, "");
    $this->audit_log("FILE TRUNCATE | {$this->skipped_download_list_file} | {$reason}");
}

An empty file means "no remote-only files left", which is exactly what every has_skipped check already wants — they all test filesize(...) > 0. So the runtime mount stays valid and no caller behavior changes. Updated the four call sites in import.php (abort files-pull, clearing before diff stage, no skipped files to fetch, skipped files fetch complete) and the one in Pull::prepare_repull().

Testing instructions

The easiest way to reproduce the issue is to use Studio code:

  1. Navigate to Studio repo root and run:
export STUDIO_ENABLE_PULL_REPRINT=true && node ~/github/studio/apps/cli/dist/cli/main.mjs pull-reprint --url <your_wpcom_site>
  1. Reply Y to create a new Studio site
  2. Wait for the site to be created, then open Studio and stop and start the created site
  3. Observe that there is an error CleanShot 2026-06-17 at 18 32 13@2x
  4. Apply the changes on this branch and compile them by running composer build:phar on this repo
  5. Copy the reprint.phar to the following folder on Studio cp reprint.phar <studio_repo>/wp-files/reprint
  6. On Studio, run npm run cli:build so the phar file is copied to the dist folder
  7. Delete the previously created Studio site so you can start from scratch and previous pulls are deleted
  8. Repeat steps 1-3 and verify that the site starts successfully

apply-runtime mounts .import-download-list-skipped.jsonl into the
generated runtime (the temporary remote-uploads proxy reads it to decide
which paths are still remote-only). PHP-WASM re-resolves its mounts on
every runtime rotation and on restart from a persisted config, so
unlink-ing the file crashes the running site with ENOENT.

Replace every deletion of the skipped-download list with a new
ImportClient::clear_skipped_download_list() helper that truncates the
file to empty in place. An empty file means "no remote-only files left",
which is exactly what every has-skipped check wants (they all test for
size > 0), so the mount stays valid and no caller behavior changes.

Updated the four call sites in import.php (abort, clearing before diff,
no skipped to fetch, skipped fetch complete) and the one in
Pull::prepare_repull().
@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Pull pipeline performance — large-directory

Site: large-directory · 2,000+ plus targeted file-transfer scenarios files · 10,000 posts · 25,000 postmeta · PHP 8.5.7

Stage PR trunk Δ Status Details
playground-sqlite-db-pull 8.88 s 8.49 s ⚪ +391 ms (+4.6%) condition=db-pull in PHP.wasm
runtime=php.wasm 8.3
wp_mysql_parser=enabled
mode=lexer
native_lexer=verified
native_token_stream=WP_MySQL_Native_Token_Stream
native_token_count=18
native_parser=selected
trunk: condition=db-pull in PHP.wasm
runtime=php.wasm 8.3
wp_mysql_parser=enabled
mode=lexer
native_lexer=verified
native_token_stream=WP_MySQL_Native_Token_Stream
native_token_count=18
native_parser=selected
playground-sqlite-db-apply 3.44 s 3.51 s ⚪ -68 ms (-1.9%) condition=db-apply to SQLite in PHP.wasm
runtime=php.wasm 8.3
wp_mysql_parser=enabled
mode=parser
native_lexer=verified
native_token_stream=WP_MySQL_Native_Token_Stream
native_token_count=18
native_parser=verified
native_ast=WP_MySQL_Native_Parser_Node
sqlite_driver_parser=verified
trunk: condition=db-apply to SQLite in PHP.wasm
runtime=php.wasm 8.3
wp_mysql_parser=enabled
mode=parser
native_lexer=verified
native_token_stream=WP_MySQL_Native_Token_Stream
native_token_count=18
native_parser=verified
native_ast=WP_MySQL_Native_Parser_Node
sqlite_driver_parser=verified
Total 12.32 s 12.00 s ⚪ +323 ms (+2.7%)

Numbers carry runner noise; treat single-run deltas as directional, not authoritative.

📈 Trunk performance history — commit-by-commit timeline.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a crash (ENOENT) that occurred after the deferred uploads download completed. The root cause was that the importer deleted the .import-download-list-skipped.jsonl file via unlink(), but apply-runtime mounts this file into the generated PHP-WASM runtime. When the file was gone, the mount broke on the next runtime rotation or restart, crashing both the server and WP-CLI.

Changes:

  • Added a new ImportClient::clear_skipped_download_list() method that truncates the file to empty instead of deleting it, keeping the mount valid while signalling "no remote-only files left."
  • Replaced all four unlink call sites in import.php and one in Pull::prepare_repull() with calls to the new method.
  • Added a unit test (ClearSkippedDownloadListTest) and updated the e2e test to assert the file is truncated (exists with size 0) rather than deleted.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
packages/reprint-importer/src/import.php Adds clear_skipped_download_list() method; replaces four unlink call sites with it
packages/reprint-importer/src/lib/pull/class-pull.php Replaces unlink of skipped list in prepare_repull() with the new truncation method
tests/Import/ClearSkippedDownloadListTest.php New unit test covering truncation and no-op-when-absent behaviors
tests/e2e/tests/import-40-defer-uploads.test.js Updates assertion from "file deleted" to "file exists and is empty"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/reprint-importer/src/import.php Outdated
@adamziel

Copy link
Copy Markdown
Collaborator

This sounds like a bug in PHP-WASM where the runtime rotation should tolerate missing files:

PHP-WASM re-resolves its mounts on every runtime rotation (and on restart from a persisted config). Once the file was gone, the mount resolved to a missing path and the running site crashed with ENOENT — both the server start and WP-CLI failed. The crash surfaced right after the uploads tail completed, which is where the list was being deleted.

@adamziel

Copy link
Copy Markdown
Collaborator

I'm exploring a fix here WordPress/wordpress-playground#3811

@epeicher

Copy link
Copy Markdown
Collaborator Author

This sounds like a bug in PHP-WASM where the runtime rotation should tolerate missing files:

I'm exploring a fix here WordPress/wordpress-playground#3811

Thanks! That makes sense to me, I agree this might look like an escape hatch for that scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants