Batch SQLite db-apply transactions#248

Draft
adamziel wants to merge 3 commits into trunk from codex/opt-tx-batching-20260522010757

adamziel commented May 21, 2026

Collaborator

What it does

Batches the SQLite db-apply path under a single wrapper-transaction model. The importer now opens its own SQLite transaction, commits every 500 applied statements, and writes _reprint_db_apply_progress in the same transaction as the target data.
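To illustrate why the marker lives in the same transaction as the data, here is a minimal Python sketch using the standard `sqlite3` module. It is not the importer's PHP code: the `posts` table, the single-row marker layout, the `statements_applied` column, and the helper names are all assumptions made for the example. Only the `_reprint_db_apply_progress` table name comes from this PR.

```python
import sqlite3

MARKER = "_reprint_db_apply_progress"  # table name from the PR description


def open_target() -> sqlite3.Connection:
    # isolation_level=None puts the sqlite3 module in autocommit mode, so the
    # explicit BEGIN/COMMIT/ROLLBACK below are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    # Hypothetical single-row marker layout (not taken from the PR).
    conn.execute(
        f"CREATE TABLE {MARKER} ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "statements_applied INTEGER NOT NULL)"
    )
    conn.execute(f"INSERT INTO {MARKER} (id, statements_applied) VALUES (1, 0)")
    return conn


def apply_batch(conn, statements, offset_after_batch):
    """Apply one batch and record its end offset in the same transaction."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute(
            f"UPDATE {MARKER} SET statements_applied = ? WHERE id = 1",
            (offset_after_batch,),
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Discard both the partial data and the marker update together.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def progress(conn) -> int:
    return conn.execute(f"SELECT statements_applied FROM {MARKER}").fetchone()[0]
```

Because the marker update and the data writes share one transaction, a failed batch leaves the marker pointing at the last batch that fully committed. The marker can never run ahead of, or fall behind, the rows actually present in the target.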

Rationale

The previous version tried to support both source dump transaction boundaries and importer batch transactions. That made the PR much larger: separate statement classification, DDL/SET flushing, transaction state tracking, and helper methods for a path that only exists to speed up SQLite imports.

This version goes all-in on importer-owned batches. MySQL dump transaction-control statements are counted for resume offsets but skipped on SQLite, because executing START TRANSACTION, COMMIT, ROLLBACK, LOCK, UNLOCK, SAVEPOINT, or RELEASE would let the MySQL-on-SQLite wrapper commit or roll back behind the importer’s progress marker.
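The skip-but-count rule can be sketched as follows. This is an illustrative Python approximation, not the PR's code: the real importer uses a gated `WP_MySQL_Lexer` check, whereas this sketch only inspects the leading keywords and does not handle leading comments. The function names are hypothetical; the keyword list is the one given above.

```python
import re

# Leading keywords the PR description lists as skipped on SQLite.
TRANSACTION_CONTROL_KEYWORDS = frozenset(
    {"COMMIT", "ROLLBACK", "LOCK", "UNLOCK", "SAVEPOINT", "RELEASE"}
)

# Captures the first word and, if present, the second word of a statement.
_LEADING_WORDS = re.compile(r"\s*([A-Za-z]+)(?:\s+([A-Za-z]+))?")


def is_transaction_control(sql: str) -> bool:
    """Approximate check based on leading keywords only (an assumption)."""
    match = _LEADING_WORDS.match(sql)
    if match is None:
        return False
    first = match.group(1).upper()
    second = (match.group(2) or "").upper()
    if first == "START":
        # Only START TRANSACTION counts, not other START ... statements.
        return second == "TRANSACTION"
    return first in TRANSACTION_CONTROL_KEYWORDS


def plan(statements):
    """Count every statement for resume offsets; execute only the rest."""
    offset = 0
    to_execute = []
    for sql in statements:
        offset += 1  # skipped statements still advance the offset
        if not is_transaction_control(sql):
            to_execute.append((offset, sql))
    return offset, to_execute
```

Counting skipped statements keeps the resume offset aligned with positions in the dump file, so a resumed import can find its place in db.sql without knowing which statements were skipped earlier.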

The native MySQL parser does not make this change irrelevant. It speeds up parser/translation work inside MySQL-on-SQLite. This PR reduces SQLite transaction/progress-commit overhead. Those are different layers. With the native parser enabled, and with CI’s small fixture, less wall time remains for batching to remove, so the reported delta is smaller.

Implementation

The SQLite path now does the minimum needed for atomic batched resume:

  • Creates _reprint_db_apply_progress once before streaming db.sql.
  • Reconciles JSON state from that in-target marker on resume.
  • Starts a SQLite wrapper transaction before the first applied statement in each batch.
  • Commits marker + data every SQLITE_DB_APPLY_BATCH_SIZE statements and at the end of the stream.
  • Rolls back the active batch on execution errors.
  • Uses a gated WP_MySQL_Lexer check only for possible transaction-control statements; normal INSERT statements avoid tokenization.
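The steps above can be sketched as one loop. This is a hedged Python approximation using the standard `sqlite3` module rather than the importer's PHP: `SQLITE_DB_APPLY_BATCH_SIZE` and `_reprint_db_apply_progress` are names from this PR, while the marker's column layout and every function name are assumptions. The sketch expects a connection opened with `isolation_level=None` so that the explicit `BEGIN`/`COMMIT` are the only transaction boundaries, and it leaves out the transaction-control skipping and JSON state handling.

```python
import sqlite3

SQLITE_DB_APPLY_BATCH_SIZE = 500  # constant name and value from the PR
MARKER = "_reprint_db_apply_progress"  # table name from the PR


def ensure_marker(conn):
    # Created once, before any statement from the dump is applied.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {MARKER} ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "statements_applied INTEGER NOT NULL)"
    )
    conn.execute(
        f"INSERT OR IGNORE INTO {MARKER} (id, statements_applied) VALUES (1, 0)"
    )


def read_marker(conn) -> int:
    row = conn.execute(
        f"SELECT statements_applied FROM {MARKER} WHERE id = 1"
    ).fetchone()
    return row[0]


def commit_batch(conn, offset):
    # Marker and data are committed together.
    conn.execute(
        f"UPDATE {MARKER} SET statements_applied = ? WHERE id = 1", (offset,)
    )
    conn.execute("COMMIT")


def apply_stream(conn, statements, batch_size=SQLITE_DB_APPLY_BATCH_SIZE):
    """Apply statements in batches, resuming from the in-target marker."""
    ensure_marker(conn)
    resume_from = read_marker(conn)  # the in-target marker is authoritative
    in_batch = 0
    offset = 0
    try:
        for offset, sql in enumerate(statements, start=1):
            if offset <= resume_from:
                continue  # already committed by an earlier run
            if in_batch == 0:
                conn.execute("BEGIN")  # start of a new batch
            conn.execute(sql)
            in_batch += 1
            if in_batch >= batch_size:
                commit_batch(conn, offset)
                in_batch = 0
        if in_batch > 0:
            commit_batch(conn, offset)  # end of stream: flush the partial batch
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # drop the active batch only
        raise
    return read_marker(conn)
```

If a statement fails, only the active batch is lost. Rerunning `apply_stream` over the same stream skips everything at or below the committed offset and continues from the first statement of the failed batch.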

Compared with the prior PR head, the consolidation removes the statement-classification helpers and deletes 481 lines while adding 209 lines for the simplified code and adjusted adversarial coverage.

CI performance report for this head SHA uses the small fixture (10,000 posts / 25,000 postmeta) with the native parser extension enabled:

| Run | Stage | PR | trunk | Delta |
| --- | --- | --- | --- | --- |
| CI bot comment | playground-sqlite-db-apply | 2.62 s | 2.82 s | -201 ms / -7.1% |
| CI bot comment | total reported stages | 9.88 s | 10.18 s | -294 ms / -2.9% |

The larger local focused benchmark below uses the default large fixture (320,007 posts / 720,015 postmeta) with no native parser extension. It is not expected to match the bot comment; it isolates the batching payoff under a much larger statement count:

flock /tmp/reprint-bench.lock -c 'BENCH_STAGES=playground-sqlite-db-apply BENCH_LABEL=branch-rebased BENCH_JSON_OUT=/home/claude/reprint-opt-tx-batching-20260522010757/.context/bench-branch-rebased.json BENCH_MD_OUT=/home/claude/reprint-opt-tx-batching-20260522010757/.context/bench-branch-rebased.md node tests/e2e/benchmark/bench-pull.mjs 2>&1 | tee /home/claude/reprint-opt-tx-batching-20260522010757/.context/bench-branch-rebased.log'

flock /tmp/reprint-bench.lock -c 'IMPORTER_PATH=/home/claude/reprint-opt-tx-batching-20260522010757/.context/trunk-current-src-27c5f25/importer/import.php BENCH_STAGES=playground-sqlite-db-apply BENCH_LABEL=trunk-current BENCH_JSON_OUT=/home/claude/reprint-opt-tx-batching-20260522010757/.context/bench-trunk-current.json BENCH_MD_OUT=/home/claude/reprint-opt-tx-batching-20260522010757/.context/bench-trunk-current.md node tests/e2e/benchmark/bench-pull.mjs 2>&1 | tee /home/claude/reprint-opt-tx-batching-20260522010757/.context/bench-trunk-current.log'

| Run | Stage | Wall time | Attempts | Delta |
| --- | --- | --- | --- | --- |
| current origin/trunk | playground-sqlite-db-apply | 94.25 s | 1 | baseline |
| this branch | playground-sqlite-db-apply | 67.49 s | 1 | -26.77 s / -28.4% |
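For readers who want to check the deltas, the arithmetic is sketched below. The `delta` helper is hypothetical, and the inputs are the rounded wall times shown in the tables, so absolute differences can be off by a few milliseconds from the reported figures, which were presumably computed from unrounded timings.

```python
def delta(pr_seconds: float, trunk_seconds: float) -> tuple[float, float]:
    """Return (absolute difference in seconds, relative difference in percent)."""
    diff = pr_seconds - trunk_seconds
    return diff, 100.0 * diff / trunk_seconds


# Rounded wall times copied from the tables in this description.
local_diff, local_pct = delta(67.49, 94.25)  # large fixture, no native parser
ci_diff, ci_pct = delta(2.62, 2.82)          # small CI fixture, native parser

print(f"local: {local_diff:.2f} s / {local_pct:.1f}%")
print(f"ci:    {ci_diff * 1000:.0f} ms / {ci_pct:.1f}%")
```

The percentages reproduce the reported -28.4% and -7.1%.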

Testing instructions

php -l packages/reprint-importer/src/import.php
php -l tests/Import/SqliteDbApplyBatchingTest.php
vendor/bin/phpunit --testdox tests/Import/SqliteDbApplyBatchingTest.php
vendor/bin/phpunit --testdox tests/Import/NewSiteUrlSqliteTest.php tests/Import/DeactivateHostPluginsTest.php
vendor/bin/phpstan analyze --memory-limit=1G --no-progress packages/reprint-importer/src/import.php
git diff --check origin/trunk...HEAD

github-actions Bot commented May 21, 2026

Contributor

Pull pipeline performance — large-directory

Site: large-directory · 2,000+ files plus targeted file-transfer scenarios · 10,000 posts · 25,000 postmeta · PHP 8.5.6

| Stage | PR | trunk | Δ | Status |
| --- | --- | --- | --- | --- |
| playground-sqlite-db-pull | 7.26 s | 7.36 s | ⚪ -93 ms (-1.3%) | |
| playground-sqlite-db-apply | 2.62 s | 2.82 s | ⚪ -201 ms (-7.1%) | |
| Total | 9.88 s | 10.18 s | ⚪ -294 ms (-2.9%) | |

Details (identical for the PR and trunk runs):

playground-sqlite-db-pull:
condition=db-pull in PHP.wasm
runtime=php.wasm 8.3
wp_mysql_parser=enabled
mode=lexer
native_lexer=verified
native_token_stream=WP_MySQL_Native_Token_Stream
native_token_count=18
native_parser=selected

playground-sqlite-db-apply:
condition=db-apply to SQLite in PHP.wasm
runtime=php.wasm 8.3
wp_mysql_parser=enabled
mode=parser
native_lexer=verified
native_token_stream=WP_MySQL_Native_Token_Stream
native_token_count=18
native_parser=verified
native_ast=WP_MySQL_Native_Parser_Node
sqlite_driver_parser=verified

Numbers carry runner noise; treat single-run deltas as directional, not authoritative.

📈 Trunk performance history — commit-by-commit timeline.

adamziel force-pushed the codex/opt-tx-batching-20260522010757 branch from 89a7e31 to 2201c8d on May 22, 2026, 12:21