@MarkBerube
Contributor

Why

When the search-replace command for WP-CLI scans for URLs and other text to replace, it searches the whole SQL database for that value. That includes columns in WP core tables that will never hold a URL, such as:

  • post_status on wp_posts
  • meta_key on wp_postmeta

While not a big deal for most WordPress sites, this becomes a persistent speed bump when cloning environments once you are dealing with databases that have a large number of rows, or multisite setups with a huge number of tables.

What this does

This change adds two parameters that a WP-CLI user can pass to alleviate this pain.

--smart-url - automatically skips WP core columns that will NEVER have a URL as a value. The list of these columns is static and lives in src/WP_CLI/SearchReplace/Non_URL_Columns.php.

--analyze-tables - can only be used if --smart-url is also present. This parameter tells WP-CLI to scan the database, before search-replace runs, for columns with non-text SQL datatypes (binary, datetime, etc.) and for column names that match core WP naming patterns that would also never hold a URL (*_order, *_quantity, etc.), and to skip those as well. It will be a bit slower, but will capture more columns to skip for custom WP DB setups.

You must opt into these performance skips via the parameters above and it is only recommended that you do so if you are replacing URLs in the WP DB.
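
As a rough illustration of the opt-in behaviour described above, here is a minimal Python sketch (the PR itself is PHP). The NON_URL_COLUMNS set holds only a few columns named in this thread, and validate_flags and columns_to_scan are hypothetical helper names, not the PR's API.

```python
# Hypothetical sketch; names are illustrative and not taken from the PR's PHP code.
NON_URL_COLUMNS = {
    ("wp_posts", "post_status"),
    ("wp_posts", "post_type"),
    ("wp_postmeta", "meta_key"),
}


def validate_flags(smart_url, analyze_tables):
    """Reject --analyze-tables unless --smart-url is also given."""
    if analyze_tables and not smart_url:
        raise ValueError("--analyze-tables requires --smart-url")


def columns_to_scan(table, columns, smart_url):
    """Return the columns to search; without --smart-url nothing is skipped."""
    if not smart_url:
        return list(columns)
    return [c for c in columns if (table, c) not in NON_URL_COLUMNS]


cols = ["post_content", "post_status", "guid", "post_type"]
print(columns_to_scan("wp_posts", cols, smart_url=True))
```

Because the skip list is a fixed set, the filter costs a handful of lookups per table and needs no extra queries against the database.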

Performance Gains

I've tested this in multiple local setups. In smaller setups (DB size under 1-2 GB) there was no noticeable difference in scan speed. However, in my benchmarks with a larger DB (10 GB) and a large number of rows in wp_postmeta and wp_posts, there was an average 30-40% saving over a normal search-replace scan.

Test Run

Command	Time	Improvement
wp search-replace (standard)	144s	baseline
wp search-replace --smart-url	84s	41.6% faster

Test Details

Standard Command Output

Table	Column	Replacements	Type
wp_commentmeta	meta_key	0	SQL
wp_commentmeta	meta_value	0	SQL
wp_comments	comment_author	0	SQL
wp_comments	comment_author_email	0	SQL
wp_comments	comment_author_url	200000	SQL
wp_comments	comment_author_IP	0	SQL
wp_comments	comment_content	200000	SQL
wp_comments	comment_approved	0	SQL
wp_comments	comment_agent	0	SQL
wp_comments	comment_type	0	SQL
wp_links	link_url	0	SQL
wp_links	link_name	0	SQL
wp_links	link_image	0	SQL
wp_links	link_target	0	SQL
wp_links	link_description	0	SQL
wp_links	link_visible	0	SQL
wp_links	link_rel	0	SQL
wp_links	link_notes	0	SQL
wp_links	link_rss	0	SQL
wp_options	option_name	0	SQL
wp_options	option_value	50000	PHP
wp_options	autoload	0	SQL
wp_postmeta	meta_key	0	SQL
wp_postmeta	meta_value	30000000	SQL
wp_posts	post_content	950000	SQL
wp_posts	post_title	0	SQL
wp_posts	post_excerpt	950000	SQL
wp_posts	post_status	0	SQL
wp_posts	comment_status	0	SQL
wp_posts	ping_status	0	SQL
wp_posts	post_password	0	SQL
wp_posts	post_name	0	SQL
wp_posts	to_ping	0	SQL
wp_posts	pinged	0	SQL
wp_posts	post_content_filtered	0	SQL
wp_posts	guid	950000	SQL
wp_posts	post_type	0	SQL
wp_posts	post_mime_type	0	SQL
wp_term_taxonomy	taxonomy	0	SQL
wp_term_taxonomy	description	0	SQL
wp_termmeta	meta_key	0	SQL
wp_termmeta	meta_value	0	SQL
wp_terms	name	0	SQL
wp_terms	slug	0	SQL
wp_usermeta	meta_key	0	SQL
wp_usermeta	meta_value	0	PHP
wp_users	user_login	0	SQL
wp_users	user_nicename	0	SQL
wp_users	user_email	0	SQL
wp_users	user_url	0	SQL
wp_users	user_activation_key	0	SQL
wp_users	display_name	0	SQL
Success: 33300000 replacements to be made.

real	2m23.679s
user	0m0.820s
sys	0m0.661s

Smart URL Mode Output

Table	Column	Replacements	Type
wp_commentmeta	meta_value	0	SQL
wp_comments	comment_author	0	SQL
wp_comments	comment_author_email	0	SQL
wp_comments	comment_author_url	200000	SQL
wp_comments	comment_author_IP	0	SQL
wp_comments	comment_content	200000	SQL
wp_comments	comment_agent	0	SQL
wp_links	link_url	0	SQL
wp_links	link_name	0	SQL
wp_links	link_image	0	SQL
wp_links	link_target	0	SQL
wp_links	link_description	0	SQL
wp_links	link_notes	0	SQL
wp_options	option_value	50000	PHP
wp_postmeta	meta_value	30000000	SQL
wp_posts	post_content	950000	SQL
wp_posts	post_title	0	SQL
wp_posts	post_excerpt	950000	SQL
wp_posts	post_content_filtered	0	SQL
wp_posts	guid	950000	SQL
wp_term_taxonomy	description	0	SQL
wp_termmeta	meta_value	0	SQL
wp_terms	name	0	SQL
wp_usermeta	meta_value	0	PHP
wp_users	user_url	0	SQL
wp_users	user_activation_key	0	SQL
Success: 33300000 replacements to be made.

real	1m23.315s
user	0m0.775s
sys	0m0.698s
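
To cross-check the improvement against the raw timings, the two `real` values above can be converted to seconds and compared. This is a small Python sketch; parse_real is a hypothetical helper written for this illustration.

```python
import re


def parse_real(line):
    """Convert a `time` line such as 'real 2m23.679s' into seconds."""
    match = re.search(r"(\d+)m([\d.]+)s", line)
    if match is None:
        raise ValueError(f"unrecognised time format: {line!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


baseline = parse_real("real\t2m23.679s")  # standard run
smart = parse_real("real\t1m23.315s")     # --smart-url run
saving = (baseline - smart) / baseline * 100
print(f"{baseline:.3f}s -> {smart:.3f}s, {saving:.1f}% less wall-clock time")
```

The unrounded timings give a saving of about 42.0%, in line with the figure in the Test Run table, which was computed from times rounded to whole seconds.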

@MarkBerube MarkBerube requested a review from a team as a code owner January 15, 2026 17:47
@github-actions

Hello! 👋

Thanks for opening this pull request! Please check out our contributing guidelines. We appreciate you taking the initiative to contribute to this project.

Contributing isn't limited to just code. We encourage you to contribute in the way that best fits your abilities, by writing tutorials, giving a demo at your local meetup, helping other users with their support questions, or revising our documentation.

Here are some useful Composer commands to get you started:

  • composer install: Install dependencies.
  • composer test: Run the full test suite.
  • composer phpcs: Check for code style violations.
  • composer phpcbf: Automatically fix code style violations.
  • composer phpunit: Run unit tests.
  • composer behat: Run behavior-driven tests.

To run a single Behat test, you can use the following command:

# Run all tests in a single file
composer behat features/some-feature.feature

# Run only a specific scenario (where 123 is the line number of the "Scenario:" title)
composer behat features/some-feature.feature:123

You can find a list of all available Behat steps in our handbook.

@gemini-code-assist

Summary of Changes

Hello @MarkBerube, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces intelligent URL transformation capabilities to the wp search-replace command, aiming to significantly boost performance for URL replacements in WordPress databases. By smartly identifying and skipping columns that are guaranteed not to contain URLs, either through a static list of core WordPress columns or dynamic database analysis, this enhancement makes environment cloning and domain migration processes faster and more efficient, especially for large installations.

Highlights

  • New --smart-url Flag: Introduces a --smart-url flag to the wp search-replace command, which automatically skips over 75 WordPress core columns known not to contain URLs. This significantly improves performance for URL replacement operations, with reported gains of 30-40% on large databases.
  • New --analyze-tables Flag: Adds an --analyze-tables flag, which can be used in conjunction with --smart-url. This flag enables advanced table analysis, dynamically identifying and skipping non-text columns (like integers, dates, enums) and columns matching common non-URL naming patterns (e.g., *_id, *_status) in custom plugin tables, further optimizing the search-replace process.
  • Automatic URL Replacement Detection: The wp search-replace command now automatically detects if the search string is a URL (starting with http:// or https://) and enables --smart-url mode by default, streamlining the user experience for common URL migration tasks.
  • Validation for --smart-url Usage: Includes validation to ensure that the --smart-url flag is only used when the search string is a valid URL, preventing incorrect usage and providing helpful error messages.
  • Comprehensive Test Coverage and Documentation: Extensive new feature tests have been added to cover various scenarios for --smart-url and --analyze-tables, including different data types, serialized data, multisite setups, and error conditions. The README.md has also been updated with detailed explanations and examples for the new flags.
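
The auto-detection described in the highlights above could look roughly like the following Python sketch. The function names are hypothetical, and the case-sensitive prefix check is an assumption based only on the summary's wording ("starting with http:// or https://").

```python
def looks_like_url(search):
    """True when the search string starts with http:// or https://."""
    return search.startswith(("http://", "https://"))


def smart_url_enabled(search, smart_url_flag=False):
    """Smart mode is on when requested explicitly or when the search is a URL."""
    return smart_url_flag or looks_like_url(search)


for value in ["https://example.com", "example.com", "//example.com"]:
    print(f"{value!r}: smart-url={smart_url_enabled(value)}")
```

Under this assumption, a bare domain or a protocol-relative string does not trigger smart mode unless the flag is passed explicitly.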

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable enhancement to the search-replace command by adding smart URL transformation support. The new --smart-url and --analyze-tables flags, along with automatic URL detection, significantly improve performance for URL replacements by intelligently skipping non-URL columns. The implementation is robust, covering static core column lists, dynamic datatype analysis, and pattern matching for column names. Comprehensive test cases have been added to ensure the new functionality works as expected across various scenarios, including error handling for invalid URLs. The documentation has also been updated to reflect these new options and their usage.

@codecov

codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


);

if ( empty( $columns ) ) {
    continue; // @codeCoverageIgnore
Contributor Author

To explain the ignore: this is an extremely rare edge case where wpdb reports no columns for a DB table. That makes this feature effectively unusable for that table, so we must skip the operation. The ignore makes more sense to me than mocking a complex DB edge case in the .feature that should never be hit to begin with.

Copilot AI left a comment

Pull request overview

This PR adds smart URL transformation support to the search-replace command to improve performance when replacing URLs in WordPress databases. The feature automatically skips columns that cannot contain URLs, resulting in ~34-42% performance improvement on large databases.

Changes:

  • Adds --smart-url flag with auto-detection for URLs starting with http:// or https://
  • Adds --analyze-tables flag for advanced MySQL datatype analysis to skip additional non-text columns
  • Implements comprehensive test coverage with 42 new test scenarios

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

File	Description
src/WP_CLI/SearchReplace/Non_URL_Columns.php	New class providing static list of WordPress core non-URL columns and dynamic analysis methods for identifying non-text datatypes and naming patterns
src/Search_Replace_Command.php	Adds URL auto-detection, smart-url mode implementation, table analysis, and validation logic for the new flags
features/search-replace.feature	Updates existing test scenarios to expect "Detected URL replacement" message and adjusted output format
features/search-replace-url.feature	New comprehensive test file with 42 scenarios covering smart-url and analyze-tables functionality
README.md	Documents the new --smart-url and --analyze-tables flags with usage examples


Comment on lines +104 to +105
'to_ping',
'pinged',
Copilot AI Jan 15, 2026

The columns to_ping and pinged in wp_posts can contain URLs. According to WordPress Codex, to_ping stores a list of URLs to ping when the post is published, and pinged stores URLs that have already been pinged. These columns should be removed from the skip list as they are specifically designed to contain URLs.

Suggested change
- 'to_ping',
- 'pinged',

Copilot uses AI. Check for mistakes.
Contributor Author

Yes, these columns are going to contain URLs, that is true. However, they aren't going to contain URLs we would want to replace. Pingbacks go to other WP sites (not the one we are on), so if we're cloning a DB using search-replace, the URLs in these columns aren't going to be for the current site.

'comment_id',

// wp_users table - User metadata and status
'user_login',
Copilot AI Jan 15, 2026

The user_login column could theoretically contain a URL-like string if someone uses an email address or URL-formatted username. While uncommon, excluding this from search-replace in URL mode could miss legitimate use cases. Consider whether this exclusion is too aggressive.

Suggested change
- 'user_login',

Contributor Author

So if a user whose login is admin@blah.com search-replaces blah.com with foobar.com, they are expected to know their login name has changed? Can't say I agree.

'link_rating',
'link_updated',
'link_rel',
'link_rss',
Copilot AI Jan 15, 2026

The link_rss column in wp_links is specifically designed to store RSS feed URLs. This should not be in the skip list as it can contain URLs. The link_rel column typically contains relationship values like 'nofollow' rather than URLs, so that one is correctly excluded.

Suggested change
- 'link_rss',

Comment on lines +311 to +319
if ( ! filter_var( $old, FILTER_VALIDATE_URL ) ) {
    WP_CLI::error(
        sprintf(
            'The --smart-url flag is designed for URL replacements, but "%s" is not a valid URL. ' .
            'Please use a full URL (e.g., http://example.com) or remove the --smart-url flag.',
            $old
        )
    );
}
Copilot AI Jan 15, 2026

PHP's FILTER_VALIDATE_URL accepts URLs without schemes (e.g., 'example.com'), but the auto-detection only triggers for URLs starting with 'http://' or 'https://'. This creates an inconsistency where the validation would pass for 'example.com' if someone manually uses --smart-url, but it wouldn't auto-detect. The validation should either require a scheme (http:// or https://) or accept URLs without schemes consistently.
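
One way to keep validation and auto-detection consistent is to route both through a single predicate that requires an http or https scheme. The sketch below uses Python's urllib.parse purely for illustration. It is not PHP's filter_var and makes no claim about how FILTER_VALIDATE_URL behaves, and is_http_url is a hypothetical name.

```python
from urllib.parse import urlsplit


def is_http_url(value):
    """True only for strings with an http/https scheme and a host part."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The same predicate would back both the auto-detection and the --smart-url check.
for candidate in ["http://example.com", "example.com", "//example.com"]:
    print(f"{candidate!r}: {is_http_url(candidate)}")
```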

that cannot contain URLs (like post_type, post_status, user_pass, etc.),
significantly improving performance for URL replacements. This is
particularly useful when migrating sites or changing domain names.
Performance: ~34% faster on large databases.
Copilot AI Jan 15, 2026

The PR description shows test results indicating 41.6% performance improvement, but the README states ~34%. While both might be valid depending on the database, it would be better to use consistent numbers or provide a range (e.g., '30-40% faster' or '~34-42% faster') to set accurate expectations.

Suggested change
- Performance: ~34% faster on large databases.
+ Performance: ~34–42% faster on large databases (depending on the database).

Comment on lines +320 to +323
| replacement | flags |
| {SITEURL}/subdir | |
| newdomain.com | |
| newdomain.com | --dry-run |
Copilot AI Jan 15, 2026

The test uses 'newdomain.com' as a replacement value, which is not a valid URL (missing scheme). This will cause FILTER_VALIDATE_URL to fail in the validation logic at line 311 of Search_Replace_Command.php when smart-url mode is auto-enabled. The test should use 'http://newdomain.com' or 'https://newdomain.com' instead.

Suggested change
- | replacement | flags |
- | {SITEURL}/subdir | |
- | newdomain.com | |
- | newdomain.com | --dry-run |
+ | replacement | flags |
+ | {SITEURL}/subdir | |
+ | http://newdomain.com | |
+ | http://newdomain.com | --dry-run |

@swissspidy
Member

Thanks a lot! I think this goes in the right direction.

This is very much related to #186. A separate command might be interesting for this as it allows for future optimizations more easily, such as ones mentioned at https://make.wordpress.org/core/2025/11/27/wordpress-importer-can-now-migrate-urls-in-your-content/ (cc @adamziel)

@mrsdizzie
Member

At a glance, I feel like the real feature here is an option that says "ignore all non text columns" and it feels like it would be useful beyond URLs (though URLs are probably the most popular use case for this). I'd be curious about a breakdown of performance and whether there is a lot to gain by adding anything beyond the single feature that ignores based on column types.

Feels like a lot of the hardcoded table names are already of a type that would be ignored anyway.

The regex pattern for potential column names seems fragile and prone to false positives (and again, the positives are already likely to have a type that would be ignored).

In other words, can we get most of the benefits here with just the single feature to ignore NON_TEXT_DATA_TYPES (which benefits all search-replace, not just when it's a URL)? And have that as a --text-only flag or something like that?

Maybe another command could build on top of that, but to me that seems like a small general change that could maybe have a big impact if a lot of time is spent on those types of columns.
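
The generic type-based filter proposed here could be sketched as follows in Python. The NON_TEXT_TYPES set is an assumed, incomplete list for illustration (the PR's real NON_TEXT_DATA_TYPES list is not shown in this thread), --text-only is only the flag name floated in this comment, and base_type and should_skip are hypothetical helpers.

```python
import re

# Assumed, incomplete set of non-text MySQL base types (illustration only).
NON_TEXT_TYPES = {
    "tinyint", "smallint", "mediumint", "int", "bigint",
    "decimal", "float", "double",
    "date", "datetime", "timestamp", "time", "year",
    "binary", "varbinary", "blob",
}


def base_type(column_type):
    """Reduce a column type such as 'bigint(20) unsigned' to 'bigint'."""
    match = re.match(r"[a-z]+", column_type.strip().lower())
    return match.group(0) if match else ""


def should_skip(column_type):
    """True when the column's base type is in the non-text set."""
    return base_type(column_type) in NON_TEXT_TYPES


for col_type in ["bigint(20) unsigned", "datetime", "int(11)", "varchar(20)", "longtext"]:
    print(f"{col_type:22} skip={should_skip(col_type)}")
```

A filter like this depends only on column metadata, so it would apply to any search string, not just URLs.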

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@MarkBerube
Contributor Author

MarkBerube commented Jan 16, 2026

@mrsdizzie thank you for posting your thoughts here.

That is how search-replace already operates via is_text_col here: https://github.com/wp-cli/search-replace-command/blob/main/src/Search_Replace_Command.php#L816-L851

I agree that these skips can be quite aggressive and fragile as well, and that's exactly why I limited it to URLs only and made the regex matching opt-in via --analyze-tables. As for your curiosity:

Take for example the core table _posts. In my experience with larger sites, _posts ends up scaling to be one of the biggest tables you can get. There are 23 total columns in this table. Of those, 9 are non-text types:

Column	Data Type	Category
ID	bigint(20) unsigned	Integer
post_author	bigint(20) unsigned	Integer
post_date	datetime	DateTime
post_date_gmt	datetime	DateTime
post_modified	datetime	DateTime
post_modified_gmt	datetime	DateTime
post_parent	bigint(20) unsigned	Integer
menu_order	int(11)	Integer
comment_count	bigint(20)	Integer

There are another 9, nearly half, that I can say without doubt aren't worth scanning over for URLs but are typed as text columns:

post_status	varchar(20)
comment_status	varchar(20)
ping_status	varchar(20)
post_password	varchar(255)
post_name	varchar(200)
guid	varchar(255)
post_type	varchar(20)
post_mime_type	varchar(100)

Now I'm obviously picking out one of the worst offenders in the WP DB to prove my point here, but we really wouldn't be discussing this if the core WP DB had better typing in general.

Here are my benchmarks from scratch, with a bit more data as you requested. This isn't exactly equivalent to 100+ GB real-world instances, and the data distribution probably isn't accurate either, but it is enough to show how much query overhead we're saving.

Database Size: ~15.72GB
Total Rows: ~30,238,733

Final Results (3-run average)

Version	Run 1	Run 2	Run 3	Average
Baseline	144.44s	145.67s	147.37s	145.83s
--smart-url	96.94s	95.21s	95.84s	96.00s

Summary

Version	Average Time	Columns Skipped	Columns Processed	Improvement
Baseline	145.83s	42 non-text	52 text/varchar	-
--smart-url	96.00s	68 total	26	34% faster
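
For anyone who wants to re-derive the averages and the improvement figure from the three runs above, a short Python check (using only the run times reported in this comment) looks like this:

```python
from statistics import mean

# Run times in seconds, copied from the results table above.
runs = {
    "Baseline": [144.44, 145.67, 147.37],
    "--smart-url": [96.94, 95.21, 95.84],
}

averages = {name: mean(times) for name, times in runs.items()}
improvement = (
    (averages["Baseline"] - averages["--smart-url"]) / averages["Baseline"] * 100
)

for name, avg in averages.items():
    print(f"{name:12} {avg:.2f}s")
print(f"improvement: {improvement:.1f}%")
```

The averages come out to 145.83s and 96.00s, and the improvement to about 34.2%, matching the "34% faster" in the summary.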

@MarkBerube
Contributor Author

@swissspidy I am not opposed to moving this to a separate command (although I don't know where to start on that), but this change will probably stay under the radar of folks who use search-replace with URLs on a daily basis for some time. We have been working with the same search-replace command for over a decade, after all. By making it activate only on URL-based search strings or by choice of WP-CLI parameters, I feel I've alleviated the risk of keeping it here, but if that's not the case, could we talk about the process of moving it?

@mrsdizzie
Member

mrsdizzie commented Jan 16, 2026

Interesting. I suppose that, while varchar is technically a type that holds text, I would also consider it something to be avoided and focus specifically on just the TEXT/MEDIUMTEXT/LONGTEXT types (I understood NON_TEXT_DATA_TYPES a bit too literally there).

I guess a better version of my question, then, is: "Do you get most of these savings by only looking at TEXT/MEDIUMTEXT/LONGTEXT types, ignoring anything else including varchar?" That could perhaps be tested by just removing varchar from the is_text_col function (though I'm not 100% sure on that or on all the details of how it works now).

For example, in your initial post, when I compare the tables/columns in the current output against the "Smart URL Mode Output", the ones that have been removed are almost all just varchar, right? And you'd presumably get the same savings by just allowing varchar types to be skipped as well?

This is quick and by hand (sorry for any errors), based on the details here, but these seem to be the differences between the current output and your modified example, pretty much all of which would also get ignored by just skipping varchar:

Table	Column	Data Type
wp_commentmeta	meta_key	VARCHAR(255)
wp_comments	comment_approved	VARCHAR(20)
wp_comments	comment_type	VARCHAR(20)
wp_links	link_visible	VARCHAR(20)
wp_links	link_rel	VARCHAR(255)
wp_links	link_rss	VARCHAR(255)
wp_options	option_name	VARCHAR(191)
wp_options	autoload	VARCHAR(20)
wp_postmeta	meta_key	VARCHAR(255)
wp_posts	post_status	VARCHAR(20)
wp_posts	comment_status	VARCHAR(20)
wp_posts	ping_status	VARCHAR(20)
wp_posts	post_password	VARCHAR(255)
wp_posts	post_name	VARCHAR(200)
wp_posts	to_ping	TEXT
wp_posts	pinged	TEXT
wp_posts	post_type	VARCHAR(20)
wp_posts	post_mime_type	VARCHAR(100)
wp_term_taxonomy	taxonomy	VARCHAR(32)
wp_termmeta	meta_key	VARCHAR(255)
wp_terms	slug	VARCHAR(200)
wp_usermeta	meta_key	VARCHAR(255)
wp_users	user_login	VARCHAR(60)
wp_users	user_nicename	VARCHAR(50)
wp_users	user_email	VARCHAR(100)
wp_users	display_name	VARCHAR(250)

So you'd get mostly the same results I assume.

@MarkBerube
Contributor Author

No worries, let's see how it goes. This is mainline code with is_text_col changed like so:
foreach ( array( 'text' ) as $token ) {
This means types like longtext are still scanned, but varchar is skipped:

Final Results (3-run average)

Version	Run 1	Run 2	Run 3	Average	Columns
Main Branch	151.83s	151.93s	152.14s	151.97s	52
TEXT-ONLY (skip varchar)	93.99s	93.52s	93.35s	93.62s	15
--smart-url	96.94s	95.21s	95.84s	96.00s	26

Summary

Version	Time	vs Main
Main Branch	151.97s	-
TEXT-ONLY (skip varchar)	93.62s	38.4% faster
--smart-url	96.00s	36.8% faster

Your guess is right, but it also skips these existing columns, which need to be processed for URLs:

comment_author_url varchar(200) NOT NULL default '',
user_url varchar(100) NOT NULL default '',
link_url varchar(255) NOT NULL default '',
link_image varchar(255) NOT NULL default '',

If the DB typing were a bit more strict, that would greatly simplify our problem here, but that's not the case in the current state of the WP DB.
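
The trade-off can be reproduced with a small Python sketch of token-based type matching. It assumes, based on the discussion above, that the mainline check matches both 'text' and 'varchar' tokens. The is_text_col below is a stand-in written for this illustration, not the actual PHP method.

```python
def is_text_col(column_type, tokens):
    """True when any token appears in the column type (case-insensitive)."""
    lowered = column_type.lower()
    return any(token in lowered for token in tokens)


# Column types as quoted in this thread, plus wp_posts.post_content (longtext).
columns = {
    "comment_author_url": "varchar(200)",
    "user_url": "varchar(100)",
    "link_url": "varchar(255)",
    "link_image": "varchar(255)",
    "post_content": "longtext",
}

mainline = [c for c, t in columns.items() if is_text_col(t, ("text", "varchar"))]
text_only = [c for c, t in columns.items() if is_text_col(t, ("text",))]

print("scanned with text+varchar:", mainline)
print("scanned with text only:   ", text_only)
```

With only the 'text' token, the four varchar URL columns drop out of the scan, which is the gap described above.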

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@mrsdizzie
Member

hmm yea, though that wp_links table is not really used and perhaps will go away at some point:
https://core.trac.wordpress.org/ticket/40088

but annoying for the other two! (and thank you for taking the time to run those).

Interesting to see where most of the savings comes from though, and still feels worth a feature to be able to target column types based on this.

I don't think I really agree with the URL-specific features here, though, or with how they are designed. I've probably done a thousand of these "change the URL for the site" operations, and I never use a string that starts with http/https, because there are many ways that a domain name/"URL" gets stored in the database that don't look like that: lots of just //example.com, ://example.com, https:\/\/ etc. that are used to construct URLs and need to be fixed.
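
To make the point concrete, here is a Python sketch that lists a few of the stored forms mentioned above and checks which of them a plain http:// or https:// prefix test would catch. The url_variants helper and the exact set of variants are illustrative assumptions, not an exhaustive list.

```python
def url_variants(domain):
    """Some forms in which a site address may be stored (not exhaustive)."""
    return [
        f"https://{domain}",
        f"http://{domain}",
        f"//{domain}",             # protocol-relative
        f"https:\\/\\/{domain}",   # slashes escaped, e.g. inside JSON
        f"http:\\/\\/{domain}",
    ]


variants = url_variants("example.com")
detected = [v for v in variants if v.startswith(("http://", "https://"))]

for v in variants:
    print(f"{v:28} prefix-detected={v in detected}")
```

Only the first two forms pass the prefix test. The protocol-relative and escaped forms would need separate search strings.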

I also agree that this should be a separate command if it can't be something more generic to general search-replace (like filtering out column types). I don't actually believe it's possible to replace a URL in a real-world, meaningful way with a single search-replace command.


Development

Successfully merging this pull request may close these issues.

Introduce a dedicated search-replace url command

3 participants