Add Substack migration support #1

Open
joeboydston wants to merge 1 commit into Automattic:main from joeboydston:improvement/substack-migration
Conversation

@joeboydston

Summary

  • New scripts/substack/discover.js — inventories a Substack publication via its undocumented public API (/api/v1/archive). No browser dependency.
  • New scripts/substack/extract.js — extracts all content via API, with --csv-export flag for full paid post content from Substack's official export. Unwraps substackcdn.com CDN wrapper URLs to download original images.
  • New prompts/substack.md — user-facing migration prompt covering content, subscribers, paid tiers, and redirects.
  • Updated scripts/import.js — adds --self-hosted flag for non-WordPress.com sites, fixes media upload Content-Type handling, improves image URL replacement to handle Substack CDN wrapper URLs. Wix import behavior unchanged.
  • Updated AGENTS.md, README.md, DISCOVERIES.md with Substack documentation.
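
To illustrate the CDN unwrapping mentioned above, here is a minimal sketch. It is not the code from scripts/substack/extract.js; the function name `unwrapSubstackCdnUrl` is hypothetical, and the wrapper layout (`https://substackcdn.com/image/fetch/<transform options>/<percent-encoded original URL>`) is an assumption based on how these URLs commonly appear in post HTML.

```javascript
// Hypothetical helper: recover the original image URL from a Substack CDN
// wrapper URL. ASSUMPTION: wrappers look like
//   https://substackcdn.com/image/fetch/<options>/<percent-encoded original URL>
function unwrapSubstackCdnUrl(url) {
  const prefix = 'https://substackcdn.com/image/fetch/';
  if (typeof url !== 'string' || !url.startsWith(prefix)) {
    return url; // not a wrapper URL; leave it untouched
  }

  // Everything after the prefix is "<options>/<original URL>".
  const rest = url.slice(prefix.length);
  const match = rest.match(/https?(?:%3A|:)/i);
  if (!match) return url;

  const original = rest.slice(match.index);
  // If the embedded URL is not percent-encoded, return it as-is.
  if (match[0].endsWith(':')) return original;

  try {
    return decodeURIComponent(original);
  } catch {
    return url; // malformed encoding; fall back to the wrapper URL
  }
}

// Example with a made-up bucket path:
const wrapped =
  'https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto/' +
  'https%3A%2F%2Fexample-bucket.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc.png';
console.log(unwrapSubstackCdnUrl(wrapped));
```

Returning the input unchanged on any mismatch keeps the helper safe to run over every image URL in a post, wrapped or not.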

How I found it

Built and tested as a new platform contribution. Substack's public API endpoints (/api/v1/archive, /api/v1/posts/<slug>) return rich JSON without authentication, making extraction fast and browser-free.
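
A rough sketch of how paging through the archive endpoint might look. This is not the code in scripts/substack/discover.js. The `sort`, `offset`, and `limit` query parameters and the array-shaped response are assumptions about an undocumented API, and `listArchive` and `fetchJson` are hypothetical names.

```javascript
// Sketch: page through /api/v1/archive until an empty page comes back.
// ASSUMPTION: the endpoint accepts `sort`, `offset`, and `limit` and returns
// a JSON array of post summaries. The API is undocumented and may change.
// `fetchJson` is injected so the loop can be exercised without network access.
async function listArchive(baseUrl, fetchJson, pageSize = 12) {
  const posts = [];
  let offset = 0;
  while (true) {
    const url = new URL('/api/v1/archive', baseUrl);
    url.searchParams.set('sort', 'new');
    url.searchParams.set('offset', String(offset));
    url.searchParams.set('limit', String(pageSize));

    const page = await fetchJson(url.toString());
    if (!Array.isArray(page) || page.length === 0) break;

    posts.push(...page);
    // Advance by what was actually returned, in case the server caps `limit`.
    offset += page.length;
  }
  return posts;
}

// A default fetcher using the global fetch available in Node 18+.
async function fetchJson(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}
```

Usage would be along the lines of `await listArchive('https://<publication>.substack.com', fetchJson)`. Stopping only on an empty page costs one extra request but avoids truncating the inventory if the server returns fewer items than requested.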

Tested against

  • Real Substack publication (public, all-free content, 265 posts)
  • Scripts run without errors
  • Full end-to-end import to a WordPress staging site
  • Images correctly replaced from Substack CDN URLs to WordPress media URLs
  • Posts created as drafts with correct titles, dates, and content
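
For readers unfamiliar with the import step, the following sketch shows one way a Substack post could be mapped to a WordPress REST API draft, and how the endpoint might differ with `--self-hosted`. It is illustrative only and not taken from scripts/import.js. The function names are hypothetical, and the Substack field names (`title`, `slug`, `post_date`, `body_html`) are assumptions about the API's JSON.

```javascript
// Hypothetical mapping from a Substack post object to the body of a
// WordPress REST API "create post" request (POST .../wp/v2/posts).
// ASSUMPTION: Substack field names; verify against real API output.
function toWordPressDraft(post) {
  return {
    title: post.title,
    slug: post.slug,
    status: 'draft', // import as draft so nothing is published unreviewed
    // WordPress expects ISO 8601 without a zone suffix for date_gmt.
    date_gmt: new Date(post.post_date).toISOString().slice(0, 19),
    content: post.body_html || '',
  };
}

// Hypothetical endpoint selection. Self-hosted sites expose the REST API
// under /wp-json; WordPress.com sites are reached through public-api.
function postsEndpoint(site, selfHosted) {
  return selfHosted
    ? `${site.replace(/\/+$/, '')}/wp-json/wp/v2/posts`
    : `https://public-api.wordpress.com/wp/v2/sites/${encodeURIComponent(site)}/posts`;
}

console.log(postsEndpoint('https://example.com/', true));
console.log(
  toWordPressDraft({
    title: 'Hello',
    slug: 'hello',
    post_date: '2024-01-15T10:30:00.000Z',
    body_html: '<p>Hi</p>',
  })
);
```

Authentication (an application password for self-hosted sites, an OAuth token for WordPress.com) is deliberately left out of this sketch.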

Discovery log entry added to DISCOVERIES.md

  • Yes

🤖 Generated with Claude Code

New platform extractor for Substack publications. Uses Substack's
undocumented public API for content discovery and extraction — no
browser needed. Supports dual-source extraction: API for metadata
and free content, CSV export for full paid post content. Images are
unwrapped from Substack's CDN proxy URLs to download originals.

Also adds --self-hosted flag to import.js for non-WordPress.com sites,
fixes media upload Content-Type handling, and improves image URL
replacement to handle CDN wrapper URLs.

Tested end-to-end against coloradomedia.substack.com → Atomic staging site.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@simison left a comment

Member

Really cool idea to add Substack!

Comment thread AGENTS.md
| URL structure (`/p/slug`) | Generate redirect map; set WordPress permalinks to match if possible |
| Subtitles not in WordPress | Add as styled first paragraph or use a subtitle plugin |
| Subscriber migration | Separate CSV export; requires email service setup |
| Paid subscriber migration | Stripe account transfer needed; no automated path |

Member

We have APIs for doing both paid and free subscriber imports to WP.com and Jetpack sites, including the Stripe account switch, so those should first be exposed via MCP and then connected here.

Cc the @Automattic/loop team, who look after the subscriber importer.

Comment thread AGENTS.md

### Substack

| Problem | Solution |

Member

Some potential problems to solve, not necessarily in this PR:

  • Videos and podcasts are not included in Substack exports and need to be scraped separately. Paid videos would need authentication, or maybe access to the site's media management?
  • Lots of "subscribe!" nudges in post content; these could be removed, or replaced with e.g. MailPoet forms or Jetpack Subscription blocks. Otherwise I'd expect the AI to fall back to button or link blocks, which is pointless if they aren't actually functional.
  • Paywall markers in paid posts likely need guidance on which plugin/block to use; Jetpack Newsletters has a paywall block!

@robertbpugh

robertbpugh commented Apr 3, 2026

Nice work. We've done a lot of Substack migrations on the WP.com side -- a few things that always come up:

  • Paid subscriber imports are tough. Stripe is the source of truth, not the Substack export. Comp/gift subs don't come through at all since they're not in Stripe.
  • Subscription dates get lost: everyone shows the import date, not their original sub date.
  • There are REST endpoints for both free and paid subscriber import, including Stripe plan mapping. We're working on subscription date preservation now, and wrapping these for MCP would close the biggest gap, letting the agent handle full subscriber migration end-to-end without manual CSV steps.

borkweb added a commit that referenced this pull request Jun 5, 2026
# Conflicts:
#	src/mcp-server/handlers/reconstruct-pages.ts
3 participants