Skip to content

Add Instagram data liberation support#4

Closed
edequalsawesome wants to merge 1 commit into
Automattic:mainfrom
edequalsawesome:add/instagram-liberation-clean
Closed

Add Instagram data liberation support#4
edequalsawesome wants to merge 1 commit into
Automattic:mainfrom
edequalsawesome:add/instagram-liberation-clean

Conversation

@edequalsawesome

Copy link
Copy Markdown

What this changes

Adds a complete Instagram-to-WordPress migration pipeline — discover, extract, and import scripts following the same CDP-based pattern as the existing Wix and Squarespace extractors.

Key features

  • Profile discovery via GraphQL response interception during scroll pagination
  • Carousel slide extraction using ?img_index=N URL parameter with media ID deduplication
  • XML-RPC import with wp.uploadFile for media and wp.newPost for posts — WordPress.com's REST API doesn't support writes with application passwords
  • Gallery blocks for carousel posts, single wp:image for photos
  • Featured images set from first media item
  • Correct post dates backdated to original Instagram timestamps
  • Source links back to original Instagram posts

Files added

  • scripts/instagram/discover.js — profile inventory via GraphQL interception
  • scripts/instagram/extract.js — per-post extraction with carousel support
  • scripts/instagram/import.js — XML-RPC WordPress.com import
  • prompts/instagram.md — user-facing migration prompt
  • tests/instagram.test.js — 32 unit tests
  • Updated AGENTS.md, README.md, DISCOVERIES.md, cli.js

How I found it

Built and tested against a real 308-post Instagram profile over two days. Key discoveries documented in DISCOVERIES.md:

  • ?img_index=N for direct carousel slide access (no click-through needed)
  • Instagram keeps 3 <li> elements in carousel DOM (previous/current/next) — dedup by media ID
  • WordPress.com ignores <dateTime.iso8601> for post_date — must send as plain string
  • WordPress.com REST API rejects writes with app passwords — XML-RPC works

Tested against

  • Real Instagram profile (308 posts, 21 carousels, 6 videos)
  • Full extraction: 308 posts, 364 media files, zero failures
  • Full import to WordPress.com: all media uploaded, all posts created with correct dates
  • Carousel slides verified unique (zero duplicates across all 21 carousels)
  • 32 unit tests passing
  • Scripts parse without errors

Discovery log entry added to DISCOVERIES.md

  • Yes

🤖 Generated with Claude Code

Adds a complete Instagram-to-WordPress migration pipeline using the same
CDP-based approach as Wix and Squarespace extractors.

Scripts:
- scripts/instagram/discover.js: Scroll-based GraphQL interception to
  inventory all posts with metadata, captions, timestamps, locations
- scripts/instagram/extract.js: Per-post extraction with ?img_index=N
  for carousel slides, deduplication by Instagram media ID
- scripts/instagram/import.js: XML-RPC import with wp.uploadFile for
  media, wp.newPost with featured images, gallery blocks for carousels,
  correct backdated post dates, and source links to original posts

Also includes:
- prompts/instagram.md: User-facing migration prompt
- tests/instagram.test.js: 32 unit tests covering data transformation,
  XML-RPC encoding, carousel deduplication, and content generation
- DISCOVERIES.md entry documenting key findings
- Updated AGENTS.md, README.md, cli.js for Instagram support

Tested against a 308-post profile: 364 media files uploaded, all posts
imported with correct dates, unique carousel slides in gallery blocks,
and featured images. Zero failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
borkweb added a commit that referenced this pull request Apr 12, 2026
Reworks the Instagram support originally added in #4 (by @edequalsawesome)
on top of the new adapter/lib architecture introduced in #6.

- src/adapters/instagram.ts implementing PlatformAdapter:
  - discover() scrolls the profile via CDP and intercepts GraphQL responses
  - extract() visits each post via CDP, walks ?img_index=N for carousel
    slides (deduped by Instagram media ID), downloads media, and emits
    wp:image / wp:gallery / wp:video Gutenberg blocks
  - hashtags become WP tags; @mentions and #hashtags in captions are linked
- Registered in src/mcp-server.ts and detect-platform URL patterns
- Removed the legacy scripts/instagram/ scripts and tests/instagram.test.js,
  which targeted the pre-restructure standalone-script entry points
- Added test/adapters/instagram.test.ts with a fixture-driven dry-run that
  exercises the WXR pipeline end to end
- Refreshed prompts/instagram.md for the new npm run liberate workflow
- Updated README, AGENTS.md, CONTRIBUTING.md, docs/mcp.md to mention Instagram
borkweb added a commit that referenced this pull request Apr 12, 2026
…ame parsing

- cli.js: revert PR #4's Instagram branch — it referenced scripts/instagram/
  scripts that no longer exist after restructuring. The legacy CLI is for
  legacy script entry points only; Instagram is reachable via the adapter.
- src/adapters/instagram.ts: parseInstagramUsername now strips path segments
  from non-URL inputs so 'foo/bar' cannot be smuggled into a navigation URL.
- DISCOVERIES.md: rewrite the 'How it works' paragraph for the adapter
  pipeline instead of the deleted scripts/instagram/ pipeline.
@borkweb

borkweb commented Apr 12, 2026

Copy link
Copy Markdown
Member

Awesome work! I've taken this and adapted it to the new approach/structure in #18 🕺 Closing this in favor of that other PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants