Add Instagram data liberation support#4
Closed
edequalsawesome wants to merge 1 commit into
Closed
Conversation
Adds a complete Instagram-to-WordPress migration pipeline using the same CDP-based approach as Wix and Squarespace extractors. Scripts: - scripts/instagram/discover.js: Scroll-based GraphQL interception to inventory all posts with metadata, captions, timestamps, locations - scripts/instagram/extract.js: Per-post extraction with ?img_index=N for carousel slides, deduplication by Instagram media ID - scripts/instagram/import.js: XML-RPC import with wp.uploadFile for media, wp.newPost with featured images, gallery blocks for carousels, correct backdated post dates, and source links to original posts Also includes: - prompts/instagram.md: User-facing migration prompt - tests/instagram.test.js: 32 unit tests covering data transformation, XML-RPC encoding, carousel deduplication, and content generation - DISCOVERIES.md entry documenting key findings - Updated AGENTS.md, README.md, cli.js for Instagram support Tested against a 308-post profile: 364 media files uploaded, all posts imported with correct dates, unique carousel slides in gallery blocks, and featured images. Zero failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
borkweb
added a commit
that referenced
this pull request
Apr 12, 2026
Reworks the Instagram support originally added in #4 (by @edequalsawesome) on top of the new adapter/lib architecture introduced in #6. - src/adapters/instagram.ts implementing PlatformAdapter: - discover() scrolls the profile via CDP and intercepts GraphQL responses - extract() visits each post via CDP, walks ?img_index=N for carousel slides (deduped by Instagram media ID), downloads media, and emits wp:image / wp:gallery / wp:video Gutenberg blocks - hashtags become WP tags; @mentions and #hashtags in captions are linked - Registered in src/mcp-server.ts and detect-platform URL patterns - Removed the legacy scripts/instagram/ scripts and tests/instagram.test.js, which targeted the pre-restructure standalone-script entry points - Added test/adapters/instagram.test.ts with a fixture-driven dry-run that exercises the WXR pipeline end to end - Refreshed prompts/instagram.md for the new npm run liberate workflow - Updated README, AGENTS.md, CONTRIBUTING.md, docs/mcp.md to mention Instagram
borkweb
added a commit
that referenced
this pull request
Apr 12, 2026
…ame parsing - cli.js: revert PR #4's Instagram branch — it referenced scripts/instagram/ scripts that no longer exist after restructuring. The legacy CLI is for legacy script entry points only; Instagram is reachable via the adapter. - src/adapters/instagram.ts: parseInstagramUsername now strips path segments from non-URL inputs so 'foo/bar' cannot be smuggled into a navigation URL. - DISCOVERIES.md: rewrite the 'How it works' paragraph for the adapter pipeline instead of the deleted scripts/instagram/ pipeline.
3 tasks
Member
|
Awesome work! I've taken this and adapted it to the new approach/structure in #18 🕺 Closing this in favor of that other PR |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this changes
Adds a complete Instagram-to-WordPress migration pipeline — discover, extract, and import scripts following the same CDP-based pattern as the existing Wix and Squarespace extractors.
Key features
?img_index=NURL parameter with media ID deduplicationwp.uploadFilefor media andwp.newPostfor posts — WordPress.com's REST API doesn't support writes with application passwordswp:imagefor photosFiles added
scripts/instagram/discover.js— profile inventory via GraphQL interceptionscripts/instagram/extract.js— per-post extraction with carousel supportscripts/instagram/import.js— XML-RPC WordPress.com importprompts/instagram.md— user-facing migration prompttests/instagram.test.js— 32 unit testsAGENTS.md,README.md,DISCOVERIES.md,cli.jsHow I found it
Built and tested against a real 308-post Instagram profile over two days. Key discoveries documented in DISCOVERIES.md:
?img_index=Nfor direct carousel slide access (no click-through needed)<li>elements in carousel DOM (previous/current/next) — dedup by media ID<dateTime.iso8601>forpost_date— must send as plain stringTested against
Discovery log entry added to DISCOVERIES.md
🤖 Generated with Claude Code