feat(replicate): adopt Neptune techniques, SVG survival, forms-to-Jetpack #79
Merged
…pack

## Summary

Adopts seven techniques surveyed from a8cteam51/neptune into the blocks path and shared compare layer, then layers on SVG media survival and native Jetpack form reconstruction, plus five review-driven fast-follows. 76 files, +5473/−89, 2256 tests (baseline 2087).

## Why

Three gaps drove this: parity scores were silently untrustworthy when replica captures came up short (lazy-load/admin-bar artifacts), refine loops had no audit trail and burned agent spend on already-matching pages, and two whole content classes died in reconstruction: SVG logos (rejected or broken-inserted by default WP) and forms (dropped to dead core/html islands).

## How

- Compare v2: heightMismatchRatio + magenta-padded .padded.png diff per viewport, fullPageScore over the full common canvas (closes the top-viewport blind spot), parity-gate constants; crop score semantics frozen for historical comparability.
- Refine coverage: match-section now works diagnose→fix→account with per-section refine reports; liberate_refine_report enforces every finding lands in applied[] or skipped[]; builder envelope gained prose-recovery parsing, unknown-key rejection, variation dead-write and inventory-redeclare guards.
- Styling cascade: shared styling-priority.md reference (preset → patch → instance → variation → layout → CSS) wired into four skills.
- Variation hoisting: recurring instance-style constellations dedupe into versioned styles/blocks/lib-*.json theme variations (default on, fail-open, gated on block-fixer readiness, jetpack/* excluded); block-fixer recovery now preserves author comment attrs through invalid-block recovery and deprecation migrations.
- Asset triage: deterministic decorative-asset candidates + vision-verdict asset-triage.json consumed pre-build; removals recorded in fallback-diagnostics with structural-replacement hints.
- SVG survival: fetched SVGs gain a rasterized PNG sibling + use/defs risk scan; install auto-installs safe-svg via the new idempotent ensurePlugin helper and routes risky/failed SVGs to the PNG (previously SVGs were inserted broken as octet-stream).
- Forms→Jetpack: per-section form capture (SectionSpec.forms, schema v8) emits jetpack/contact-form + field blocks with a core/button submit (jetpack/button is connection-gated and unregistered locally); Jetpack auto-installs with the contact-form module activated when forms are detected; proportional field widths from captured geometry.
- Reinstall seam: pipeline-emitted coverage islands carry a metadata marker so theme reinstall passes validation while hand-authored wp:html stays banned; WooCommerce auto-installs on the carry path when products are detected.

## Testing

- [ ] npm test (2256 passing) and npx tsc --noEmit clean
- [ ] Dogfood gates: swiftlumber A/B hoist-inertia (0.000000 delta), corneliusholmes fresh-site (29 variations/355 instances, triage end-to-end), forms live-render on corneliusholmes-neptune (labels, submit text, Phone field, module auto-activation verified at :8890)
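The refine-report accounting rule from the How list above can be illustrated with a minimal sketch. The `RefineReport` shape and both helper names are hypothetical; the commit message only states that every finding must land in `applied[]` or `skipped[]`, and the real schema is not shown here.

```typescript
// Hypothetical report shape; the real refine-report schema is not shown in this PR.
interface RefineReport {
  findings: string[]; // finding ids produced by the diagnose step
  applied: { findingId: string }[]; // findings that were fixed
  skipped: { findingId: string; reason: string }[]; // findings deliberately left alone
}

// Returns the finding ids that appear in neither applied[] nor skipped[].
function unaccountedFindings(report: RefineReport): string[] {
  const accounted = new Set<string>([
    ...report.applied.map((a) => a.findingId),
    ...report.skipped.map((s) => s.findingId),
  ]);
  return report.findings.filter((id) => !accounted.has(id));
}

// Rejects a report that silently drops a finding.
function assertFullyAccounted(report: RefineReport): void {
  const missing = unaccountedFindings(report);
  if (missing.length > 0) {
    throw new Error(`unaccounted findings: ${missing.join(", ")}`);
  }
}
```

Requiring a `reason` on skipped entries is a design assumption of this sketch: it keeps a skip as auditable as an applied fix.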
…n leaks

## Summary

Two generic capture fixes for the blocks reconstruct path: on a spec-cache miss, liberate_reconstruct_pages now walks the SAVED settled html/<slug>.html snapshot before falling back to a live re-navigation, and the extractFull section walk excludes chrome-descendant and aria-hidden body-section candidates.

## Why

A SECTION_SPECS_SCHEMA bump invalidates every cached spec, which previously routed all pages through extractFullFromUrl, a weaker live capture (1s settle, headless, no adapter capture seam) executed days after the screenshot phase. On the getsnooz dogfood this silently corrupted every page: the homepage hero (a Replo carousel) was caught in a different slide state and lost, and Shopify Dawn's mega-menu dropdown (hidden via visibility/opacity, so offsetParent stays non-null and isVisible() passes) won a Y-band on every page and rendered a junk product strip atop all 29 reconstructions.

## How

extractFullFromSavedHtml strips scripts (same policy as the segmentation fixture harness), route-fulfills the document at its ORIGINAL url so baseURI resolves source-relative references, lets subresources load so computed styles are real, and runs the standard extractFull walk. The handler prefers it on cache miss (new specsFromSavedHtml tally) and keeps the live path as last resort, so specs stay coherent with the screenshots and saved HTML from the same capture. The walk now drops candidates inside header/footer/nav landmarks or [aria-hidden=true] subtrees in all three collectors (semantic, band, tile); the landmark elements themselves stay eligible since stripChrome and the landmark census own them downstream.

## Testing

- [ ] npx vitest run src/lib/replicate/section-extract-chrome.test.ts (mega-menu leak, aria-hidden drawer, saved-HTML replay)
- [ ] Re-run liberate_reconstruct_pages on a captured site after deleting sections/*.json; the result reports specsFromSavedHtml=N, specsFromLive=0, and no header-menu band in the reconstructed pages
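The candidate exclusion described in this commit can be sketched as an ancestor walk. This is an illustration under stated assumptions, not the repository's code: `NodeLike` is a hypothetical stand-in for a DOM element so the sketch runs without a browser, `isExcludedCandidate` is an invented name, and treating the root of an aria-hidden subtree as excluded is an assumption.

```typescript
// Hypothetical stand-in for a DOM element; only the fields the walk needs.
interface NodeLike {
  tag: string;
  attrs: Record<string, string>;
  parent: NodeLike | null;
}

const CHROME_LANDMARKS = new Set<string>(["header", "footer", "nav"]);

// True when the candidate sits strictly inside a header/footer/nav landmark,
// or anywhere in an aria-hidden="true" subtree (root included, by assumption).
// A landmark element itself stays eligible.
function isExcludedCandidate(node: NodeLike): boolean {
  if (node.attrs["aria-hidden"] === "true") return true;
  for (let p = node.parent; p !== null; p = p.parent) {
    if (CHROME_LANDMARKS.has(p.tag)) return true;
    if (p.attrs["aria-hidden"] === "true") return true;
  }
  return false;
}
```

An ancestor-based check of this kind does not depend on computed visibility, which is why it would catch a dropdown hidden via visibility/opacity that a visibility test lets through.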
## Summary
Six generic fidelity fixes on the blocks reconstruct path, found dogfooding swiftlumber: form sections now keep their photo and emit the live Jetpack form instead of islanding, multi-row source galleries render as wrapping cropped grids instead of scroller strips, span-wrapped semantic headings (Wix rich-text eyebrows) are finally captured, promoted-heading body echoes stop duplicating headlines, recoverable dropped images are appended instead of demoting sections to islands, and liberate_reconstruct_pages backs up every page's pre-update post_content.
## Why
The forms-to-Jetpack feature never fired on the page it was built for: the capture walk re-captures a form's own field labels as content cells ("First name" / body "*"), which routed the section to the cell grid, dropped the section photo, and the coverage island then discarded the jetpack/contact-form emit. Three more source-fidelity gaps surfaced on the same run: galleryBlock unconditionally emitted the horizontal scroller so a wrapping source grid (projects, 25 photos over ~7 rows) hid most of its images; the heading collector required a direct text-node child, so Wix rich-text headings (`<h1><span>GALLERY</span></h1>`) below the 28px styled floor vanished from every capture path; and a styled-`<p>` headline captured as both heading and body rendered twice. Separately, a full re-run clobbered operator-accepted post_content with no recovery path, and sections that lost only a recoverable image were demoted to non-editable islands.
## How
page-reconstruct.ts: budgeted field-label-echo cell suppression (mirrors the existing submit suppression; a cell sharing a field label but carrying real content survives); galleryBlock defers to source geometry (section height >= 2x the computed row height -> wrapping is-cropped grid, single row -> scroller, no height -> scroller for back-compat); body echoes drop only when the section's source HTML shows the text once (genuine heading+paragraph duplicates are kept); the coverage gate appends recoverable dropped images (local, above the decorative floor) as image blocks and re-measures before falling back to an island, and renderCellGrid renders unclaimed section-level images as their own grid columns. section-extract.ts: semantic h1-h6 qualify by textContent with a containment guard against double-capturing a big styled span inside an already-captured heading. reconstruct-pages handler: each page's pre-update post_content is saved to <outputDir>/.post-content-prev/<runstamp>/<slug>.html (postContentBackups/postContentBackupDir in the result) so an accepted state is always restorable.
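The gallery geometry rule above reduces to a small decision function. The sketch below is a hedged illustration: `chooseGalleryLayout` and its parameter names are hypothetical, and how the row height is computed is left to the caller because the commit message does not specify it.

```typescript
type GalleryLayout = "grid" | "scroller";

// Heights are in CSS pixels; undefined means the capture carried no usable geometry.
function chooseGalleryLayout(
  sectionHeight: number | undefined,
  rowHeight: number | undefined,
): GalleryLayout {
  // No geometry: keep the historical scroller for back-compat.
  if (sectionHeight === undefined || rowHeight === undefined || rowHeight <= 0) {
    return "scroller";
  }
  // Two or more rows' worth of height means the source grid wraps.
  return sectionHeight >= 2 * rowHeight ? "grid" : "scroller";
}
```

For example, a section 1400px tall with 200px rows would resolve to the wrapping grid, while a 200px section with the same row height stays a scroller.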
## Testing
- [ ] npx vitest run src/lib/replicate/ src/mcp-server/ (957 tests green; new coverage in page-reconstruct.test.ts and section-extract-chrome.test.ts)
- [ ] E2E: delete sections/projects.json on a captured Wix site, re-run liberate_reconstruct_pages for that page — result reports specsFromSavedHtml=1, postContentBackups=1, and the page renders the eyebrow + headline once + a wrapping cropped grid
- [ ] On a form-bearing page, the section renders structured with a live jetpack/contact-form and its photo (no coverage island)
## Summary

measureSectionCoverage now matches captured text against the markup's decoded text content (cheerio textContent + the converted-path glyph folding) instead of a raw substring scan of the block markup. Image URL matching is unchanged.

## Why

The structured renderers emit text through escapeHtml, so the markup carries the entity-escaped forms of & and ' where the captured text has the raw characters. The raw substring match read every such text as missing: a section whose texts ALL contained an escapable character measured 0% coverage and was demoted to a core/html island the render never warranted (corneliusholmes dogfood: "Pets & Our Mental Health", "World Men's Day", "Payment & Insurance" all islanded with their text fully rendered). It also silently understated coverage on every mixed section.

## How

Build the haystack from cheerio's decoded textContent of the rendered markup (entities decoded, including the block-fixer's ’ canonicalization; block comments dropped) and fold both sides with the same foldText used by measureConvertedCoverage, so the structured and converted coverage measures agree. Removed the now-unused normalize helper.

## Testing

- [ ] npx vitest run src/lib/replicate/section-coverage.test.ts (3 new cases: &-escape, apostrophe-escape, curly-vs-straight glyph fold)
- [ ] Dogfood: corneliusholmes blocks reconstruct goes from 4 text-floor islands to 0 (htmlFallbackByReason: {})
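The decode-then-fold comparison described above can be sketched without dependencies. This is an illustration, not the repository's implementation: the commit uses cheerio's text extraction and a shared foldText, whereas `decodedText` below is a deliberately small regex stand-in and `foldGlyphs` is a hypothetical fold that only normalizes curly quotes and whitespace.

```typescript
// Stand-in for an HTML parser's text extraction (the commit uses cheerio).
// Drops block comments and tags, then decodes a handful of entities in one pass.
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#039;": "'",
  "&#39;": "'",
  "&#8217;": "\u2019",
};

function decodedText(markup: string): string {
  return markup
    .replace(/<!--[\s\S]*?-->/g, " ")
    .replace(/<[^>]*>/g, " ")
    .replace(/&(?:amp|lt|gt|quot|#0?39|#8217);/g, (m) => ENTITIES[m] ?? m);
}

// Hypothetical fold: curly quotes to straight, whitespace collapsed.
function foldGlyphs(text: string): string {
  return text
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/\s+/g, " ")
    .trim();
}

// Fraction of captured texts found in the rendered markup's decoded text.
// Returning 1 for an empty capture list is an assumption of this sketch.
function textCoverage(capturedTexts: string[], markup: string): number {
  if (capturedTexts.length === 0) return 1;
  const haystack = foldGlyphs(decodedText(markup));
  const found = capturedTexts.filter((t) => haystack.includes(foldGlyphs(t))).length;
  return found / capturedTexts.length;
}
```

Folding both sides with the same function is the point of the design: a captured straight apostrophe and a rendered curly one then compare equal, and an escaped ampersand no longer reads as missing text.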
…, and gates

## Summary

Eleven review-driven fixes from the pre-landing pass on the Neptune branch: one content-loss bug (promoted-heading echo drop read entity-encoded source HTML as raw text), four hoist-safety guards, the SVG dedup raster carry, a parser work budget, a handler path guard, and three doc corrections.

## Why

The adversarial review found the echo-drop check comparing decoded DOM captures against entity-encoded sectionHtml: any heading-matching paragraph containing &/</> counted 0 occurrences and was dropped, including genuine twice-rendered duplicates (the same bug class 0d35541 fixed in the coverage gate, reintroduced one layer up). The remaining findings were latent: comment-delimiter corruption paths in the hoist rewrite, an editor-invalid desync in hoisted pattern files, a wrong-image substitution edge for deduped SVGs, an O(n²) stall in envelope recovery, and a parity-gate rule that made gate-skips unreachable for benignly height-mismatched pages.

## How

- page-reconstruct: echo-drop haystack now decodes via cheerio + foldText (shared from section-coverage); occurrence count 0 means can't-verify → keep, never drop. Two pinning tests.
- variation-hoist: attr rewrites re-serialize through serializeBlockAttrs (exported from form-blocks) so decoded unicode-escaped `--` can't terminate the block comment; derived slugs pass a lib-[a-z0-9-] charset guard before becoming theme filenames.
- reconstruct-pages: pattern files are written PRE-hoist: they never pass block-fixer canonicalization, so a comment-attr-only swap left them permanently editor-invalid on reinstall; applyHoistSwaps stays for when pattern canonicalization lands.
- media-fetch: byte-identical deduped SVGs inherit the original download's rasterPath/svgRisky, closing the basename-collision edge where install-time derivation substituted an unrelated PNG.
- builder-envelope: recoverJsonObject gains a total-work budget; the 500k length cap bounded n, not n².
- refine-report handler: slug validated as a plain path segment (no separators/traversal).
- match-page parity gate: fullPageScore gates only when heights match; magenta padding counts as diff by construction, so any sub-threshold height delta capped it below 0.995 and defeated the gate.
- Docs: tool counts corrected (README 34→35, AGENTS.md 30→35); nonexistent --no-variation-hoist CLI flag removed from AGENTS.md.

## Testing

- [ ] npx tsc --noEmit (clean)
- [ ] SKIP_BROWSER_TESTS=1 npx vitest run over the changed areas (1679 passing)
- [ ] scripts/block-fixer: node --test (11 passing)
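The comment-delimiter hazard named in the variation-hoist fix can be illustrated with a short sketch. The repository's serializeBlockAttrs is not shown in this PR, so `serializeAttrsForComment` below is a hypothetical helper modeled on the escaping convention WordPress uses for attributes inside block comments: after JSON encoding, `--`, `<`, `>` and `&` are rewritten as JSON unicode escapes so the payload can never contain a comment terminator.

```typescript
// Hypothetical helper; not the repository's serializeBlockAttrs.
// JSON-encodes attrs, then rewrites the characters that could break an HTML
// comment into JSON unicode escapes. JSON.parse restores the original values.
function serializeAttrsForComment(attrs: Record<string, unknown>): string {
  return JSON.stringify(attrs)
    .replace(/--/g, "\\u002d\\u002d")
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

// Builds a self-closing block comment around the escaped attrs.
function blockComment(name: string, attrs: Record<string, unknown>): string {
  return `<!-- wp:${name} ${serializeAttrsForComment(attrs)} /-->`;
}
```

The failure mode this guards against is a rewrite that parses the attrs (decoding the escapes back to a literal `--`) and then writes them out with plain JSON.stringify; routing every rewrite through one escaping serializer keeps the round trip safe.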
Summary
Adopts seven techniques surveyed from a8cteam51/neptune into the blocks path and shared compare layer, plus SVG media survival, native Jetpack form reconstruction, and five review-driven fast-follows. Squashed from 50 reviewed commits.
What's in it
Measurement trust (Neptune #1, #3, fast-follow)
- `comparison.json` v2: `heightMismatchRatio` + magenta-padded `.padded.png` diff per viewport; short replica captures (lazy-load/admin-bar artifacts) now flag loudly instead of silently understating scores
- `fullPageScore` over the full common canvas; closes the top-viewport crop blind spot
- …`score` semantics frozen for historical comparability)

Refine auditability (Neptune #2, #4)
- `liberate_refine_report` tool enforces every finding lands in `applied[]` or `skipped[]`; no silent drops

Editor-quality output (Neptune #5, #6)
- `styling-priority.md` cascade (preset → patch → instance → variation → layout → CSS) referenced by four skills
- `styles/blocks/lib-*.json` theme variations (default on, fail-open, `variationHoist:false` escape, `jetpack/*` excluded)
- …`className` et al.)

Content survival (Neptune #7 + extensions)
- `safe-svg` auto-installs and risky/failed SVGs route to the PNG (SVGs previously inserted broken as octet-stream)
- `jetpack/contact-form` + field blocks; Jetpack auto-installs with the contact-form module activated; submit is `core/button type=submit` (current Jetpack grammar; `jetpack/button` is connection-gated and unregistered locally)
- …`wp:html` stays banned; WooCommerce auto-installs on the carry path

Verification
- `tsc --noEmit` clean

Notes for reviewers
- docs/superpowers/specs/plans are deliberately uncommitted (gitignored)
- …`Inserter: false` carve-out, carry path untouched except shared compare fields
- Fast-follow candidates left open: per-field proportional width emission polish, run-level agent-dispatch accounting, origin-freshness verify step.