feat(replicate): adopt Neptune techniques, SVG survival, forms-to-Jetpack #79

Merged
borkweb merged 6 commits into main from feature/neptune-best-parts
Jun 11, 2026

@borkweb borkweb commented Jun 11, 2026

Summary

Adopts seven techniques surveyed from a8cteam51/neptune into the blocks path and shared compare layer, plus SVG media survival, native Jetpack form reconstruction, and five review-driven fast-follows. Squashed from 50 reviewed commits.

What's in it

Measurement trust (Neptune #1, #3, fast-follow)

  • comparison.json v2: heightMismatchRatio + magenta-padded .padded.png diff per viewport — short replica captures (lazy-load/admin-bar artifacts) now flag loudly instead of silently understating scores
  • fullPageScore over the full common canvas — closes the top-viewport crop blind spot
  • Parity gate: match-page skips already-matching pages before dispatching section agents (crop score semantics frozen for historical comparability)
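
To make the measurement idea concrete, here is a minimal sketch of how a height mismatch ratio and a magenta-padded full-canvas score could be computed. The `Capture` shape, the function names, and the exact ratio definition (height delta over the taller capture) are assumptions for illustration, not the branch's actual code; both captures are assumed to share the same width.

```typescript
// Hypothetical shapes; the real comparison.json v2 fields may differ.
type Rgb = [number, number, number];

interface Capture {
  width: number;
  height: number;
  pixels: Rgb[]; // row-major, length = width * height
}

const MAGENTA: Rgb = [255, 0, 255];

// Assumed definition: height difference relative to the taller capture.
function heightMismatchRatio(source: Capture, replica: Capture): number {
  const taller = Math.max(source.height, replica.height);
  return taller === 0 ? 0 : Math.abs(source.height - replica.height) / taller;
}

// Pad a capture down to targetHeight with magenta rows, so the missing
// region shows up as diff instead of being cropped away.
function padToHeight(capture: Capture, targetHeight: number): Capture {
  const missing = Math.max(0, targetHeight - capture.height) * capture.width;
  const padding: Rgb[] = Array.from({ length: missing }, () => MAGENTA);
  return {
    width: capture.width,
    height: Math.max(capture.height, targetHeight),
    pixels: [...capture.pixels, ...padding],
  };
}

// Fraction of identical pixels over the full (padded) canvas.
function fullPageScore(source: Capture, replica: Capture): number {
  const height = Math.max(source.height, replica.height);
  const a = padToHeight(source, height);
  const b = padToHeight(replica, height);
  let same = 0;
  a.pixels.forEach((p, i) => {
    const q = b.pixels[i];
    if (q !== undefined && p[0] === q[0] && p[1] === q[1] && p[2] === q[2]) {
      same += 1;
    }
  });
  return a.pixels.length === 0 ? 1 : same / a.pixels.length;
}
```

Under these assumptions a replica that captures only half the source height scores at most 0.5 on the full canvas, however well its visible half matches, which is the "flag loudly" behavior the bullets describe.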

Refine auditability (Neptune #2, #4)

  • match-section restructured to diagnose → fix → account; liberate_refine_report tool enforces every finding lands in applied[] or skipped[] — no silent drops
  • Builder envelope: prose-recovery JSON parsing, unknown-key rejection, variation dead-write + inventory-redeclare guards
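
The "no silent drops" rule can be illustrated with a small accounting check. This is a sketch under assumed shapes: the `RefineReport` fields other than `applied` and `skipped`, and both function names, are hypothetical and do not describe the actual `liberate_refine_report` schema.

```typescript
// Hypothetical report shape; only applied[] and skipped[] are named in the PR.
interface Finding {
  id: string;
}

interface RefineReport {
  findings: Finding[];
  applied: { findingId: string }[];
  skipped: { findingId: string; reason: string }[];
}

// Return the ids of findings that appear in neither applied[] nor skipped[].
function unaccountedFindings(report: RefineReport): string[] {
  const accounted = new Set<string>([
    ...report.applied.map((a) => a.findingId),
    ...report.skipped.map((s) => s.findingId),
  ]);
  return report.findings.map((f) => f.id).filter((id) => !accounted.has(id));
}

// Reject a report that silently drops any finding.
function assertFullyAccounted(report: RefineReport): void {
  const missing = unaccountedFindings(report);
  if (missing.length > 0) {
    throw new Error(`Unaccounted findings: ${missing.join(", ")}`);
  }
}
```

Requiring a `reason` on every skipped entry is a design choice in this sketch: it keeps the audit trail useful when a fix is deliberately not applied.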

Editor-quality output (Neptune #5, #6)

  • Shared styling-priority.md cascade (preset → patch → instance → variation → layout → CSS) referenced by four skills
  • Variation hoisting: recurring instance-style constellations dedupe into versioned styles/blocks/lib-*.json theme variations (default on, fail-open, variationHoist:false escape, jetpack/* excluded)
  • Block-fixer now preserves author comment attrs through invalid-block recovery and deprecation migrations (was silently dropping className et al.)

Content survival (Neptune #7 + extensions)

  • Asset triage: decorative imagery (dividers, builder chrome) classified by vision verdict, removed from output with structural-replacement hints in fallback-diagnostics
  • SVG survival: PNG raster sibling + use/defs risk scan at fetch; safe-svg auto-installs and risky/failed SVGs route to the PNG (SVGs previously inserted broken as octet-stream)
  • Forms → Jetpack: per-section form capture (schema v8) emits jetpack/contact-form + field blocks; Jetpack auto-installs with the contact-form module activated; submit is core/button type=submit (current Jetpack grammar — jetpack/button is connection-gated and unregistered locally)
  • Theme reinstall seam: pipeline coverage islands carry a metadata marker so reinstall passes validation while hand-authored wp:html stays banned; WooCommerce auto-installs on the carry path

Verification

  • 2256 tests passing (+169 vs baseline), tsc --noEmit clean
  • Three dogfood gates: swiftlumber A/B hoist-inertia (0.000000 score delta, 14/14 pixel-identical), corneliusholmes fresh-site (29 variations/355 instances, triage end-to-end, v2 flags verified genuine), forms live-render on corneliusholmes-neptune (5/5 labels, source submit text, Phone field, module auto-activation from cold)
  • Two of three gates initially FAILED and caught real defects (block-fixer attr drop, versionless variation partials, five form live-render bugs) — all root-caused and fixed with pinning tests

Notes for reviewers

  • docs/superpowers/ specs/plans are deliberately uncommitted (gitignored)
  • Known accepted behaviors: proportional form-field rows can sum >100 (Jetpack flex wraps — documented), legacy bare-island themes reinstall via the Inserter: false carve-out, carry path untouched except shared compare fields

Fast-follow candidates left open: per-field proportional width emission polish, run-level agent-dispatch accounting, origin-freshness verify step.

borkweb added 6 commits June 11, 2026 08:06
…pack

## Summary
Adopts seven techniques surveyed from a8cteam51/neptune into the blocks path and shared compare layer, then layers on SVG media survival and native Jetpack form reconstruction, plus five review-driven fast-follows. 76 files, +5473/−89, 2256 tests (baseline 2087).

## Why
Three gaps drove this: parity scores were silently untrustworthy when replica captures came up short (lazy-load/admin-bar artifacts), refine loops had no audit trail and burned agent spend on already-matching pages, and two whole content classes died in reconstruction — SVG logos (rejected or broken-inserted by default WP) and forms (dropped to dead core/html islands).

## How
- Compare v2: heightMismatchRatio + magenta-padded .padded.png diff per viewport, fullPageScore over the full common canvas (closes the top-viewport blind spot), parity-gate constants; crop score semantics frozen for historical comparability.
- Refine coverage: match-section now works diagnose→fix→account with per-section refine reports; liberate_refine_report enforces every finding lands in applied[] or skipped[]; builder envelope gained prose-recovery parsing, unknown-key rejection, variation dead-write and inventory-redeclare guards.
- Styling cascade: shared styling-priority.md reference (preset → patch → instance → variation → layout → CSS) wired into four skills.
- Variation hoisting: recurring instance-style constellations dedupe into versioned styles/blocks/lib-*.json theme variations (default on, fail-open, gated on block-fixer readiness, jetpack/* excluded); block-fixer recovery now preserves author comment attrs through invalid-block recovery and deprecation migrations.
- Asset triage: deterministic decorative-asset candidates + vision-verdict asset-triage.json consumed pre-build; removals recorded in fallback-diagnostics with structural-replacement hints.
- SVG survival: fetched SVGs gain a rasterized PNG sibling + use/defs risk scan; install auto-installs safe-svg via the new idempotent ensurePlugin helper and routes risky/failed SVGs to the PNG (previously SVGs were inserted broken as octet-stream).
- Forms→Jetpack: per-section form capture (SectionSpec.forms, schema v8) emits jetpack/contact-form + field blocks with a core/button submit (jetpack/button is connection-gated and unregistered locally); Jetpack auto-installs with the contact-form module activated when forms are detected; proportional field widths from captured geometry.
- Reinstall seam: pipeline-emitted coverage islands carry a metadata marker so theme reinstall passes validation while hand-authored wp:html stays banned; WooCommerce auto-installs on the carry path when products are detected.

## Testing
- [ ] npm test (2256 passing) and npx tsc --noEmit clean
- [ ] Dogfood gates: swiftlumber A/B hoist-inertia (0.000000 delta), corneliusholmes fresh-site (29 variations/355 instances, triage end-to-end), forms live-render on corneliusholmes-neptune (labels, submit text, Phone field, module auto-activation verified at :8890)
…n leaks

## Summary
Two generic capture fixes for the blocks reconstruct path: on a spec-cache miss, liberate_reconstruct_pages now walks the SAVED settled html/<slug>.html snapshot before falling back to a live re-navigation, and the extractFull section walk excludes chrome-descendant and aria-hidden body-section candidates.

## Why
A SECTION_SPECS_SCHEMA bump invalidates every cached spec, which previously routed all pages through extractFullFromUrl — a weaker live capture (1s settle, headless, no adapter capture seam) executed days after the screenshot phase. On the getsnooz dogfood this silently corrupted every page: the homepage hero (a Replo carousel) was caught in a different slide state and lost, and Shopify Dawn's mega-menu dropdown — hidden via visibility/opacity, so offsetParent stays non-null and isVisible() passes — won a Y-band on every page and rendered a junk product strip atop all 29 reconstructions.

## How
extractFullFromSavedHtml strips scripts (same policy as the segmentation fixture harness), route-fulfills the document at its ORIGINAL url so baseURI resolves source-relative references, lets subresources load so computed styles are real, and runs the standard extractFull walk. The handler prefers it on cache miss (new specsFromSavedHtml tally) and keeps the live path as last resort, so specs stay coherent with the screenshots and saved HTML from the same capture. The walk now drops candidates inside header/footer/nav landmarks or [aria-hidden=true] subtrees in all three collectors (semantic, band, tile); the landmark elements themselves stay eligible since stripChrome and the landmark census own them downstream.
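
The candidate exclusion rule can be sketched over a minimal element tree. The `El` type and `isExcludedCandidate` name are hypothetical, the sketch matches landmarks by tag name only (the real walk may also honor ARIA roles), and treating an `aria-hidden="true"` element itself as excluded is an assumption.

```typescript
// Minimal stand-in for a DOM element; not the real extractFull types.
interface El {
  tag: string;
  attrs: Record<string, string>;
  parent: El | null;
}

const CHROME_TAGS = new Set(["header", "footer", "nav"]);

// A candidate is dropped when it sits inside a chrome landmark or inside
// an aria-hidden subtree. Landmark elements themselves stay eligible, so
// the ancestor walk starts at the parent.
function isExcludedCandidate(el: El): boolean {
  if (el.attrs["aria-hidden"] === "true") return true; // assumed: subtree root counts
  for (let a = el.parent; a !== null; a = a.parent) {
    if (CHROME_TAGS.has(a.tag) || a.attrs["aria-hidden"] === "true") {
      return true;
    }
  }
  return false;
}
```

Because the check looks at ancestry rather than computed visibility, it would also catch a dropdown hidden with visibility/opacity, which is the case the visibility test in the commit message let through.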

## Testing
- [ ] npx vitest run src/lib/replicate/section-extract-chrome.test.ts (mega-menu leak, aria-hidden drawer, saved-HTML replay)
- [ ] Re-run liberate_reconstruct_pages on a captured site after deleting sections/*.json — result reports specsFromSavedHtml=N, specsFromLive=0, and no header-menu band in the reconstructed pages
## Summary
Six generic fidelity fixes on the blocks reconstruct path, found dogfooding swiftlumber: form sections now keep their photo and emit the live Jetpack form instead of islanding, multi-row source galleries render as wrapping cropped grids instead of scroller strips, span-wrapped semantic headings (Wix rich-text eyebrows) are finally captured, promoted-heading body echoes stop duplicating headlines, recoverable dropped images are appended instead of demoting sections to islands, and liberate_reconstruct_pages backs up every page's pre-update post_content.

## Why
The forms-to-Jetpack feature never fired on the page it was built for: the capture walk re-captures a form's own field labels as content cells ("First name" / body "*"), which routed the section to the cell grid, dropped the section photo, and the coverage island then discarded the jetpack/contact-form emit. Three more source-fidelity gaps surfaced on the same run: galleryBlock unconditionally emitted the horizontal scroller so a wrapping source grid (projects, 25 photos over ~7 rows) hid most of its images; the heading collector required a direct text-node child, so Wix rich-text headings (`<h1><span>GALLERY</span></h1>`) below the 28px styled floor vanished from every capture path; and a styled-`<p>` headline captured as both heading and body rendered twice. Separately, a full re-run clobbered operator-accepted post_content with no recovery path, and sections that lost only a recoverable image were demoted to non-editable islands.

## How
- page-reconstruct.ts: budgeted field-label-echo cell suppression (mirrors the existing submit suppression; a cell sharing a field label but carrying real content survives); galleryBlock defers to source geometry (section height >= 2x the computed row height -> wrapping is-cropped grid, single row -> scroller, no height -> scroller for back-compat); body echoes drop only when the section's source HTML shows the text once (genuine heading+paragraph duplicates are kept); the coverage gate appends recoverable dropped images (local, above the decorative floor) as image blocks and re-measures before falling back to an island, and renderCellGrid renders unclaimed section-level images as their own grid columns.
- section-extract.ts: semantic h1-h6 qualify by textContent with a containment guard against double-capturing a big styled span inside an already-captured heading.
- reconstruct-pages handler: each page's pre-update post_content is saved to <outputDir>/.post-content-prev/<runstamp>/<slug>.html (postContentBackups/postContentBackupDir in the result) so an accepted state is always restorable.
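
The gallery geometry rule reduces to a small decision function. This is a sketch only: `chooseGalleryLayout` and its parameter names are hypothetical, and the guard for a non-positive row height is an added assumption rather than something the commit states.

```typescript
type GalleryLayout = "grid" | "scroller";

// sectionHeight: captured source section height in px, if known.
// rowHeight: computed height of one gallery row in px.
function chooseGalleryLayout(
  sectionHeight: number | undefined,
  rowHeight: number,
): GalleryLayout {
  // No captured height: keep the old scroller behavior for back-compat.
  if (sectionHeight === undefined) return "scroller";
  // Assumed guard: without a usable row height the rule cannot be applied.
  if (rowHeight <= 0) return "scroller";
  // Two or more rows in the source means it wrapped, so emit the cropped grid.
  return sectionHeight >= 2 * rowHeight ? "grid" : "scroller";
}
```

For example, a 1400px-tall source section with 200px rows would map to the wrapping grid, while a 200px-tall section with the same row height would stay a scroller.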

## Testing
- [ ] npx vitest run src/lib/replicate/ src/mcp-server/ (957 tests green; new coverage in page-reconstruct.test.ts and section-extract-chrome.test.ts)
- [ ] E2E: delete sections/projects.json on a captured Wix site, re-run liberate_reconstruct_pages for that page — result reports specsFromSavedHtml=1, postContentBackups=1, and the page renders the eyebrow + headline once + a wrapping cropped grid
- [ ] On a form-bearing page, the section renders structured with a live jetpack/contact-form and its photo (no coverage island)
## Summary
measureSectionCoverage now matches captured text against the markup's decoded text content (cheerio textContent + the converted-path glyph folding) instead of a raw substring scan of the block markup. Image URL matching is unchanged.

## Why
The structured renderers emit text through escapeHtml, so the markup carries &amp;/&#39; where the captured text has &/'. The raw substring match read every such text as missing — a section whose texts ALL contained an escapable character measured 0% coverage and was demoted to a core/html island the render never warranted (corneliusholmes dogfood: "Pets & Our Mental Health", "World Men's Day", "Payment & Insurance" all islanded with their text fully rendered). It also silently understated coverage on every mixed section.

## How
Build the haystack from cheerio's decoded textContent of the rendered markup (entities decoded — including the block-fixer's &#8217; canonicalization — block comments dropped) and fold both sides with the same foldText used by measureConvertedCoverage, so the structured and converted coverage measures agree. Removed the now-unused normalize helper.
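
The difference between the raw substring scan and the decoded-text match can be shown with a self-contained sketch. It deliberately avoids cheerio: `textOf` and `decodeEntities` are simplified stand-ins for cheerio's decoded text content, and `foldText` here is a hypothetical approximation (quote folding plus whitespace collapse) of the shared helper, not its real behavior.

```typescript
const NAMED: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Decode a small set of named entities plus numeric references.
function decodeEntities(s: string): string {
  return s.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (m: string, body: string) => {
    if (body.startsWith("#")) {
      const hex = body[1] === "x" || body[1] === "X";
      const code = hex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return Number.isFinite(code) && code <= 0x10ffff ? String.fromCodePoint(code) : m;
    }
    return NAMED[body.toLowerCase()] ?? m;
  });
}

// Drop block comments and tags first, then decode what is left.
function textOf(markup: string): string {
  const stripped = markup.replace(/<!--[\s\S]*?-->/g, " ").replace(/<[^>]*>/g, " ");
  return decodeEntities(stripped);
}

// Assumed folding: curly quotes to straight, whitespace collapsed.
function foldText(s: string): string {
  return s
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/\s+/g, " ")
    .trim();
}

// Share of captured texts found in the rendered markup's decoded text.
function textCoverage(texts: string[], markup: string): number {
  if (texts.length === 0) return 1;
  const haystack = foldText(textOf(markup));
  const found = texts.filter((t) => haystack.includes(foldText(t))).length;
  return found / texts.length;
}
```

With markup such as `<h2>Pets &amp; Our Mental Health</h2>`, a raw `markup.includes("Pets & Our Mental Health")` is false, while the decoded comparison finds the text.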

## Testing
- [ ] npx vitest run src/lib/replicate/section-coverage.test.ts (3 new cases: &-escape, apostrophe-escape, curly-vs-straight glyph fold)
- [ ] Dogfood: corneliusholmes blocks reconstruct goes from 4 text-floor islands to 0 (htmlFallbackByReason: {})
…, and gates

## Summary
Eleven review-driven fixes from the pre-landing pass on the Neptune branch: one content-loss bug (promoted-heading echo drop read entity-encoded source HTML as raw text), four hoist-safety guards, the SVG dedup raster carry, a parser work budget, a handler path guard, and three doc corrections.

## Why
The adversarial review found the echo-drop check comparing decoded DOM captures against entity-encoded sectionHtml — any heading-matching paragraph containing &/</> counted 0 occurrences and was dropped, including genuine twice-rendered duplicates (the same bug class 0d35541 fixed in the coverage gate, reintroduced one layer up). The remaining findings were latent: comment-delimiter corruption paths in the hoist rewrite, an editor-invalid desync in hoisted pattern files, a wrong-image substitution edge for deduped SVGs, an O(n²) stall in envelope recovery, and a parity-gate rule that made gate-skips unreachable for benignly height-mismatched pages.

## How
- page-reconstruct: echo-drop haystack now decodes via cheerio + foldText (shared from section-coverage); occurrence count 0 means can't-verify → keep, never drop. Two pinning tests.
- variation-hoist: attr rewrites re-serialize through serializeBlockAttrs (exported from form-blocks) so decoded unicode-escaped `--` can't terminate the block comment; derived slugs pass a lib-[a-z0-9-] charset guard before becoming theme filenames.
- reconstruct-pages: pattern files are written PRE-hoist — they never pass block-fixer canonicalization, so a comment-attr-only swap left them permanently editor-invalid on reinstall; applyHoistSwaps stays for when pattern canonicalization lands.
- media-fetch: byte-identical deduped SVGs inherit the original download's rasterPath/svgRisky, closing the basename-collision edge where install-time derivation substituted an unrelated PNG.
- builder-envelope: recoverJsonObject gains a total-work budget — the 500k length cap bounded n, not n².
- refine-report handler: slug validated as a plain path segment (no separators/traversal).
- match-page parity gate: fullPageScore gates only when heights match; magenta padding counts as diff by construction, so any sub-threshold height delta capped it below 0.995 and defeated the gate.
- Docs: tool counts corrected (README 34→35, AGENTS.md 30→35); nonexistent --no-variation-hoist CLI flag removed from AGENTS.md.
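
Two of the hoist-safety guards above lend themselves to a short sketch. The escaping below is modelled on how WordPress core keeps `--` out of serialized block-comment attributes; the function names are hypothetical, the real `serializeBlockAttrs` in form-blocks may differ, and the slug pattern is one reading of the "lib-[a-z0-9-]" charset guard.

```typescript
// Serialize block attributes so the JSON can sit inside an HTML comment
// (<!-- wp:name {...} -->) without terminating it.
function serializeAttrsSketch(attrs: Record<string, unknown>): string {
  return JSON.stringify(attrs)
    .replace(/--/g, "\\u002d\\u002d") // "--" would end the comment early
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

// Assumed guard: only lowercase alphanumerics and hyphens after "lib-",
// so a derived slug cannot carry separators or traversal into a filename.
function isSafeVariationSlug(slug: string): boolean {
  return /^lib-[a-z0-9-]+$/.test(slug);
}
```

The escaped output is still valid JSON, so `JSON.parse` restores the original values; re-serializing through one function after every attribute rewrite keeps a decoded `--` from leaking back into the comment.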

## Testing
- [ ] npx tsc --noEmit — clean
- [ ] SKIP_BROWSER_TESTS=1 npx vitest run over the changed areas — 1679 passing
- [ ] scripts/block-fixer: node --test — 11 passing
@borkweb borkweb merged commit fee0585 into main Jun 11, 2026
@borkweb borkweb deleted the feature/neptune-best-parts branch June 11, 2026 17:58