feat: CDP-First routing + structured snapshots for web apps#16
Open
MARUCIE wants to merge 4 commits intoghostwright:mainfrom
Open
feat: CDP-First routing + structured snapshots for web apps#16MARUCIE wants to merge 4 commits intoghostwright:mainfrom
MARUCIE wants to merge 4 commits intoghostwright:mainfrom
Conversation
added 4 commits
March 25, 2026 09:06
For Chrome/Electron apps, try CDP before the expensive AX tree walk. This reduces web element find latency from ~11s to ~50ms for apps like Gmail where Chrome exposes everything as AXGroup. Changes: - CDPBridge: target cache (1s TTL), isBrowserApp() detection (16 apps), 6 new JS query strategies (CSS selector, data-testid, role+text, nearest-input, Shadow DOM pierce, fuzzy Levenshtein) - Perception: CDP-First routing for browser apps, AX-First unchanged for native apps. Zero breaking changes. - Tests: 8 new CDPBridge tests (13/13 total pass)
…hostwright#9) For Chrome/Electron apps, ghost_parse_screen now returns a compact text-based element list via CDP instead of a full screenshot image. Token cost drops from ~2000+ to ~100-200 tokens per snapshot. Output format: [e0] button "Compose" (142, 223) dom:":oq" This directly addresses Issue ghostwright#9 (screenshot context overflow) by providing a structured alternative that avoids base64 image encoding. Native apps fall through to the existing vision sidecar path unchanged.
New Session/ module: - ChromeProfileManager: persistent Chrome profiles (~/.ghost-os/profiles/), profile CRUD, Chrome launch args builder, cookie export via CDP. File permissions set to 700 for security. New Stealth/ module: - TimingJitter: log-normal human delays (Box-Muller), burst typing patterns, coordinate jitter (±2px), pre/post click delays. - BehavioralMimicry: cubic Bezier mouse paths, short-distance jitter paths, pre-action scroll simulation, off-center click offset. Tests: 10 new stealth tests (23/23 total pass). All tests verify statistical properties (distribution bounds, curvature, timing ranges).
RecipeEngine now auto-heals failed click/type/hover steps in browser apps by retrying via CDP element finding. When a DOM ID changes after a web app update, the auto-heal finds the element by text content and re-executes the action at CDP-found coordinates. Auto-heal flow: step fails → detect browser app → CDP query by computedNameContains → re-execute action at CDP coordinates → log "[auto-healed via CDP]" in step result. New recipe: github-pr-review.json (navigate Files changed, open review dialog, type comment, submit review).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Enhances Ghost OS's Chrome DevTools Protocol (CDP) bridge to dramatically improve web app automation performance. Two key changes:
CDP-First routing for browser apps — For Chrome/Electron apps, try CDP before the AX tree walk. Reduces web element find latency from ~11s to ~50ms (158x speedup for Gmail).
CDP structured snapshots for
ghost_parse_screen— Returns a compact text-based element list via CDP instead of a full screenshot image. Token cost drops from ~2000+ to ~100-200 tokens. Directly addresses Screenshot result #9.What changed
CDPBridge.swift (+167 lines)
isBrowserApp()detection for 16 known Chrome/Electron appsdata-testid,role+text, nearest-input, Shadow DOM pierce, fuzzy Levenshtein match (5→11 total)actionabledetection:combobox,menuitem,cursor:pointerPerception.swift (+40 lines)
chromeWindowOriginPublic()for VisionPerception coordinate mappingVisionPerception.swift (+189 lines)
ghost_parse_screennow returns structured CDP snapshot for Chrome/Electron apps[e0] button "Compose" (142, 223) dom:":oq"token_estimatefield so agents can budget context accordinglyCDPBridgeTests.swift (+67 lines, new file)
isBrowserApp()detection (Chrome, Arc, Electron apps, native apps, nil, case insensitivity)Design decisions
isBrowserApp()— false positives safely fall through to AX; false negatives cost ~11s per queryPerformance
ghost_find "Compose"in Gmailghost_parse_screentoken cost (Chrome)Test plan
swift build— 0 errors, 1 existing deprecation warning (ScreenCapture)swift test— 13/13 pass (5 existing + 8 new)ghost_find query:"Compose" app:"Google Chrome"with Gmail openghost_parse_screen app:"Google Chrome"— verify structured output