feat(cdp): --cdp-url to attach to an already-running browser (logged-in session capture) #55
The cdp backend always launches a throwaway `--headless=new` Chrome with an ephemeral profile, so it can't see anything behind a login. On macOS it can't self-provision a browser at all (the bundled auto-download is linux-x64 only), leaving CHROME_PATH as the only way to render.

Add `--cdp-url URL` (env: PIXELSHOT_CDP_URL). When set, the backend connects to that DevTools endpoint (e.g. http://127.0.0.1:9222), creates a fresh tab per worker, renders using the running browser's existing session (cookies/logins), and closes only the tabs it created, never touching the user's other tabs and never killing the browser. Forces the standard path (turbo needs a process we launched) and needs no local Chrome binary.

The launch and attach workers share extracted `_setup_page` and `_drain_queue` helpers (they differ only in how the page ws is obtained), so the queue/capture logic isn't duplicated. Unset → existing launch behavior is unchanged.

Tests: URL normalization, attach-vs-launch routing (no browser opened), env fallback, and no-regression on the default path.
Force-pushed from 35d638f to 2b07d7b
|
Apologies for the force-push right after opening — I ran a follow-up review-and-simplify pass and extracted the shared |
Addresses review feedback on the attach path:
- _fetch_json maps connection failures to a clear "Could not reach CDP endpoint at <url>" RuntimeError instead of a raw URLError/KeyError traceback.
- _page_ws_url_for_target retries the /json lookup (a freshly created target can momentarily be absent) and runs the blocking HTTP fetch via asyncio.to_thread so it doesn't block the event loop.
- _attached_worker moves Target.createTarget inside the try and guards closeTarget on target_id, so the browser ws is always closed and a failed create can't orphan a tab.

Tests: mocked-ws test asserts only the self-created target is closed (never a pre-existing one, never Browser.close), and that a bad endpoint raises a clean RuntimeError.
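
For readers following along, here is a minimal sketch of the first two points: mapping fetch failures to a clear `RuntimeError`, and retrying the `/json` lookup off the event loop. It is not the PR's code. The function names echo the ones mentioned above, but the bodies, the `attempts`/`delay` defaults, and the injectable `fetch` parameter are assumptions made for illustration. It relies on Chrome's `/json` HTTP endpoint listing targets with `id` and `webSocketDebuggerUrl` fields.

```python
import asyncio
import json
import urllib.error
import urllib.request


def fetch_json(url, timeout=5.0):
    """Blocking GET that turns any fetch/parse failure into a clear RuntimeError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        raise RuntimeError(f"Could not reach CDP endpoint at {url}: {exc}") from exc


async def page_ws_url_for_target(base_url, target_id, attempts=5, delay=0.1,
                                 fetch=fetch_json):
    """Look up the page websocket URL of target_id, retrying briefly.

    A freshly created target can be missing from /json for a moment, so the
    lookup is repeated up to `attempts` times (assumed default, not from the PR).
    """
    for attempt in range(attempts):
        # Run the blocking HTTP call in a thread so the event loop stays free.
        targets = await asyncio.to_thread(fetch, f"{base_url}/json")
        for target in targets:
            if target.get("id") == target_id and target.get("webSocketDebuggerUrl"):
                return target["webSocketDebuggerUrl"]
        if attempt < attempts - 1:
            await asyncio.sleep(delay)
    raise RuntimeError(f"Target {target_id} did not appear in {base_url}/json")
```

Passing `fetch` as a parameter is only a convenience so the retry loop can be exercised without a browser.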
|
Heads-up (unrelated to this PR): It is not introduced by this PR: a clean Flagging in case it bites CI. A small hardening of the worker teardown (e.g. tolerating an already-closed ws on |
|
Thanks for your contribution!! Inheriting a logged-in session is an important feature, let's look into this PR. |
|
Superseded by #76 (rebased onto latest main with conflict resolution). Thanks @kzarzycki! |
Problem

The cdp backend always launches a throwaway `--headless=new` Chrome with an ephemeral profile (`backends/cdp.py`, `_worker`). Two consequences:

- It can't see anything behind a login, because the ephemeral profile carries no cookies or sessions.
- `chrome.py`'s auto-download is hardcoded to linux-x64 (`install_chrome` raises on `Darwin`), so on a Mac the only way to render at all is to point `CHROME_PATH` at an installed browser, and even then you only get a fresh anonymous profile.

What this adds

`--cdp-url URL` (env: `PIXELSHOT_CDP_URL`): attach to an already-running browser's DevTools endpoint instead of launching one:

    # Brave/Chrome started with --remote-debugging-port=9222
    pixelshot https://github.com --cdp-url http://127.0.0.1:9222 \
      --tile-height 1568 --wait-network-idle

When set, each worker connects to the browser-level CDP endpoint, opens its own fresh tab with `Target.createTarget`, renders through that tab's page session, and on teardown closes only the tab it created with `Target.closeTarget`. It never touches the user's other tabs and never kills the browser. It forces the standard capture path (turbo's `rawFilePath` needs a process we launched) and requires no local Chrome binary, which solves the macOS gap too.

Accepts `http://host:port`, bare `host:port`, or `ws://host:port/...`.

Behavior & structure
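
As an illustration of the three accepted `--cdp-url` forms (`http://host:port`, bare `host:port`, `ws://host:port/...`), a normalizer could look roughly like the sketch below. The function name `normalize_cdp_url` and the choice to reduce everything to an `http(s)://host:port` base are assumptions for this example; the PR's actual normalization may differ.

```python
from urllib.parse import urlsplit


def normalize_cdp_url(raw):
    """Reduce a user-supplied endpoint to an http(s)://host:port base URL.

    Hypothetical helper: the output shape is an assumption, not the PR's code.
    """
    value = raw.strip()
    if "://" not in value:
        value = "http://" + value  # bare host:port
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https", "ws", "wss"):
        raise ValueError(f"Unsupported CDP URL scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"CDP URL must include host and port: {raw!r}")
    # ws/wss endpoints map onto the matching HTTP scheme; any path is dropped.
    scheme = "https" if parts.scheme in ("https", "wss") else "http"
    host = parts.hostname
    if ":" in host:  # IPv6 literal, restore the brackets urlsplit removed
        host = f"[{host}]"
    return f"{scheme}://{host}:{parts.port}"
```

With this sketch, `127.0.0.1:9222` and `ws://127.0.0.1:9222/devtools/browser/<id>` both resolve to the same base that the `/json` lookups would use.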

- The launch worker (`render_urls._worker`) and attach worker (`_attached_worker`) differ only in how the page websocket is obtained: launch a headless process vs. create a tab in a running browser. The shared page setup and queue/capture loop are extracted into `_setup_page` and `_drain_queue` so neither path duplicates that logic.
- `_connect_cdp` (launch path) grabs `targets[0]`, which would hijack a user's existing tab; the attach path deliberately resolves the page ws of its own created target instead.

Robustness
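
The tab lifecycle of the attach worker (create one tab, close only that tab, always close the browser websocket) can be sketched as follows. This is a simplified illustration, not the PR's `_attached_worker`: the `browser` object with async `send(method, params)` and `close()` methods and the `render` callback are hypothetical stand-ins. `Target.createTarget` and `Target.closeTarget` are the real CDP methods.

```python
import asyncio


async def attached_worker(browser, render):
    """Render in a tab we create ourselves and clean up only that tab.

    `browser` is a hypothetical browser-level CDP connection exposing
    `await send(method, params)` and `await close()`.
    """
    target_id = None
    try:
        # Created inside the try, so a failure here still reaches the finally.
        result = await browser.send("Target.createTarget", {"url": "about:blank"})
        target_id = result["targetId"]
        await render(target_id)
    finally:
        try:
            # Guarded: if the create failed there is no tab of ours to close.
            if target_id is not None:
                await browser.send("Target.closeTarget", {"targetId": target_id})
        finally:
            # Close our websocket only; Browser.close is never sent.
            await browser.close()
```

The nested `finally` is one way to make sure the websocket is released even if closing the tab itself raises.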

- `Target.createTarget` runs inside the worker's `try` with the `target_id` guarded, so a failed create can't orphan a tab and the browser websocket is always closed in `finally`.
- The new target's page ws is looked up from `/json` with a short retry (it can momentarily be absent right after creation), and that blocking HTTP fetch runs via `asyncio.to_thread` so it doesn't block the event loop.
- An unreachable `--cdp-url` surfaces a clear `RuntimeError("Could not reach CDP endpoint at <url>: …")` instead of a raw `URLError`/`KeyError` traceback.

Verification

Captured `https://github.com` two ways against a real logged-in Brave on `:9222`:

- `--cdp-url` → the logged-in Dashboard (username, avatar, personal repositories, personalized feed).

Same URL, same Brave binary; the only difference is whether the existing session was used. The default launch path was re-verified end to end after the helper extraction (`pixelshot https://example.com` still renders).

Tests

`tests/test_cdp_attach.py` (no browser required):

- `cdp_url` routes to the attach path without opening a browser (asserts `_find_chrome` is never called); `PIXELSHOT_CDP_URL` env fallback; and a no-regression check that the default path still resolves Chrome.
- A mocked-ws test asserts that only the self-created `targetId` is closed and that the worker never sends `Browser.close`.
- A bad endpoint raises a clean `RuntimeError`.

`ruff check` is clean on the changed files. The attach-path tests pass consistently; the rest of the suite is unaffected by this change.
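
To make the routing and env-fallback checks concrete, here is a small sketch of the decision those tests exercise. The helper names `resolve_cdp_url` and `choose_path` are invented for this example, and the precedence (explicit flag first, then `PIXELSHOT_CDP_URL`, otherwise launch) is an assumption consistent with the description above rather than a copy of the PR's code.

```python
import os


def resolve_cdp_url(cli_value, environ=None):
    """Return the CDP endpoint to attach to, or None to launch as before.

    Hypothetical helper: the flag wins over the PIXELSHOT_CDP_URL env var.
    """
    env = os.environ if environ is None else environ
    value = cli_value or env.get("PIXELSHOT_CDP_URL")
    return value or None


def choose_path(cli_value, environ=None):
    """Pick 'attach' when an endpoint is configured, else the default 'launch'."""
    return "attach" if resolve_cdp_url(cli_value, environ) else "launch"
```

Taking `environ` as a parameter keeps such a check independent of the real process environment, in the same spirit as tests that need no browser.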