Skip to content

feat(cdp): --cdp-url to attach to an already-running browser (logged-in session capture)#55

Closed
kzarzycki wants to merge 2 commits into
StarTrail-org:mainfrom
kzarzycki:feat/cdp-attach-existing-browser
Closed

feat(cdp): --cdp-url to attach to an already-running browser (logged-in session capture)#55
kzarzycki wants to merge 2 commits into
StarTrail-org:mainfrom
kzarzycki:feat/cdp-attach-existing-browser

Conversation

@kzarzycki

@kzarzycki kzarzycki commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Problem

The cdp backend always launches a throwaway --headless=new Chrome with an ephemeral profile (backends/cdp.py _worker). Two consequences:

  1. No session. It can't render anything behind a login — the new tab has no cookies, so authenticated pages come back as their logged-out variant.
  2. macOS can't self-provision. chrome.py's auto-download is hardcoded to linux-x64 (install_chrome raises on Darwin), so on a Mac the only way to render at all is to set CHROME_PATH at an installed browser — and even then you only get a fresh anonymous profile.

What this adds

--cdp-url URL (env: PIXELSHOT_CDP_URL) — attach to an already-running browser's DevTools endpoint instead of launching one:

# Brave/Chrome started with --remote-debugging-port=9222
pixelshot https://github.com --cdp-url http://127.0.0.1:9222 \
  --tile-height 1568 --wait-network-idle

When set, each worker connects to the browser-level CDP endpoint, Target.createTargets its own fresh tab, renders through that tab's page session, and Target.closeTargets only the tab it created on teardown. It never touches the user's other tabs and never kills the browser. It forces the standard capture path (turbo's rawFilePath needs a process we launched) and requires no local Chrome binary — solving the macOS gap too.

Accepts http://host:port, bare host:port, or ws://host:port/....

Behavior & structure

  • Unset (and no env var) → the launch path behaves exactly as before; the attach path is a separate branch in render_urls.
  • The launch worker (_worker) and attach worker (_attached_worker) differ only in how the page websocket is obtained — launch a headless process vs. create a tab in a running browser. The shared page setup and queue/capture loop are extracted into _setup_page and _drain_queue so neither path duplicates that logic.
  • One fresh tab per worker against the single shared browser — no extra processes, true parallelism, no interference with existing tabs.
  • _connect_cdp (launch path) grabs targets[0], which would hijack a user's existing tab; the attach path deliberately resolves the page ws of its own created target instead.

Robustness

  • Target.createTarget runs inside the worker's try with the target_id guarded, so a failed create can't orphan a tab and the browser websocket is always closed in finally.
  • The created target is resolved from /json with a short retry (it can momentarily be absent right after creation), and that blocking HTTP fetch runs via asyncio.to_thread so it doesn't block the event loop.
  • A bad or unreachable --cdp-url surfaces a clear RuntimeError("Could not reach CDP endpoint at <url>: …") instead of a raw URLError/KeyError traceback.

Verification

Captured https://github.com two ways against a real logged-in Brave on :9222:

  • --cdp-url → the logged-in Dashboard (username, avatar, personal repositories, personalized feed).
  • default (spawned headless) → the logged-out marketing landing ("Sign in" / "Sign up").

Same URL, same Brave binary — the only difference is whether the existing session was used. The default launch path was re-verified end to end after the helper extraction (pixelshot https://example.com still renders).

Tests

tests/test_cdp_attach.py (no browser required):

  • URL normalization across the accepted forms.
  • cdp_url routes to the attach path without opening a browser (asserts _find_chrome is never called); PIXELSHOT_CDP_URL env fallback; and a no-regression check that the default path still resolves Chrome.
  • Target lifecycle (mocked ws): with a pre-existing target present, teardown closes only the self-created targetId and never sends Browser.close.
  • A bad endpoint raises the clean RuntimeError.

ruff check is clean on the changed files. The attach-path tests pass consistently; the rest of the suite is unaffected by this change.

@vercel

vercel Bot commented Jun 21, 2026

Copy link
Copy Markdown

@kzarzycki is attempting to deploy a commit to the andylizf's projects Team on Vercel.

A member of the Team first needs to authorize it.

The cdp backend always launches a throwaway `--headless=new` Chrome with an
ephemeral profile, so it can't see anything behind a login, and on macOS it
can't self-provision a browser at all (the bundled auto-download is linux-x64
only) — leaving CHROME_PATH as the only way to render.

Add `--cdp-url URL` (env: PIXELSHOT_CDP_URL). When set, the backend connects to
that DevTools endpoint (e.g. http://127.0.0.1:9222), creates a fresh tab per
worker, renders using the running browser's existing session (cookies/logins),
and closes only the tabs it created — never touching the user's other tabs and
never killing the browser. Forces the standard path (turbo needs a process we
launched) and needs no local Chrome binary.

The launch and attach workers share extracted `_setup_page` and `_drain_queue`
helpers (they differ only in how the page ws is obtained), so the queue/capture
logic isn't duplicated. Unset → existing launch behavior is unchanged.

Tests: URL normalization, attach-vs-launch routing (no browser opened), env
fallback, and no-regression on the default path.
@kzarzycki kzarzycki force-pushed the feat/cdp-attach-existing-browser branch from 35d638f to 2b07d7b Compare June 21, 2026 07:43
@kzarzycki

Copy link
Copy Markdown
Contributor Author

Apologies for the force-push right after opening — I ran a follow-up review-and-simplify pass and extracted the shared _setup_page/_drain_queue helpers so the launch and attach workers don't duplicate the capture loop. The branch and PR description now reflect that; no further history rewrites planned.

@kzarzycki kzarzycki marked this pull request as ready for review June 21, 2026 07:48
@kzarzycki kzarzycki marked this pull request as draft June 21, 2026 07:50
Addresses review feedback on the attach path:
- _fetch_json maps connection failures to a clear "Could not reach CDP
  endpoint at <url>" RuntimeError instead of a raw URLError/KeyError traceback.
- _page_ws_url_for_target retries the /json lookup (a freshly created target
  can momentarily be absent) and runs the blocking HTTP fetch via
  asyncio.to_thread so it doesn't block the event loop.
- _attached_worker moves Target.createTarget inside the try and guards
  closeTarget on target_id, so the browser ws is always closed and a failed
  create can't orphan a tab.

Tests: mocked-ws test asserts only the self-created target is closed (never a
pre-existing one, never Browser.close), and that a bad endpoint raises a clean
RuntimeError.
@kzarzycki kzarzycki marked this pull request as ready for review June 21, 2026 08:23
@kzarzycki kzarzycki marked this pull request as draft June 21, 2026 08:25
@kzarzycki kzarzycki marked this pull request as ready for review June 21, 2026 08:26
@kzarzycki

Copy link
Copy Markdown
Contributor Author

Heads-up (unrelated to this PR): tests/test_render.py::test_render_local_html_to_tiles looks flaky under full-suite load. While verifying this branch I saw it fail intermittently with [w0] FAIL … no close frame received or sent — a websocket-teardown race on the launch path, not a render failure.

It is not introduced by this PR: a clean main checkout fails the same test the same way in a full pytest tests/ run (1 failed, 10 passed), while it passes reliably in isolation (pytest tests/test_render.py → green in ~3s). So it surfaces only when the full browser-spawning suite runs back-to-back.

Flagging in case it bites CI. A small hardening of the worker teardown (e.g. tolerating an already-closed ws on ws.close()) would likely settle it — happy to open a separate PR if useful, but I've kept it out of this one to stay scoped.

@yichuan-w

Copy link
Copy Markdown
Contributor

Thanks for your contribution!! Inherit login page is an important feature, lets investigate into this PR.

@yichuan-w yichuan-w requested a review from andylizf June 22, 2026 01:06
andylizf added a commit that referenced this pull request Jun 23, 2026
…des #55) (#76)

* feat(cdp): --cdp-url to attach to an already-running browser (#55)

Co-Authored-By: kzarzycki <kzarzycki@users.noreply.github.com>

* style: ruff format

* chore: remove throwaway repro script from #66

---------

Co-authored-by: kzarzycki <kzarzycki@users.noreply.github.com>
@andylizf

Copy link
Copy Markdown
Contributor

Superseded by #76 (rebased onto latest main with conflict resolution). Thanks @kzarzycki!

@andylizf andylizf closed this Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants