feat(browser): implement persistent browser context for session management #128
irvingpop wants to merge 407 commits into stickerdaniel:main
Updated the Dockerfile entry point. Enhanced main.py with clearer logging and configuration handling, including adjustments for interactive and non-interactive modes. Improved documentation in README.md for command usage and options. Changes include:
- Removed unnecessary setup flags in entry point.
- Refined logging configuration and output.
- Updated README for clarity on command-line options and usage.
Add fast, high-ROI tests covering config, utils, exceptions, error handling, authentication, and MCP tools (mocked). Includes pytest configuration, coverage reporting, and CI test job.
- Add pytest.ini with async support and pytest-xdist for parallel tests
- Add .coveragerc with 45% threshold and branch coverage
- Add tests for config schema, loaders, and singleton pattern
- Add tests for exception hierarchy and error handler
- Add tests for authentication source detection
- Add mocked tests for all 5 MCP tools
- Add CI test job running Python 3.14 with coverage
…ropics-claude-code-action-digest chore(deps): update anthropics/claude-code-action digest to 231bd75
….io-astral-sh-uv-latest chore(deps): update ghcr.io/astral-sh/uv:latest docker digest to 143b40f
…stickerdaniel#125) Updates linkedin-scraper from >=3.1.0 to >=3.1.1, which includes a fix for authentication detection that was causing --get-session to hang indefinitely. The upstream fix (joeyism/linkedin_scraper@55f2305) improves is_logged_in() to handle LinkedIn's A/B tested DOM variants by:
- Adding URL-based fallback detection
- Checking multiple nav selector patterns
- Failing fast on auth blocker URLs
Resolves: stickerdaniel#95
Related: joeyism/linkedin_scraper#269
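The detection order described in this commit message can be sketched roughly as follows. This is not the upstream is_logged_in() code: the function name `classify_login_state` and both lists of path prefixes are assumptions chosen for illustration, and the real selector and URL lists may differ.

```python
from urllib.parse import urlparse

# Hypothetical path prefixes; the real upstream lists may differ.
AUTH_BLOCKER_PREFIXES = ("/login", "/authwall", "/checkpoint")
LOGGED_IN_PREFIXES = ("/feed", "/mynetwork", "/in/")


def classify_login_state(url: str, nav_selector_found: bool) -> bool:
    """Return True if the page looks authenticated, False otherwise."""
    path = urlparse(url).path or "/"

    # 1. Fail fast: an auth blocker URL is treated as logged out,
    #    regardless of what the DOM contains.
    if path.startswith(AUTH_BLOCKER_PREFIXES):
        return False

    # 2. Primary signal: one of the known nav selectors matched in the DOM.
    if nav_selector_found:
        return True

    # 3. URL-based fallback for DOM variants where no selector matched.
    return path.startswith(LOGGED_IN_PREFIXES)
```

Checking the blocker URLs first is what avoids the indefinite wait: a login or checkpoint page returns immediately instead of polling for a nav element that will never appear.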
…l#127) The base image already provides Python, so uv doesn't install a separate Python version to /opt/python. Remove the reference to fix the Docker build failure.
Greptile Overview
Greptile Summary
Replaces manual […]
Key Changes:
Benefits:
Migration:
Confidence Score: 4.5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant CLI as cli_main.py
participant Browser as browser.py
participant Persistent as PersistentBrowserManager
participant Playwright
User->>CLI: Start server
CLI->>CLI: Check needs_migration()
alt Legacy session exists
CLI->>Browser: migrate_from_legacy_session()
Browser->>Browser: Load legacy BrowserManager
Browser->>Browser: Extract storage state
Browser->>Persistent: Create new context
Browser->>Persistent: Transfer state
Browser->>Browser: Verify login
Browser-->>CLI: Migration successful
end
CLI->>Browser: get_or_create_browser()
Browser->>Persistent: Initialize with user_data_dir
Browser->>Persistent: start()
Persistent->>Playwright: Start playwright
Persistent->>Playwright: Launch persistent context
Note over Playwright: State persists automatically
Playwright-->>Persistent: BrowserContext with Page
Persistent-->>Browser: PersistentBrowserManager
Browser->>Browser: Navigate to LinkedIn
Browser->>Browser: Verify authentication
Browser-->>CLI: Authenticated browser
CLI->>CLI: Start FastMCP server
Note over CLI: Tools use singleton browser
User->>CLI: Shutdown
CLI->>Browser: close_browser()
Browser->>Persistent: close()
Persistent->>Playwright: Close context
Persistent->>Playwright: Stop playwright
Note over Persistent: Session persisted
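The "Check needs_migration()" step in the diagram could look something like the sketch below. It is an illustration only: the signatures, the "profile is empty" rule, and the assumption that session.json holds a Playwright storage-state document (with a top-level "cookies" list) are not confirmed by this PR, and `load_legacy_cookies` is a hypothetical helper.

```python
import json
from pathlib import Path


def needs_migration(session_file: Path, profile_dir: Path) -> bool:
    """True when a legacy session file exists but no profile has been created."""
    # Assumption: a missing or empty profile directory means "not migrated yet".
    profile_is_empty = not profile_dir.exists() or not any(profile_dir.iterdir())
    return session_file.is_file() and profile_is_empty


def load_legacy_cookies(session_file: Path) -> list[dict]:
    """Read the cookie list from a storage-state style JSON file."""
    state = json.loads(session_file.read_text(encoding="utf-8"))
    return state.get("cookies", [])
```

With a live persistent context, the returned list could then be passed to Playwright's `context.add_cookies(...)` before the login check runs.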
await persistent.start()

# Copy cookies from old session to new persistent context
storage_state = await temp_browser.context.storage_state()
Verify BrowserManager.context property exists - this relies on an undocumented interface from linkedin_scraper
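One way to address this concern is to check for the attribute before using it, so that an upstream change fails with a clear message instead of an AttributeError in the middle of the migration. The helper below is a minimal sketch; `get_storage_context` is a hypothetical name and is not part of linkedin_scraper or this PR.

```python
def get_storage_context(browser_manager: object) -> object:
    """Return the manager's browser context, or raise a descriptive error."""
    context = getattr(browser_manager, "context", None)
    if context is None:
        raise RuntimeError(
            f"{type(browser_manager).__name__} exposes no usable 'context'; "
            "cannot read the legacy storage state"
        )
    # The migration needs storage_state(), so verify that as well.
    if not callable(getattr(context, "storage_state", None)):
        raise RuntimeError("'context' has no storage_state() method")
    return context
```

The reviewed line would then read `storage_state = await get_storage_context(temp_browser).storage_state()`.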
…ement Replace manual session.json file management with Playwright's persistent browser context. Sessions now persist automatically in the browser profile directory, eliminating the need for manual save/load cycles.

**Major Changes:**
- Add PersistentBrowserManager using launch_persistent_context()
- Change session storage: session.json file → browser-profile/ directory
- Add automatic migration for existing session.json users
- Update configuration with --user-data-dir option
- Fix CLI default path (session.json → browser-profile)

**Breaking Changes:**
- Session location changed from ~/.linkedin-mcp/session.json to ~/.linkedin-mcp/browser-profile/
- Automatic migration provided for existing users
- Version bumped to 3.0.0

**Benefits:**
- More reliable cookie persistence (behaves like a real browser)
- No manual save/load cycles needed
- Better Docker support with standard volume mount pattern
- More LinkedIn-friendly (reduces CAPTCHA triggers)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
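For readers unfamiliar with the Playwright feature this commit relies on, here is a minimal sketch of launching a persistent context. It is not the PR's PersistentBrowserManager: `resolve_user_data_dir` and `open_persistent_page` are hypothetical names, and only the default path and the --user-data-dir override come from the commit message.

```python
from pathlib import Path
from typing import Optional

DEFAULT_PROFILE_DIR = Path.home() / ".linkedin-mcp" / "browser-profile"


def resolve_user_data_dir(cli_value: Optional[str] = None) -> Path:
    """Use the --user-data-dir value if given, else the default profile path."""
    return Path(cli_value).expanduser() if cli_value else DEFAULT_PROFILE_DIR


async def open_persistent_page(user_data_dir: Path, url: str) -> str:
    """Open url in a persistent Chromium profile and return the page title."""
    # Imported lazily so the path helper works without Playwright installed.
    from playwright.async_api import async_playwright

    user_data_dir.mkdir(parents=True, exist_ok=True)
    async with async_playwright() as p:
        # The browser writes cookies and local storage into user_data_dir,
        # so there is no explicit save/load of a session file.
        context = await p.chromium.launch_persistent_context(
            str(user_data_dir), headless=True
        )
        try:
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            await context.close()
```

Running `asyncio.run(open_persistent_page(resolve_user_data_dir(), "https://www.linkedin.com/feed/"))` twice against the same directory reuses the cookies written by the first run, which is the behavior the commit depends on.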
Hey, can you explain the re-authenticating issues you had? The session management is implemented in the upstream scraper; maybe create an issue there suggesting the use of Playwright's persistent browser context.

Hey Daniel, thanks for the quick response! Maybe I got caught in a weird moment: I was getting bitten by the is_logged_in() issue and had to re-authenticate every time, which was getting rather tedious. But that had me thinking: the session IDs won't last forever, and the is_logged_in() detection is bound to break in the future because it is inherently fragile. So why not make it a little easier on myself and others by reusing the same browser session rather than a fresh one every time? I do agree this could be more elegantly implemented in the upstream scraper, but I saw that project had a really long queue of unreviewed PRs, and I wanted to verify this was even the right solution, so I implemented it here. Totally understand if you'd rather see it upstreamed, and if so I can work on that, but it'll be a much more circuitous route.

I see where you're coming from, but I think the upstream PR backlog is mostly stale v2 code. My recent issues there were resolved quite fast.

My main constraint is avoiding the maintenance burden of custom session management within this repository.

Fair point, and totally understandable. If I refactored this so that the persistent context code went into the scraper library, would you accept a PR to utilize that?

Yes, absolutely.

Upstream PR: joeyism/linkedin_scraper#270
Force-pushed from fd80f60 to 7661f43
Summary
Replaces manual session.json file management with Playwright's persistent browser context for more reliable LinkedIn authentication and session persistence.
Motivation
Changes
Core Implementation
- PersistentBrowserManager class using launch_persistent_context()
- Sessions stored in the ~/.linkedin-mcp/browser-profile/ directory
Migration
- Automatic migration from session.json on first run
- Backup: session.json.backup
Configuration
- --user-data-dir CLI option for custom profile locations
- CLI default path fixed (was session.json, now browser-profile)
Breaking Changes
This is a breaking change (v3.0.0):
- Session location: ~/.linkedin-mcp/session.json → ~/.linkedin-mcp/browser-profile/
- Run --get-session to re-authenticate if migration fails
Benefits
Testing
Verification Checklist
- --get-session creates profile
- --session-info reports correct status
- --clear-session removes profile
Migration Guide for Users
Existing users (v2.x → v3.0):
Migration is automatic! On first run with v3.0, the server will:
- session.json
- session.json.backup

Btw, I'm happy to go with whatever version numbering you want here.