
feat(saved-jobs): add saved/bookmarked jobs scraping with pagination and progress #167

Closed
IfThingsThenStuff wants to merge 495 commits into stickerdaniel:main from IfThingsThenStuff:feat/saved-jobs-fix-and-progress

Conversation

@IfThingsThenStuff commented Feb 26, 2026

Thanks for your work here - useful tool, and I appreciate your efforts. I wanted the ability to read out my saved jobs, so I added it. It handles multiple pages.

Let me know if this aligns with what you'd like to include, and flag any changes you think are needed.

Summary

  • Add scrape_saved_jobs to LinkedInExtractor — scrapes the LinkedIn jobs tracker page, extracts job IDs from link hrefs, and paginates through results using numbered page buttons (sketched below)
  • Add get_saved_jobs MCP tool with progress reporting via on_progress callback
  • Cap total_pages with max_pages for accurate progress percentages
  • Use Set for O(1) job ID deduplication in the DOM polling function
  • Add navigation delay between page clicks consistent with other scraping methods
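
A minimal sketch of the loop these bullets describe, assuming a Playwright/Patchright-style async page API; `_extract_job_ids` and the exact selectors are illustrative stand-ins, not the code in this PR:

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional

_NAV_DELAY = 2.0  # 2s between page clicks, matching the other scraping methods


async def _extract_job_ids(page) -> list[str]:
    # Pull numeric IDs out of /jobs/view/<id>/ link hrefs.
    hrefs = await page.eval_on_selector_all(
        'a[href*="/jobs/view/"]', "els => els.map(e => e.href)"
    )
    return [m.group(1) for h in hrefs if (m := re.search(r"/jobs/view/(\d+)", h))]


async def scrape_saved_jobs(
    page,
    max_pages: int = 10,
    on_progress: Optional[Callable[[int, int, str], Awaitable[None]]] = None,
) -> dict:
    all_job_ids = set(await _extract_job_ids(page))  # page 1 is already loaded

    # Cap the detected pagination-button count with max_pages so progress is honest.
    buttons = await page.locator('button[aria-label^="Page"]').count()
    total_pages = min(max(buttons, 1), max_pages)
    if on_progress:
        await on_progress(1, total_pages, "scraped page 1")

    for page_num in range(2, total_pages + 1):
        button = page.locator(f'button[aria-label="Page {page_num}"]')
        if await button.count() == 0:
            break  # pagination ended earlier than the button count suggested
        await button.click()
        await asyncio.sleep(_NAV_DELAY)  # rate-limit-friendly navigation delay
        new_ids = set(await _extract_job_ids(page)) - all_job_ids
        if not new_ids:
            break  # the SPA surfaced nothing new; stop gracefully
        all_job_ids |= new_ids
        if on_progress:
            await on_progress(page_num, total_pages, f"scraped page {page_num}")

    return {"job_ids": sorted(all_job_ids), "pages_visited": total_pages}
```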

Test plan

  • test_scrape_saved_jobs_single_page — single page with progress callback
  • test_scrape_saved_jobs_paginates — multi-page with progress and ID collection
  • test_scrape_saved_jobs_timeout_stops_gracefully — timeout returns partial results
  • test_scrape_saved_jobs_stops_at_max_pages_despite_more_buttons — respects max_pages cap
  • test_scrape_saved_jobs_empty — empty results
  • test_get_saved_jobs — tool-level success path
  • test_get_saved_jobs_error — session expired error handling
  • Full suite: 112/112 passing

Greptile Summary

Adds get_saved_jobs MCP tool to scrape saved/bookmarked jobs from LinkedIn's job tracker with pagination and progress reporting.

Key Changes:

  • Pagination: Navigates through numbered page buttons, extracting job IDs from link hrefs (/jobs/view/<id>/)
  • Deduplication: Uses Set for O(1) job ID lookups in both JavaScript extraction and Python filtering
  • Progress Reporting: Implements on_progress callback with accurate page counts capped by max_pages
  • Error Handling: Gracefully handles timeouts, missing buttons, and empty results
  • Navigation: Adds 2s delay between page clicks consistent with other scraping methods

Implementation Quality:

  • Exposes max_pages parameter (default 10) for user control
  • Embeds job IDs in text sections for LLM visibility
  • Returns both structured job_ids list and formatted text
  • Comprehensive test suite: 6 new tests covering pagination, timeouts, edge cases
  • Full test suite passing: 112/112
  • Documentation updated in README.md, AGENTS.md, docs/docker-hub.md, and CLAUDE.md per development workflow

Previous Review Items Addressed:

  • ✅ Set-based deduplication in _EXTRACT_JOB_IDS_JS (lines 389-390; sketched below)
  • ✅ Exposed max_pages parameter in tool signature (line 75)
  • ✅ Documentation updates completed across all required files
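
For context, a hypothetical shape of that dedup change; only the prevIds.includes() to Set.has() swap comes from the review notes, the surrounding JS is illustrative:

```python
# Illustrative only: the real _EXTRACT_JOB_IDS_JS lives in extractor.py.
_EXTRACT_JOB_IDS_JS = """(prevIds) => {
    const prev = new Set(prevIds);  // Set.has() is O(1); prevIds.includes() was O(n)
    const found = [];
    for (const a of document.querySelectorAll('a[href*="/jobs/view/"]')) {
        const m = a.href.match(/\\/jobs\\/view\\/(\\d+)/);
        if (m && !prev.has(m[1])) {
            prev.add(m[1]);
            found.push(m[1]);
        }
    }
    return found;
}"""
```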

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • Score reflects:
    - comprehensive test coverage (112/112 tests passing, including 6 new tests)
    - complete documentation updates per the development workflow
    - robust error handling with graceful degradation
    - efficient O(1) deduplication using Sets
    - pagination logic with multiple safety breaks
    - all previous review comments fully addressed
  • No files require special attention

Important Files Changed

| Filename | Overview |
|----------|----------|
| linkedin_mcp_server/scraping/extractor.py | Added scrape_saved_jobs method with robust pagination logic, Set-based O(1) deduplication, proper error handling, and progress callbacks |
| linkedin_mcp_server/tools/job.py | Added get_saved_jobs MCP tool with exposed max_pages parameter, progress reporting, and consistent error handling |
| tests/test_scraping.py | Added comprehensive test suite with 5 tests covering single-page, pagination, timeout, max_pages cap, and empty results scenarios |
| tests/test_tools.py | Added tool-level tests for get_saved_jobs success path and error handling |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start([Start]) --> Navigate[Navigate to jobs-tracker]
    Navigate --> ExtractPage1[Extract page 1 text and IDs]
    ExtractPage1 --> CountButtons[Count pagination buttons]
    CountButtons --> CalcTotal[Calculate total_pages cap]
    CalcTotal --> ReportP1[Report progress page 1]
    ReportP1 --> CheckMore{More pages?}
    
    CheckMore -->|Yes| CheckButton{Button exists?}
    CheckButton -->|No| Append[Append ID summary]
    CheckButton -->|Yes| ClickButton[Click page button]
    ClickButton --> WaitDelay[Wait nav delay]
    WaitDelay --> WaitNewIDs{Wait for new IDs}
    
    WaitNewIDs -->|Timeout| Append
    WaitNewIDs -->|Success| Scroll[Scroll to bottom]
    Scroll --> ExtractText[Extract page text]
    ExtractText --> ExtractIDs[Extract job IDs]
    ExtractIDs --> FilterDups[Filter duplicates]
    FilterDups --> CheckNewIDs{New IDs?}
    
    CheckNewIDs -->|No| Append
    CheckNewIDs -->|Yes| AddIDs[Add to all_job_ids]
    AddIDs --> ReportProgress[Report progress]
    ReportProgress --> CheckMore
    
    CheckMore -->|No| Append
    Append --> BuildSections[Build sections dict]
    BuildSections --> Return([Return result])
```

Last reviewed commit: 5e68717

stickerdaniel and others added 30 commits August 7, 2025 00:11
…hub-actions-1755279694708

Add Claude Code GitHub Workflow
…l-sh-setup-uv-7.x

chore(deps): update astral-sh/setup-uv action to v7
…hub-actions-1766618312657

Add Claude Code GitHub Workflow
…sh-setup-bun-2.x

chore(deps): update oven-sh/setup-bun action to v2
…ns-checkout-6.x

chore(deps): update actions/checkout action to v6
…n-3.x

chore(deps): update python docker tag to v3.14
Python 3.14 is too new and key dependencies lack support:
- pydantic-core: PyO3 doesn't support Python 3.14 yet
- lxml: No pre-built wheels for Python 3.14

Python 3.13 is still modern and has full ecosystem support.
Add ToolAnnotations to all 6 tools with appropriate hints:
- get_person_profile: readOnly, openWorld (LinkedIn API)
- get_company_profile: readOnly, openWorld (LinkedIn API)
- get_job_details: readOnly, openWorld (LinkedIn API)
- search_jobs: readOnly, openWorld (LinkedIn API)
- get_recommended_jobs: readOnly, openWorld (LinkedIn API)
- close_session: not readOnly, not openWorld (local session mgmt)

Tool annotations help LLM clients understand tool behavior and make
better decisions about tool selection and user confirmations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…aniel#65)

## Summary

Add `ToolAnnotations` to all 6 tools to help LLM clients understand tool behavior and make better decisions about tool selection and user confirmations.

### Changes

- Added annotations to all 6 tools across 4 files:
  - `linkedin_mcp_server/tools/person.py`
  - `linkedin_mcp_server/tools/company.py`
  - `linkedin_mcp_server/tools/job.py`
  - `linkedin_mcp_server/server.py`

### Tool Annotations Added

| Tool | title | readOnlyHint | destructiveHint | openWorldHint |
|------|-------|--------------|-----------------|---------------|
| get_person_profile | Get Person Profile | ✅ | ❌ | ✅ |
| get_company_profile | Get Company Profile | ✅ | ❌ | ✅ |
| get_job_details | Get Job Details | ✅ | ❌ | ✅ |
| search_jobs | Search Jobs | ✅ | ❌ | ✅ |
| get_recommended_jobs | Get Recommended Jobs | ✅ | ❌ | ✅ |
| close_session | Close Session | ❌ | ❌ | ❌ |

### Annotation Rationale

- **readOnlyHint=true**: 5 tools are read-only data retrieval from LinkedIn
- **openWorldHint=true**: 5 tools access external LinkedIn API
- **close_session**: Local session management (not read-only, not external)
- **destructiveHint=false**: No tools delete or destroy any resources (see the sketch below)
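
A minimal sketch of how one such annotation attaches to a tool, assuming FastMCP's `annotations=` parameter and the MCP SDK's `mcp.types.ToolAnnotations`; the real tool bodies live in the files listed above:

```python
from fastmcp import FastMCP
from mcp.types import ToolAnnotations

mcp = FastMCP("linkedin-mcp-server")


@mcp.tool(
    annotations=ToolAnnotations(
        title="Get Person Profile",
        readOnlyHint=True,       # retrieval only; mutates nothing
        destructiveHint=False,   # deletes no resources
        openWorldHint=True,      # reaches out to the external LinkedIn API
    )
)
async def get_person_profile(linkedin_username: str) -> dict:
    """Fetch a LinkedIn profile (body elided in this sketch)."""
    ...
```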

### Why This Matters

Tool annotations are part of the MCP specification that help AI clients:
- Display appropriate confirmation dialogs for destructive operations
- Make better decisions about autonomous tool execution
- Show users accurate information about what tools do

### Testing

- ✅ Python import test passes
- ✅ All 6 tools verified

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Replace non-existent main.py with module execution
(-m linkedin_mcp_server) in VS Code task configurations

---

> [!NOTE]
> Align VS Code tasks with module-based entry point.
> 
> - Replace `uv run main.py` with `uv run -m linkedin_mcp_server` across debug, standard run, and HTTP MCP server tasks
> - Update task `label` and `detail` to reflect server execution; preserve flags like `--debug`, `--no-headless`, `--no-lazy-init`, and `--transport streamable-http`
> - Config-only change in `.vscode/tasks.json`
The CLI uses --log-level {DEBUG,INFO,WARNING,ERROR} not --debug
Upgrade fastmcp from >=2.10.1 to >=2.14.0 to fix the 307 Temporary
Redirect issue when using streamable-http transport.

The fix was merged in FastMCP PR #896 and #998, which changed default
paths to include trailing slashes and removed automatic path
manipulation that caused redirect loops with Starlette's Mount routing.

This also upgrades mcp from 1.10.1 to 1.25.0 which includes related
fixes confirmed by users in modelcontextprotocol/python-sdk#1168.

Resolves: stickerdaniel#54

Add fakeredis and docket loggers to noise reduction to prevent
DEBUG log pollution from FastMCP's internal task queue.
stickerdaniel and others added 15 commits February 20, 2026 18:27
…ump_version_to_4.1.0

ci(release): fix workflow blocked by branch protection
…ump_version_to_4.1.1

chore: bump version to 4.1.1
Bump version to 4.1.2 to trigger release workflow test.
…orting

- Fix wait_for_function positional arg bug (arg= keyword required; sketched below)
- Switch pagination from broken "Next" button to numbered page buttons
  (button[aria-label="Page N"]) which reliably triggers content updates
- Replace arbitrary asyncio.sleep() calls with DOM-based waiting via
  wait_for_function to detect new job links
- Embed job IDs summary in section text so LLMs always surface them
- Add on_progress callback for per-page progress reporting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
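
The arg= fix in the commit above, sketched against Playwright's Python API (which Patchright mirrors); the JS expression and variable names here are illustrative:

```python
# wait_for_function takes its JS argument via the keyword-only `arg=`;
# passing it positionally raises a TypeError instead of forwarding the value.
known_ids = list(all_job_ids)  # illustrative: job IDs seen so far

await page.wait_for_function(
    """(prevIds) => {
        const prev = new Set(prevIds);
        return [...document.querySelectorAll('a[href*="/jobs/view/"]')]
            .some((a) => {
                const m = a.href.match(/\\/jobs\\/view\\/(\\d+)/);
                return m && !prev.has(m[1]);  // any job ID we have not seen yet?
            });
    }""",
    arg=known_ids,   # keyword required: wait_for_function(expression, *, arg=...)
    timeout=15_000,  # 15s, per the review discussion
)
```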
Detect total pages from pagination buttons on the page instead of using
max_pages (10), so progress reports reflect reality (1/2, 2/2 instead
of 1/10, 2/10).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kups, and add tests

Address review findings: cap total_pages with max_pages to fix misleading
progress percentages, add _NAV_DELAY between page clicks for rate-limit
safety, convert JS prevIds.includes() to Set.has() for O(1) lookups, guard
division by zero in _report, fix docstring inaccuracies, and add 5 targeted
tests covering progress callbacks, timeout graceful stop, max_pages cap,
and session expired error handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
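
A sketch of the _report guard described above, assuming the three-argument ctx.report_progress(progress, total, message) shape the tool uses for its final "Complete" signal; everything beyond the _report name is an assumption:

```python
async def _report(ctx, page_num: int, total_pages: int, message: str) -> None:
    total = max(total_pages, 1)                     # guard against division by zero
    percent = min(int(page_num / total * 100), 99)  # hold 99% until the final signal
    await ctx.report_progress(percent, 100, message)
```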
@IfThingsThenStuff marked this pull request as draft February 26, 2026 04:03
@IfThingsThenStuff marked this pull request as ready for review February 26, 2026 04:03

@greptile-apps (Bot) left a comment


4 files reviewed, 3 comments


Comment thread linkedin_mcp_server/scraping/extractor.py
Comment thread linkedin_mcp_server/tools/job.py Outdated
Comment thread linkedin_mcp_server/tools/job.py
Address Greptile review: use Set for O(1) dedup in _EXTRACT_JOB_IDS_JS,
expose max_pages parameter on get_saved_jobs MCP tool, and document the
new tool in AGENTS.md, README.md, and docs/docker-hub.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@IfThingsThenStuff (Author) commented:

Hey there @stickerdaniel - hope you're doing well. Is there anything I can do to help get this merged? Thanks in advance.

@stickerdaniel force-pushed the main branch 2 times, most recently from fd80f60 to 7661f43 (April 3, 2026 07:59)
@IfThingsThenStuff force-pushed the feat/saved-jobs-fix-and-progress branch 2 times, most recently from 131a14b to 5e68717 (April 8, 2026 18:39)

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

This PR adds a get_saved_jobs MCP tool that scrapes LinkedIn's job tracker page (/jobs-tracker/), paginates through results using numbered page buttons in a SPA, and returns deduplicated job IDs alongside extracted text. All previously raised concerns (Set-based deduplication in _EXTRACT_JOB_IDS_JS, exposing max_pages, and adding documentation to README/CLAUDE.md/AGENTS.md) have been addressed. Tests cover single-page, multi-page, timeout, max-pages cap, and empty cases (112 passing).

Confidence Score: 5/5

Safe to merge — all previously raised concerns are resolved and no new P0/P1 issues found.

All prior review threads (Set deduplication in _EXTRACT_JOB_IDS_JS, exposing max_pages, and documentation updates for README/CLAUDE.md/AGENTS.md/docker-hub.md) are fully addressed. The only remaining finding is a P2 style suggestion to align the loop upper bound with the already-computed total_pages. The implementation is well-tested (112/112 passing), follows codebase conventions, and has no logic or security issues.

No files require special attention.

Vulnerabilities

No security concerns identified. The scraper only navigates to first-party LinkedIn URLs, does not accept user-supplied URLs or execute user-provided code, and job IDs are extracted via a regex match on numeric digits only (\d+), preventing injection.
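
A small sketch of why the digits-only pattern is injection-safe; function and variable names are illustrative:

```python
import re

_JOB_ID_RE = re.compile(r"/jobs/view/(\d+)")  # digits only, per the review note


def job_id_from_href(href: str) -> str | None:
    # Anything that is not purely numeric after /jobs/view/ is rejected,
    # so a crafted href cannot smuggle arbitrary strings into job_ids.
    m = _JOB_ID_RE.search(href)
    return m.group(1) if m else None


assert job_id_from_href("https://www.linkedin.com/jobs/view/4011223344/") == "4011223344"
assert job_id_from_href("https://evil.example/jobs/view/<script>/") is None
```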

Important Files Changed

| Filename | Overview |
|----------|----------|
| linkedin_mcp_server/scraping/extractor.py | Adds scrape_saved_jobs method with SPA pagination via button clicks, Set-based deduplication, progress callbacks, and graceful timeout handling. |
| linkedin_mcp_server/tools/job.py | Adds get_saved_jobs MCP tool with max_pages parameter exposed and progress reporting via _report callback; correctly caps progress at 99% until the final 100% signal. |
| tests/test_scraping.py | Adds 5 well-structured tests for scrape_saved_jobs covering single-page, multi-page pagination, timeout, max-pages cap, and empty cases. |
| tests/test_tools.py | Adds test_get_saved_jobs and test_get_saved_jobs_error covering the tool-level success path and session-expired error handling. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Tool as get_saved_jobs (tool)
    participant Extractor as LinkedInExtractor
    participant Page as Patchright Page
    participant LI as LinkedIn (jobs-tracker)

    Tool->>Extractor: scrape_saved_jobs(max_pages, on_progress)
    Extractor->>Page: goto(jobs-tracker/)
    Page->>LI: HTTP GET /jobs-tracker/
    LI-->>Page: SPA HTML
    Extractor->>Page: evaluate(_EXTRACT_JOB_IDS_JS)
    Page-->>Extractor: page 1 job IDs
    Extractor->>Page: locator('button[aria-label^=Page]').count()
    Page-->>Extractor: total_pages
    Extractor->>Tool: on_progress(1, total_pages, ...)

    loop for each page 2..total_pages while button exists
        Extractor->>Page: locator('button[aria-label=Page N]').click()
        Extractor->>Page: wait_for_function(new IDs appear, timeout=15s)
        Page-->>Extractor: new IDs detected (or TimeoutError - break)
        Extractor->>Page: scroll_to_bottom()
        Extractor->>Page: evaluate(_EXTRACT_MAIN_TEXT_JS)
        Page-->>Extractor: page text
        Extractor->>Page: evaluate(_EXTRACT_JOB_IDS_JS)
        Page-->>Extractor: all visible IDs (deduped with prev_ids)
        Extractor->>Tool: on_progress(page_num, total_pages, ...)
    end

    Extractor-->>Tool: url, sections, job_ids, pages_visited, sections_requested
    Tool->>Tool: ctx.report_progress(100, 100, Complete)
    Tool-->>MCP Client: result dict
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: linkedin_mcp_server/scraping/extractor.py
Line: 446

Comment:
**Loop upper bound ignores `total_pages`**

`total_pages` is already capped at `max_pages` via `min(…, max_pages)`, so the loop can use `total_pages + 1` as its upper bound instead of `max_pages + 1`. Both produce the same result (the button check stops the loop early either way), but using `total_pages` makes the relationship between the detected page count and the iteration range explicit and avoids iterating past the last real page.

```suggestion
        for page_num in range(2, total_pages + 1):
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (3): Last reviewed commit: "docs(saved-jobs): add docs, expose max_p..."


@IfThingsThenStuff (Author) commented:

replaced by #338

