Scrape any webpage using text-based DOM manipulation and export structured data to CSV. No external LLM is required: Claude acts as the host model (the "brain") that reasons about page structure, asks clarifying questions, and manages the full scraping workflow.
Unlike traditional scrapers that require you to write CSS selectors or XPath queries, this tool lets the agent figure it out:
- You describe what you want — "Get all product names and prices"
- The agent explores the page — scrolls, clicks pagination, handles lazy loading
- The agent asks clarifying questions — "I see 3 price fields (retail, sale, member). Which one do you want?"
- The agent handles edge cases — infinite scroll, popups, login walls, SPA navigation
- You get a CSV — clean, structured, ready to use
You: "Scrape Hacker News top stories"
Agent: "I see 30 stories on the page. Do you want:
A) Just titles and URLs
B) Titles, URLs, points, and comment counts
C) All of the above plus author names?"
You: "B"
Agent: [scrapes] → stories.csv (30 rows)
# 1. Clone or copy the skill folder to your Claude Code workspace
git clone https://github.com/Science-Prof-Robot/autoclick.git
mkdir -p ~/.claude/skills ~/.claude/agents
cp -r autoclick/.claude/skills/auto-scraping-to-csv ~/.claude/skills/
# 2. Copy the bridge script to agents folder
cp ~/.claude/skills/auto-scraping-to-csv/page-agent-bridge.mjs ~/.claude/agents/
# 3. Install Playwright dependency
npm install -D playwright
npx playwright install chromium
# 4. Start the bridge (keep running in a terminal)
node ~/.claude/agents/page-agent-bridge.mjs
Use in Claude Code:
/scrape-to-csv https://news.ycombinator.com "Get top 30 stories with titles, points, and comment counts"
# 1. Clone the repo
git clone https://github.com/Science-Prof-Robot/autoclick.git
# 2. Copy skill files to Cursor's skills directory
# (Cursor uses the same .cursor/skills/ convention)
mkdir -p ~/.cursor/skills ~/.cursor/agents
cp -r autoclick/.claude/skills/auto-scraping-to-csv ~/.cursor/skills/
cp autoclick/.claude/agents/page-agent-bridge.mjs ~/.cursor/agents/
# 3. Install Playwright
npm install -D playwright
npx playwright install chromium
# 4. Start the bridge
node ~/.cursor/agents/page-agent-bridge.mjs
Use in Cursor:
/scrape-to-csv https://example.com/products "Extract product catalog with prices"
# 1. Install via ClawHub CLI
clawhub install auto-scraping-to-csv
# 2. Copy the bridge script from installed skill to agents
mkdir -p .claude/agents
cp skills/auto-scraping-to-csv/page-agent-bridge.mjs .claude/agents/
# 3. Install Playwright (if not already installed)
npm install -D playwright
npx playwright install chromium
# 4. Start the bridge
node .claude/agents/page-agent-bridge.mjs
Use in OpenClaw:
/scrape-to-csv https://www.anthropic.com/news "Get latest blog posts"
# 1. Clone
git clone https://github.com/Science-Prof-Robot/autoclick.git
cd autoclick
# 2. Install Playwright
npm install
npx playwright install chromium
# 3. Start bridge
node .claude/agents/page-agent-bridge.mjs
# 4. Use curl or any HTTP client to interact with the bridge
# See API Reference below

Claude (Host Model)
↕ HTTP
Bridge Server (Node.js + Playwright)
↕ page.evaluate()
Browser (Chromium) ← Page-Agent injected
- Text-based DOM: No screenshots, no vision model needed. The agent reads simplified HTML with indexed elements.
- Host model: Claude is the reasoning engine. No OpenAI/Qwen API key needed.
- Agent-driven: The agent handles scrolling, pagination, popups, and asks you what to do when ambiguous.
- CSV export: Built-in workflow to convert scraped structured data to CSV.
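
To make "simplified HTML with indexed elements" concrete, here is a minimal Python sketch of the idea. It is not the bridge's actual implementation: the `[index]<tag>text</tag>` output format, the choice of tags, and the names `IndexedDomBuilder` and `simplify` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not the tool's real list).
INTERACTIVE_TAGS = {"a", "button"}


class IndexedDomBuilder(HTMLParser):
    """Collects interactive elements and labels each one with a numeric index."""

    def __init__(self):
        super().__init__()
        self.lines = []     # finished "[i]<tag>text</tag>" lines
        self._open = None   # (index, tag, text parts) of the element being read

    def handle_starttag(self, tag, attrs):
        # Nested interactive elements are folded into their outermost parent.
        if tag in INTERACTIVE_TAGS and self._open is None:
            self._open = (len(self.lines), tag, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data.strip())

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[1]:
            index, name, parts = self._open
            text = " ".join(p for p in parts if p)
            self.lines.append(f"[{index}]<{name}>{text}</{name}>")
            self._open = None


def simplify(html: str) -> str:
    """Return one indexed line per interactive element found in `html`."""
    builder = IndexedDomBuilder()
    builder.feed(html)
    builder.close()
    return "\n".join(builder.lines)
```

With a view like this, a text-only model can refer to "element 1" when choosing what to click, without needing a screenshot.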
node .claude/agents/page-agent-bridge.mjs
Default port: 9876.
# Create a session — the agent will explore the page
curl -X POST http://localhost:9876/sessions \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/products", "headless": true}'
The agent will:
- Fetch the DOM state
- Ask you what data to extract if ambiguous
- Handle scrolling/pagination if needed
- Extract structured JSON
- Convert to CSV
The agent saves output.csv in your working directory.
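
The last two steps (structured JSON, then CSV) can be sketched in Python with the standard `csv` module. This is an illustrative sketch, not the skill's own converter; the function name `records_to_csv` and the sample field names are hypothetical.

```python
import csv
import io


def records_to_csv(records, columns):
    """Serialize a list of dicts to CSV text using a fixed column order.

    Keys not listed in `columns` are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

To write the result to disk, something like `open("output.csv", "w", newline="", encoding="utf-8").write(records_to_csv(rows, columns))` would work. Fields containing commas are quoted automatically by the `csv` module.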
| Complex Scenario | Agent Behavior |
|---|---|
| Infinite scroll | Auto-scrolls, detects "no more content", stops |
| Pagination | Clicks "Next", extracts from all pages, asks how many pages |
| Popups / modals | Detects overlays, dismisses or asks if relevant |
| Lazy loading | Waits for content, retries, times out gracefully |
| Login walls | Detects auth required, asks for credentials or stops |
| Multiple data formats | Asks: "I see prices as '$19.99' and 'USD 19.99'. Which format?" |
| Missing fields | Detects items that lack a value (e.g. no price), asks: "Skip those rows or fill with 'N/A'?" |
| Tables vs lists | Detects layout, asks: "Table has 5 columns. Which ones do you want?" |
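
As one example of how an answer to these questions might be applied, the "Missing fields" row can be sketched in Python as a small post-processing step. This is an assumption about one reasonable approach, not the agent's actual logic; `apply_missing_policy` and its parameters are hypothetical names.

```python
def apply_missing_policy(rows, required, policy="fill", placeholder="N/A"):
    """Return rows with missing required fields either filled or dropped.

    policy="fill" replaces missing or empty values with `placeholder`;
    policy="skip" drops any row that lacks a required field.
    """
    if policy not in ("fill", "skip"):
        raise ValueError(f"unknown policy: {policy!r}")
    result = []
    for row in rows:
        # A field counts as missing when it is absent, None, or an empty string.
        missing = [f for f in required if row.get(f) in (None, "")]
        if missing and policy == "skip":
            continue
        cleaned = dict(row)  # copy so the caller's rows are not mutated
        for field in missing:
            cleaned[field] = placeholder
        result.append(cleaned)
    return result
```

Keeping the policy as an explicit parameter mirrors the table: the choice between skipping and filling comes from the user's answer rather than being hard-coded.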
When the skill is active, just ask:
/scrape-to-csv https://news.ycombinator.com
"Get top stories with title, URL, points, and comments"
/scrape-table https://example.com/pricing
"Extract the pricing table"
/scrape-news https://www.anthropic.com/news
"Latest blog posts with titles, dates, and URLs"
/scrape-products https://amazon.com/s?k=laptops
"Laptop listings: name, price, rating, Prime eligibility"
The agent will:
- Navigate to the page
- Explore the DOM
- Ask clarifying questions if needed
- Extract data
- Save as CSV
- Show you a preview
curl -X POST http://localhost:9876/sessions \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "headless": true}'

curl http://localhost:9876/sessions/SESSION_ID/state

curl -X POST http://localhost:9876/sessions/SESSION_ID/act \
-H "Content-Type: application/json" \
-d '{"action": "executeJavascript", "params": {"script": "return document.title;"}}'

curl -X DELETE http://localhost:9876/sessions/SESSION_ID

| File | Description |
|---|---|
| .claude/skills/auto-scraping-to-csv/SKILL.md | Skill definition with full instructions |
| .claude/skills/auto-scraping-to-csv/page-agent-bridge.mjs | Bridge script (bundled with skill) |
| .claude/agents/page-agent-bridge.mjs | Bridge script (copy here to run) |
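
For callers who prefer a script over curl, the four bridge endpoints used in this README's curl calls (create, state, act, delete) could be wrapped roughly as follows. This is a sketch under assumptions: the paths and payloads are copied from the curl examples, but the helper names are hypothetical and the response format is not documented here, so `send` simply assumes a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9876"  # default bridge port per this README


def _request(method, path, payload=None):
    """Build (but do not send) an HTTP request for the bridge."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def create_session(url, headless=True):
    return _request("POST", "/sessions", {"url": url, "headless": headless})


def get_state(session_id):
    return _request("GET", f"/sessions/{session_id}/state")


def act(session_id, action, params):
    body = {"action": action, "params": params}
    return _request("POST", f"/sessions/{session_id}/act", body)


def delete_session(session_id):
    return _request("DELETE", f"/sessions/{session_id}")


def send(request, timeout=30):
    """Send a prepared request; assumes the bridge replies with JSON (or nothing)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    return json.loads(body.decode("utf-8")) if body else None
```

With the bridge running, `send(create_session("https://example.com"))` would create a session. Inspect the returned object to find the session identifier, since its field name is not specified in this document.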
Published as a ClawHub skill: auto-scraping-to-csv
clawhub install auto-scraping-to-csv

License: MIT