Auto Scraping to CSV

Scrape any webpage using text-based DOM manipulation and export structured data to CSV. No external LLM is required: Claude acts as the host model, handling page-specific quirks, asking clarifying questions, and managing the full scraping workflow.

Philosophy: Agent-Driven Scraping

Unlike traditional scrapers that require you to write CSS selectors or XPath queries, this tool lets the agent figure it out:

1. You describe what you want: "Get all product names and prices"
2. The agent explores the page: scrolls, clicks pagination, handles lazy loading
3. The agent asks clarifying questions: "I see 3 price fields (retail, sale, member). Which one do you want?"
4. The agent handles edge cases: infinite scroll, popups, login walls, SPA navigation
5. You get a CSV: clean, structured, ready to use

You: "Scrape Hacker News top stories"
Agent: "I see 30 stories on the page. Do you want:
        A) Just titles and URLs
        B) Titles, URLs, points, and comment counts
        C) All of the above plus author names?"
You: "B"
Agent: [scrapes] → stories.csv (30 rows)

Installation by Platform

Claude Code

# 1. Clone the repo and copy the skill folder into your Claude Code skills directory
# (mkdir -p ensures the target directories exist before copying)
git clone https://github.com/Science-Prof-Robot/autoclick.git
mkdir -p ~/.claude/skills ~/.claude/agents
cp -r autoclick/.claude/skills/auto-scraping-to-csv ~/.claude/skills/

# 2. Copy the bridge script to the agents folder
cp ~/.claude/skills/auto-scraping-to-csv/page-agent-bridge.mjs ~/.claude/agents/

# 3. Install Playwright dependency
npm install -D playwright
npx playwright install chromium

# 4. Start the bridge (keep running in a terminal)
node ~/.claude/agents/page-agent-bridge.mjs

Use in Claude Code:

/scrape-to-csv https://news.ycombinator.com "Get top 30 stories with titles, points, and comment counts"

Cursor

# 1. Clone the repo
git clone https://github.com/Science-Prof-Robot/autoclick.git

# 2. Copy the skill files to Cursor's skills and agents directories
# (Cursor follows the same convention under ~/.cursor/)
mkdir -p ~/.cursor/skills ~/.cursor/agents
cp -r autoclick/.claude/skills/auto-scraping-to-csv ~/.cursor/skills/
cp autoclick/.claude/agents/page-agent-bridge.mjs ~/.cursor/agents/

# 3. Install Playwright
npm install -D playwright
npx playwright install chromium

# 4. Start the bridge
node ~/.cursor/agents/page-agent-bridge.mjs

Use in Cursor:

/scrape-to-csv https://example.com/products "Extract product catalog with prices"

OpenClaw

# 1. Install via ClawHub CLI
clawhub install auto-scraping-to-csv

# 2. Copy the bridge script from the installed skill to the agents folder
mkdir -p .claude/agents
cp skills/auto-scraping-to-csv/page-agent-bridge.mjs .claude/agents/

# 3. Install Playwright (if not already installed)
npm install -D playwright
npx playwright install chromium

# 4. Start the bridge
node .claude/agents/page-agent-bridge.mjs

Use in OpenClaw:

/scrape-to-csv https://www.anthropic.com/news "Get latest blog posts"

Manual / Any Editor

# 1. Clone
git clone https://github.com/Science-Prof-Robot/autoclick.git
cd autoclick

# 2. Install dependencies from package.json (including Playwright)
npm install
npx playwright install chromium

# 3. Start bridge
node .claude/agents/page-agent-bridge.mjs

# 4. Use curl or any HTTP client to interact with the bridge
# See API Reference below

How It Works

Claude (Host Model)
    ↕  HTTP
Bridge Server (Node.js + Playwright)
    ↕  page.evaluate()
Browser (Chromium) ← Page-Agent injected

Text-based DOM: No screenshots, no vision model needed. The agent reads simplified HTML with indexed elements.
Host model: Claude is the reasoning engine. No OpenAI/Qwen API key needed.
Agent-driven: The agent handles scrolling, pagination, popups, and asks you what to do when ambiguous.
CSV export: Built-in workflow to convert scraped structured data to CSV.
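
To make the "indexed elements" idea concrete, here is a minimal sketch of how a page could be flattened into indexed text. It is an illustration only: the `toIndexedText` function, the `[n]<tag>` notation, and the element objects are assumptions made for this example, not the bridge's actual output format.

```javascript
// Illustrative only: the bridge's real output format is not documented here.
// Each element is a plain object so the sketch runs without a browser.
function toIndexedText(elements) {
  let nextIndex = 0;
  return elements
    .map((el) => {
      // Collapse whitespace so each element fits on one line.
      const text = el.text.trim().replace(/\s+/g, " ");
      // Only interactive elements get an index the agent can refer to.
      const prefix = el.interactive ? `[${nextIndex++}]` : "";
      return `${prefix}<${el.tag}>${text}</${el.tag}>`;
    })
    .join("\n");
}

// Made-up sample elements for demonstration.
const sample = [
  { tag: "h1", text: "Products", interactive: false },
  { tag: "a", text: "  Widget\n $19.99 ", interactive: true },
  { tag: "button", text: "Next", interactive: true },
];
console.log(toIndexedText(sample));
```

With a representation like this, the host model can refer to an element by its index (for example "click element 1") instead of locating it in a screenshot.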

Quick Start

Step 1: Start the Bridge

node .claude/agents/page-agent-bridge.mjs

Default port: 9876.

Step 2: Scrape (Agent Handles the Rest)

# Create a session — the agent will explore the page
curl -X POST http://localhost:9876/sessions \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/products", "headless": true}'

The agent will:

1. Fetch the DOM state
2. Ask you what data to extract if ambiguous
3. Handle scrolling/pagination if needed
4. Extract structured JSON
5. Convert to CSV

Step 3: Get Your CSV

The agent saves output.csv in your working directory.
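
The JSON-to-CSV step can be sketched as follows. This is a minimal example under stated assumptions, not the skill's own converter: `escapeCsvField`, `toCsv`, and the `missing` placeholder argument are hypothetical names, and the sample rows are made up. Quoting follows the usual RFC 4180 rules.

```javascript
// Quote a field only when it contains a comma, quote, or line break (RFC 4180).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// rows: array of plain objects; columns: ordered list of keys to export.
// missing: placeholder for absent fields (hypothetical option, e.g. "N/A").
function toCsv(rows, columns, missing = "") {
  const header = columns.map(escapeCsvField).join(",");
  const lines = rows.map((row) =>
    columns.map((key) => escapeCsvField(row[key] ?? missing)).join(",")
  );
  return [header, ...lines].join("\r\n") + "\r\n";
}

// Made-up sample rows; the second one has no "comments" field.
const stories = [
  { title: 'Show HN: "Tiny" CSV tool', points: 128, comments: 42 },
  { title: "Rust, Go, and Zig compared", points: 97 },
];
console.log(toCsv(stories, ["title", "points", "comments"], "N/A"));
```

Passing the column list explicitly keeps the column order stable even when some rows lack a field.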

What the Agent Handles For You

| Complex Scenario | Agent Behavior |
| --- | --- |
| Infinite scroll | Auto-scrolls, detects "no more content", stops |
| Pagination | Clicks "Next", extracts from all pages, asks how many pages |
| Popups / modals | Detects overlays, dismisses or asks if relevant |
| Lazy loading | Waits for content, retries, times out gracefully |
| Login walls | Detects auth required, asks for credentials or stops |
| Multiple data formats | Asks: "I see prices as '$19.99' and 'USD 19.99'. Which format?" |
| Missing fields | Detects items without a value, asks: "Some items lack prices. Skip those rows or fill with 'N/A'?" |
| Tables vs lists | Detects layout, asks: "Table has 5 columns. Which ones do you want?" |
Natural Language Commands

When the skill is active, just ask:

/scrape-to-csv https://news.ycombinator.com
  "Get top stories with title, URL, points, and comments"

/scrape-table https://example.com/pricing
  "Extract the pricing table"

/scrape-news https://www.anthropic.com/news
  "Latest blog posts with titles, dates, and URLs"

/scrape-products https://amazon.com/s?k=laptops
  "Laptop listings: name, price, rating, Prime eligibility"

The agent will:

1. Navigate to the page
2. Explore the DOM
3. Ask clarifying questions if needed
4. Extract data
5. Save as CSV
6. Show you a preview

API Reference (For Power Users)

Start Session

curl -X POST http://localhost:9876/sessions \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "headless": true}'

Get DOM State

curl http://localhost:9876/sessions/SESSION_ID/state

Execute Action

curl -X POST http://localhost:9876/sessions/SESSION_ID/act \
  -H "Content-Type: application/json" \
  -d '{"action": "executeJavascript", "params": {"script": "return document.title;"}}'
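
For extracting rows rather than a single value, the request body can be generated instead of hand-written. The sketch below builds a body in the same shape as the curl example above (`action` plus `params.script`). The helper name `buildExtractRequest` and the selectors are placeholders, and it assumes, as the curl example suggests, that the script runs as a function body whose return value is sent back.

```javascript
// Builds the JSON body for POST /sessions/SESSION_ID/act.
// The "executeJavascript" action and "params.script" shape come from the curl
// example above; the selectors passed in are placeholders.
function buildExtractRequest(rowSelector, fields) {
  // JSON.stringify embeds the selectors as safely quoted string literals.
  const script = `
    const fields = ${JSON.stringify(fields)};
    return Array.from(document.querySelectorAll(${JSON.stringify(rowSelector)}))
      .map((row) => Object.fromEntries(
        Object.entries(fields).map(([name, selector]) => {
          const node = row.querySelector(selector);
          return [name, node ? node.textContent.trim() : null];
        })
      ));`;
  return { action: "executeJavascript", params: { script } };
}

const body = buildExtractRequest(".product", { name: ".title", price: ".price" });
console.log(JSON.stringify(body, null, 2));
```

Fields whose selector matches nothing come back as `null`, which leaves the "skip or fill" decision to a later step.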

Close Session

curl -X DELETE http://localhost:9876/sessions/SESSION_ID
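
The four endpoints above can be combined into one lifecycle helper that always closes the session, even when a step fails. This is a sketch under assumptions: `withSession` and `fetchState` are hypothetical helpers, and the `sessionId` field in the create-session response is a guess, since the response format is not shown here. Adjust it to whatever the bridge actually returns.

```javascript
// http is injectable so the flow can be exercised without a running bridge;
// by default it is the global fetch (Node 18+).
async function withSession(baseUrl, pageUrl, work, http = fetch) {
  const created = await http(`${baseUrl}/sessions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: pageUrl, headless: true }),
  });
  // ASSUMPTION: the response JSON carries the session id in "sessionId".
  const { sessionId } = await created.json();
  const sessionUrl = `${baseUrl}/sessions/${sessionId}`;
  try {
    return await work(sessionUrl, http);
  } finally {
    // Always close the session so the browser does not linger.
    await http(sessionUrl, { method: "DELETE" });
  }
}

// Example: read the DOM state once, then let withSession close the session.
async function fetchState(baseUrl, pageUrl, http = fetch) {
  return withSession(baseUrl, pageUrl, async (sessionUrl, request) => {
    const response = await request(`${sessionUrl}/state`);
    return response.json();
  }, http);
}
```

A call such as `fetchState("http://localhost:9876", "https://example.com")` would then create a session, read its state, and close it in one step, provided the bridge is running.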

Files

| File | Description |
| --- | --- |
| `.claude/skills/auto-scraping-to-csv/SKILL.md` | Skill definition with full instructions |
| `.claude/skills/auto-scraping-to-csv/page-agent-bridge.mjs` | Bridge script (bundled with skill) |
| `.claude/agents/page-agent-bridge.mjs` | Bridge script (copy here to run) |

ClawHub

Published as a ClawHub skill: auto-scraping-to-csv

clawhub install auto-scraping-to-csv

License

MIT
