A Claude Code skill for web scraping and data extraction using scrapling.
It selects the most suitable Fetcher for the target website's characteristics, then generates and runs a Python script to do the scraping.
- Fetcher Decision Tree — Auto-select between Fetcher, StealthyFetcher, DynamicFetcher, FetcherSession, or Selector
- Cloudflare Bypass — Built-in support for Cloudflare/WAF protected sites via StealthyFetcher (Camoufox)
- Session Login — HTTP form-based login with cookie persistence
- Site Pattern Library — Reusable patterns for common site types (Discourse, SPA, static blogs, APIs)
- Cookie Vault — Local storage for login cookies with per-site templates
- Troubleshooting Guide — Solutions for common errors indexed by error message
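As a rough illustration of the Fetcher decision tree, the sketch below maps a few site traits to the names of the classes listed above. The `SiteTraits` fields, the `choose_fetcher` function, and the order of the checks are assumptions made for this example; the actual rules live in SKILL.md and may differ.

```python
from dataclasses import dataclass


@dataclass
class SiteTraits:
    """Hypothetical description of a scraping target (not part of scrapling)."""
    html_in_hand: bool = False       # the HTML is already available, no request needed
    behind_cloudflare: bool = False  # a Cloudflare/WAF challenge is expected
    needs_javascript: bool = False   # content is rendered client-side (e.g. an SPA)
    needs_login: bool = False        # form login, cookies must persist across requests


def choose_fetcher(traits: SiteTraits) -> str:
    """Return the name of the class to use; the check order is an assumption."""
    if traits.html_in_hand:
        return "Selector"
    if traits.behind_cloudflare:
        return "StealthyFetcher"
    if traits.needs_javascript:
        return "DynamicFetcher"
    if traits.needs_login:
        return "FetcherSession"
    return "Fetcher"


print(choose_fetcher(SiteTraits(behind_cloudflare=True)))  # StealthyFetcher
print(choose_fetcher(SiteTraits()))                        # Fetcher
```

The function returns names rather than classes so the sketch runs without scrapling installed.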
pip install "scrapling[fetchers]"
scrapling install # Install browser dependencies
Copy the skill directory to your Claude Code skills folder:
# Copy to user-level skills (available in all projects)
cp -r . ~/.claude/skills/scrapling
# Or copy to a specific project
cp -r . /path/to/project/.claude/skills/scrapling
.
├── SKILL.md                    # Skill definition (entry point)
├── references/
│   ├── api-quick-ref.md        # Fetcher/Selector API cheat sheet
│   ├── cookie-vault.md         # Cookie storage template
│   ├── maintenance.md          # Installation & upgrade guide
│   ├── site-patterns.md        # Site-specific scraping patterns
│   └── troubleshooting.md      # Error solutions
└── templates/
    ├── basic_fetch.py          # Static page scraping
    ├── stealth_cloudflare.py   # Cloudflare bypass
    ├── session_login.py        # Login + multi-page scraping
    └── parse_only.py           # HTML parsing without network
Once installed, Claude Code will automatically activate this skill when you ask it to:
- Scrape or crawl a website
- Extract data from a URL
- Bypass Cloudflare protection
- Parse HTML content
- Login and scrape protected pages
> Scrape the title and content from https://example.com/blog
> Extract all product prices from this page: https://shop.example.com
> This site has Cloudflare, scrape it anyway: https://protected.example.com
> I have this HTML, extract all links from it
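For the last prompt (extracting links from HTML you already have), the general shape of a parse-only script might look like the sketch below. It deliberately uses Python's standard-library `html.parser` instead of scrapling's Selector so that it runs with no dependencies; `LinkCollector` and `extract_links` are names made up for this example and are not taken from templates/parse_only.py.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href, e.g. <a name="x">
                self.links.append(href)


def extract_links(html: str) -> list[str]:
    """Parse an HTML string and return its link targets in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = '<p><a href="/a">A</a> <a name="x">no href</a> <a href="https://example.com/b">B</a></p>'
print(extract_links(sample))  # ['/a', 'https://example.com/b']
```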
The references/cookie-vault.md file is a template. For actual use:
- Copy it to cookie-vault.local.md (or keep it in your local skill installation)
- Fill in real cookie values from your browser's DevTools
- Never commit real cookie values to version control
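When copying values out of DevTools, it is often easiest to copy the whole `Cookie` request header and split it into name/value pairs. The helper below is a minimal sketch of that step; `parse_cookie_header` is a hypothetical name, and the layout of cookie-vault.md itself is not shown here, so how the resulting pairs are stored is left open.

```python
def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a raw Cookie header ("a=1; b=2") into a name -> value dict."""
    cookies = {}
    for part in header.split(";"):
        # partition keeps any "=" inside the value intact (e.g. base64 padding)
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder values only; real cookies should stay out of version control.
print(parse_cookie_header("session_id=abc123; theme=dark"))
# {'session_id': 'abc123', 'theme': 'dark'}
```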
MIT