This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
scrapegraph-py is the official Python SDK for the ScrapeGraph AI API. It provides a Python client for intelligent web scraping powered by AI.
```
scrapegraph-py/
├── scrapegraph_py/      # Python SDK source
├── tests/               # Test suite
├── examples/            # Usage examples
├── docs/                # MkDocs documentation
├── cookbook/            # Tutorials and recipes
└── .github/workflows/   # CI/CD
```
- Language: Python 3.10+
- Package Manager: uv (recommended) or pip
- Core Dependencies: requests, pydantic, python-dotenv, aiohttp
- Testing: pytest, pytest-asyncio, pytest-mock, aioresponses
- Code Quality: ruff
- Build: hatchling
- Release: semantic-release
```bash
# Install
uv sync

# Test
uv run pytest tests/ -v

# Format & lint
uv run ruff format scrapegraph_py tests
uv run ruff check scrapegraph_py tests --fix

# Build
uv build
```

Always run these commands before committing or saying a task is done:

```bash
uv run ruff format scrapegraph_py tests
uv run ruff check scrapegraph_py tests --fix
uv build
uv run pytest tests/ -v
```

No exceptions.
Core Components:

- Clients (`scrapegraph_py/`): `client.py` - sync client; `async_client.py` - async client
- Models (`scrapegraph_py/models/`): Pydantic models for request/response validation
- Config (`scrapegraph_py/`): `config.py` - API base URL, timeouts; `exceptions.py` - custom exceptions
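The custom exceptions in `exceptions.py` can be sketched as a small hierarchy; the class names below are assumptions for illustration, not the SDK's actual exports:

```python
# Hypothetical sketch of the exception pattern in scrapegraph_py/exceptions.py.
# The class names are invented for illustration.

class SGAIError(Exception):
    """Base error for all SDK failures."""

class APIKeyError(SGAIError):
    """Raised when the API key is missing or rejected."""

class RequestError(SGAIError):
    """Raised when an API request fails (network or HTTP error)."""

# Catching the base class covers every SDK-specific failure:
try:
    raise APIKeyError("SGAI_API_KEY is not set")
except SGAIError as err:
    message = str(err)
```

A single base class lets callers handle all SDK errors with one `except` clause while still allowing fine-grained handling where needed.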
| Endpoint | Method | Purpose |
|---|---|---|
| SmartScraper | `smartscraper()` | AI data extraction |
| SearchScraper | `searchscraper()` | Multi-URL search |
| Markdownify | `markdownify()` | HTML to Markdown |
| Crawler | `crawler()` | Sitemap & crawling |
| AgenticScraper | `agentic_scraper()` | Browser automation |
| Scrape | `scrape()` | Basic HTML fetch |
| Credits | `get_credits()` | Balance check |
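As a rough sketch of what a method like `smartscraper()` sends, the keyword arguments map directly onto a request body. The field names match the quick-start example; the exact wire format is an assumption, not the documented API contract:

```python
# Illustrative only: the JSON body a SmartScraper request likely carries.
def build_smartscraper_payload(website_url: str, user_prompt: str) -> dict:
    """Build the request body for a SmartScraper call (field names assumed)."""
    return {"website_url": website_url, "user_prompt": user_prompt}

payload = build_smartscraper_payload("https://example.com", "Extract title")
```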
To add a new endpoint:

- Add models in `scrapegraph_py/models/`
- Add sync method to `client.py`
- Add async method to `async_client.py`
- Export in `models/__init__.py`
- Add tests in `tests/`
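As a sketch of the first step, a request model for a new endpoint might look like this. The endpoint, class, and field names are invented for illustration; mirror the existing models in `scrapegraph_py/models/` for the real conventions:

```python
from pydantic import BaseModel, Field

# Hypothetical model for an invented "screenshot" endpoint.
class ScreenshotRequest(BaseModel):
    website_url: str = Field(..., description="Page to capture")
    full_page: bool = Field(default=False, description="Capture the full page")

# Pydantic validates fields and applies defaults on construction:
req = ScreenshotRequest(website_url="https://example.com")
```

The corresponding sync and async methods would construct this model from their keyword arguments, so invalid input fails fast with a validation error before any HTTP request is made.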
- `SGAI_API_KEY` - API key for authentication
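A minimal sketch of reading the key from the environment. The fail-loudly fallback is an assumption for illustration; the SDK may also pick the key up automatically via python-dotenv:

```python
import os

def get_api_key(env_var: str = "SGAI_API_KEY") -> str:
    """Read the API key from the environment; fail loudly if missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it or add it to .env")
    return key

# Demo value so the sketch runs standalone (use a real key in practice):
os.environ.setdefault("SGAI_API_KEY", "sgai-demo-key")
api_key = get_api_key()
```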
```python
from scrapegraph_py import Client

client = Client(api_key="your-key")
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract title",
)
print(response.result)
```