6-Tier auto-escalation web scraper with AI Vision CAPTCHA solver.
Automatically escalates through increasingly powerful scraping strategies until it succeeds. Bypasses Cloudflare, DataDome, Turnstile, and other anti-bot systems.
```
Request
│
├─ Tier 1: httpx          → Plain HTTP (fastest, ~0.5s)
├─ Tier 2: StealthyFetcher → TLS fingerprint spoofing
├─ Tier 3: patchright     → Playwright with CDP leak patches
├─ Tier 4: nodriver       → Direct Chrome communication (no CDP traces)
├─ Tier 5: camoufox       → Firefox modified at C++ level
└─ Tier 6: Vision Solver  → Screenshot → AI Vision → coordinate click
```
Each tier is tried in order. If a challenge page is detected, the scraper automatically escalates to the next tier. Unavailable tiers (missing optional dependencies) are skipped.
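The escalation loop can be sketched as follows (illustrative internals only; `fetch_escalating`, the tier registry, and the challenge markers are assumed names, not the library's actual API):

```python
import asyncio

# A couple of challenge-page markers; the real detector checks more patterns.
CHALLENGE_MARKERS = (
    "Checking if the site connection is secure",
    "Verify you are human",
)

def looks_blocked(html: str) -> bool:
    """Heuristic: does the response look like an anti-bot challenge page?"""
    return any(marker in html for marker in CHALLENGE_MARKERS)

async def fetch_escalating(url: str, tiers: dict) -> str:
    """Try each tier in ascending order. Skip unavailable tiers,
    escalate on errors or when the result looks like a challenge page."""
    last_error = None
    for tier in sorted(tiers):
        fetch = tiers[tier]
        if fetch is None:            # optional dependency not installed
            continue
        try:
            html = await fetch(url)
        except Exception as exc:     # fetch failed: escalate
            last_error = exc
            continue
        if not looks_blocked(html):  # success: stop escalating
            return html
    raise RuntimeError(f"all tiers failed for {url}") from last_error
```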
| Site | Protection | Bypass Tier | Result |
|---|---|---|---|
| httpbin.org | None | Tier 1 | PASS |
| nowsecure.nl | Cloudflare | Tier 5 | PASS |
| G2.com | Cloudflare + DataDome | Tier 5 | PASS |
| Indeed.com | CF Enterprise | Tier 5 | PASS |
| Crunchbase.com | Cloudflare | Tier 5 | PASS |
| Discord.com | Cloudflare | Tier 5 | PASS |
```bash
# Minimal (Tier 1 only)
pip install tiered-scraper

# With all tiers
pip install tiered-scraper[all]

# Individual tiers
pip install tiered-scraper[stealth]   # + Tier 2
pip install tiered-scraper[browser]   # + Tier 3
pip install tiered-scraper[nodriver]  # + Tier 4
pip install tiered-scraper[camoufox]  # + Tier 5
pip install tiered-scraper[vision]    # + Tier 6
```

```python
import asyncio

from tiered_scraper import TieredScraper

async def main():
    scraper = TieredScraper()

    # Auto-escalation: tries Tier 1→2→3→4→5→6 until success
    html = await scraper.fetch("https://example.com")
    print(f"Got {len(html)} bytes")

    # Force a specific tier
    html = await scraper.fetch("https://cf-protected.com", tier=5)

    # Check stats
    print(scraper.stats)

asyncio.run(main())
```

```python
scraper = TieredScraper(
    timeout=30,                            # Per-tier timeout (seconds)
    proxy="socks5://user:pass@host:port",  # Proxy for all tiers
    anthropic_api_key="sk-ant-...",        # For Tier 6 Vision Solver
)
```

**Tier 1: httpx**

- Speed: ~0.5s | Cost: Free
- Plain HTTP requests. No JS rendering.
- Handles: RSS feeds, simple HTML, APIs.
**Tier 2: StealthyFetcher**

- Speed: ~2s | Cost: Free
- TLS fingerprint spoofing via scrapling.
- Handles: Sites checking TLS handshake patterns.
**Tier 3: patchright**

- Speed: ~3s | Cost: Free
- Patchright — Playwright with CDP leak patches.
- Handles: JS-rendered SPAs, basic bot detection.
**Tier 4: nodriver**

- Speed: ~5s | Cost: Free
- Nodriver — Direct Chrome communication without CDP traces.
- Handles: Sites detecting `Runtime.enable` CDP calls.
- Cloudflare bypass rate: ~83% (benchmark).
**Tier 5: camoufox**

- Speed: ~8s | Cost: Free
- Camoufox — Firefox modified at C++ binary level.
- Handles: Cloudflare, DataDome, Akamai, PerimeterX.
- Detection score: 0% on major test suites.
**Tier 6: Vision Solver**

- Speed: ~15s | Cost: API call ($0.001/solve)
- Screenshots the page → Claude Vision API identifies the CAPTCHA location → clicks with human-like mouse movement.
- Why it works: Uses actual screen coordinates (screenX/Y in the hundreds), not CDP iframe coordinates (< 100). Cloudflare Turnstile can't distinguish these from human clicks.
- Handles: Turnstile, reCAPTCHA, hCaptcha, any visual challenge.
- Requires: `ANTHROPIC_API_KEY` environment variable.
The scraper automatically detects challenge pages by looking for patterns like:
- "Checking if the site connection is secure"
- "Verify you are human"
- Cloudflare ray IDs
- Turnstile iframe markers
If a challenge is detected after fetching, the scraper escalates to the next tier instead of returning blocked content.
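The detection step can be sketched as a simple pattern scan over the fetched HTML (illustrative markers only; the library's actual pattern list may differ):

```python
import re

# Patterns drawn from the list above. "cf-ray"/"cRay" appear in
# Cloudflare ray IDs; challenges.cloudflare.com hosts Turnstile iframes.
CHALLENGE_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"checking if the site connection is secure",
        r"verify you are human",
        r"cf-ray|cRay",                  # Cloudflare ray IDs
        r"challenges\.cloudflare\.com",  # Turnstile iframe marker
    )
]

def is_challenge_page(html: str) -> bool:
    """Return True if the HTML looks like an anti-bot challenge page."""
    return any(p.search(html) for p in CHALLENGE_PATTERNS)
```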
| Defense | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---|---|---|---|---|---|
| JS rendering required | - | - | ✓ | ✓ | ✓ | ✓ |
| TLS fingerprinting | - | ✓ | - | ✓ | ✓ | ✓ |
| CDP detection | - | - | ✓ | ✓ | ✓ | ✓ |
| navigator.webdriver | - | - | ✓ | ✓ | ✓ | ✓ |
| Cloudflare challenge | - | - | - | - | ✓ | ✓ |
| DataDome | - | - | - | - | ✓ | ✓ |
| Turnstile mouse coords | - | - | - | - | ? | ✓ |
| Per-customer ML | - | - | - | - | ? | ✓ |
| Visual CAPTCHA | - | - | - | - | - | ✓ |
Built-in utilities for tracking seen URLs and persisting state:
```python
from tiered_scraper import load_state, save_state, is_seen, mark_seen

state = load_state("./scraper-state.json")

if not is_seen(state, url):
    html = await scraper.fetch(url)
    mark_seen(state, url)
    save_state("./scraper-state.json", state)  # Atomic write
```

License: MIT