Rive Scout

Playwright-based scraping pipeline that finds Rive animation creators across Contra, Behance, and X (Twitter), then deduplicates, validates, and exports them to CSV.

How It Works

Login (persistent browser profile)
            │
    ┌───────┼───────┐
    ▼       ▼       ▼
  Contra  Behance    X
    │       │       │
    └───────┼───────┘
            ▼
   Merge & Deduplicate
            ▼
   Validate Links (HEAD/GET)
            ▼
   Export → data/out/candidates.csv

Sources:

Contra — scrolls the Rive people listing, visits each profile to extract name, email, location, and portfolio links
Behance — searches five Rive-related queries, extracts creator info from project pages and ld+json metadata
X — scrolls the @rive_app timeline, identifies tweets mentioning Rive or linking to .riv files
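
The Behance step mentions ld+json metadata. As a rough illustration of what reading that metadata can involve, the sketch below collects every `application/ld+json` script block from an HTML string using only the standard library. The names `LdJsonExtractor` and `extract_ld_json` are hypothetical, and the scraper in `src/sources/behance.py` may work quite differently (for example, by querying the page through Playwright).

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collects parsed JSON from <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_ldjson = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ldjson = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ldjson:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ldjson:
            self._in_ldjson = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_ld_json(html):
    """Return a list of decoded ld+json documents found in the HTML."""
    parser = LdJsonExtractor()
    parser.feed(html)
    return parser.documents
```

Which fields (author name, profile URL, and so on) are present in the decoded documents depends on the page, so any lookup into them should tolerate missing keys.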

Pipeline stages:

Collect — source-specific scrapers gather candidate profiles with evidence of Rive expertise
Deduplicate — merges candidates across sources using identity keys (email, social usernames, website domain, name+link)
Validate — sends HTTP HEAD requests (GET fallback) to all profile links; keeps candidates with at least one 200 response
Export — writes the final list to CSV, capped at the target count
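
The Deduplicate stage can be pictured as grouping candidates that share any identity key. The sketch below is one possible way to do that with a small union-find, not the code in `src/enrich.py`. The field names (`email`, `website`, `instagram`, `x`) are assumptions, and the name+link key is left out for brevity.

```python
from urllib.parse import urlparse


def identity_keys(candidate):
    """Return the set of keys that identify a candidate (field names are assumed)."""
    keys = set()
    email = (candidate.get("email") or "").strip().lower()
    if email:
        keys.add(("email", email))
    website = candidate.get("website") or ""
    if website:
        host = urlparse(website).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            keys.add(("domain", host))
    for platform in ("instagram", "x"):
        username = (candidate.get(platform) or "").strip().lstrip("@").lower()
        if username:
            keys.add((platform, username))
    return keys


def deduplicate(candidates):
    """Merge candidates that share at least one identity key."""
    parent = list(range(len(candidates)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # identity key -> index of the first candidate seen with it
    for i, cand in enumerate(candidates):
        for key in identity_keys(cand):
            if key in owner:
                parent[find(i)] = find(owner[key])
            else:
                owner[key] = i

    merged = {}
    for i, cand in enumerate(candidates):
        group = merged.setdefault(find(i), {})
        for field, value in cand.items():
            if value and not group.get(field):
                group[field] = value  # first non-empty value wins
    return list(merged.values())
```

Matching on website domain is only safe for personal domains; a shared host such as a portfolio platform would merge unrelated people, so a real implementation would likely exclude those hosts.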

Project Layout

rive_scout/
├── src/
│   ├── main.py           # CLI entry point (--login / --run)
│   ├── config.py         # URLs, CSV columns, timeouts
│   ├── browser.py        # Playwright context with anti-detection
│   ├── enrich.py         # Deduplication and candidate merging
│   ├── validate.py       # Link validation and filtering
│   ├── export_csv.py     # CSV export
│   ├── utils_http.py     # HTTP requests with retries
│   ├── utils_text.py     # Email/URL/name parsing, Rive signal detection
│   └── sources/
│       ├── contra.py     # Contra scraper
│       ├── behance.py    # Behance scraper
│       └── x_rive.py     # X/Twitter scraper
├── data/
│   ├── raw/              # Debug snapshots (JSON, HTML, PNG)
│   ├── cache/            # Reserved for future use
│   ├── out/              # Final CSV output
│   └── profile/          # Persistent Chromium session data
├── requirements.txt
└── .env.example

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m playwright install chromium

Copy .env.example to .env and adjust if needed:

HEADLESS=true
TARGET=50
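
For illustration, these two settings could be read from the process environment roughly as follows. This is a sketch under assumptions: `parse_bool` and `load_settings` are hypothetical names, the fallback values mirror the defaults in the flag table under Usage, and how `.env` is actually loaded into the environment (for example, via a dotenv library) is not specified here.

```python
import os


def parse_bool(value, default):
    """Interpret common truthy strings; fall back to default when unset."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Read HEADLESS and TARGET from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        "headless": parse_bool(env.get("HEADLESS"), True),
        "target": int(env.get("TARGET", "50")),
    }
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise, e.g. `load_settings({"HEADLESS": "false", "TARGET": "10"})`.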

Usage

Login (one-time)

python -m src.main --login

Opens a headed browser with tabs for X, Instagram, Contra, and Behance. Log in manually, then press Enter in the terminal. Sessions persist in data/profile/.

Run the pipeline

python -m src.main --run --target 50 --headless true

| Flag | Description | Default |
| --- | --- | --- |
| `--run` | Execute the scraping pipeline | none |
| `--login` | Open browser for manual login | none |
| `--target N` | Target candidate count | `50` |
| `--headless true/false` | Run browser headlessly | `true` |
| `--sources contra,behance,x` | Select sources to scrape | all three |
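
A command line with these flags could be declared with `argparse` along the following lines. This is a sketch, not the contents of `src/main.py`: the helper names are hypothetical, and treating `--run` and `--login` as mutually exclusive is an assumption.

```python
import argparse


def str_to_bool(value):
    """Convert 'true'/'false' style strings into a bool for --headless."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def split_sources(value):
    """Split a comma-separated source list, dropping empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_parser():
    parser = argparse.ArgumentParser(prog="src.main")
    # Assumption: exactly one of the two modes must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--run", action="store_true", help="Execute the scraping pipeline")
    mode.add_argument("--login", action="store_true", help="Open browser for manual login")
    parser.add_argument("--target", type=int, default=50, help="Target candidate count")
    parser.add_argument("--headless", type=str_to_bool, default=True, help="Run browser headlessly")
    parser.add_argument(
        "--sources",
        type=split_sources,
        default=["contra", "behance", "x"],
        help="Comma-separated sources to scrape",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["--run", "--sources", "contra,x"])` yields `sources == ["contra", "x"]` while `target` and `headless` keep their defaults.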

Output

Results are written to data/out/candidates.csv with these columns:

| Column | Description |
| --- | --- |
| Full name | Candidate name |
| Email address | Extracted from profile page |
| Instagram profile | Instagram URL if found |
| Website | Personal/portfolio website |
| Platform portfolio | Contra/Behance profile URL |
| Best work | Notable project URL (usually Behance) |
| Why impressive | Left blank unless source evidence exists |
| Country | Parsed from location string |
| Source | Contra, Behance, or X |
| Notes | Rive signals, availability mentions |
| Primary profile link | Main URL used for validation |
| Primary link status | HTTP status code |
| Evidence links | All discovered URLs |
| Validation notes | HTTP validation details |
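
To make the output format concrete, here is a minimal sketch of writing rows with these columns and capping them at the target count. The column names come from the table above; the function name `write_candidates` and the use of `csv.DictWriter` are assumptions, and `src/export_csv.py` may differ.

```python
import csv

COLUMNS = [
    "Full name", "Email address", "Instagram profile", "Website",
    "Platform portfolio", "Best work", "Why impressive", "Country",
    "Source", "Notes", "Primary profile link", "Primary link status",
    "Evidence links", "Validation notes",
]


def write_candidates(candidates, target, stream):
    """Write at most `target` candidate dicts to `stream` as CSV.

    Missing columns are written as empty strings; unknown keys are ignored.
    Returns the number of data rows written.
    """
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, restval="", extrasaction="ignore")
    writer.writeheader()
    rows = candidates[:target]
    for row in rows:
        writer.writerow(row)
    return len(rows)
```

When writing to a real file, opening it with `newline=""` (as the `csv` module documentation recommends) avoids doubled line endings on some platforms.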

Design Notes

No fabrication — every candidate has at least one real collected profile link
Individuals only — agencies, studios, and collectives are filtered out
Rive signal required — candidates must show explicit evidence of Rive knowledge
Social link tolerance — Instagram/X may return 403/429 due to rate limiting; candidates are kept if another link validates as 200
Anti-detection — persistent browser profile, randomized pauses, webdriver flag removal, custom User-Agent
Debug artifacts — each pipeline stage saves JSON snapshots to data/raw/ for troubleshooting
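
The Validate stage and the social link tolerance note describe a simple rule: try HEAD, fall back to GET, and keep a candidate if any link returns 200. The sketch below illustrates that rule with `urllib`. It is not the code in `src/validate.py` or `src/utils_http.py` (which also handles retries); the function names, the User-Agent value, and the choice to fall back to GET on any non-200 HEAD result are assumptions.

```python
import urllib.error
import urllib.request


def fetch_status(url, method, timeout=10):
    """Return the HTTP status code for one request, or None on network failure."""
    request = urllib.request.Request(
        url, method=method, headers={"User-Agent": "Mozilla/5.0"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still carry a status code
    except (OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, ...


def check_link(url, fetch=fetch_status):
    """Try HEAD first; fall back to GET when HEAD does not return 200."""
    status = fetch(url, "HEAD")
    if status != 200:
        status = fetch(url, "GET")
    return status


def keep_candidate(links, fetch=fetch_status):
    """Return (keep, statuses): keep is True if any link validated as 200."""
    statuses = {url: check_link(url, fetch) for url in links}
    return any(code == 200 for code in statuses.values()), statuses
```

Under this rule a candidate whose Instagram link answers 429 is still kept as long as, say, the portfolio site answers 200. The `fetch` parameter exists so the logic can be exercised without network access.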
