LeadFlow Architecture is a professional lead generation tool that automates the full pipeline — from data scraping to CRM integration via Webhooks and Make.com.
Built for developers and marketing teams who need to streamline lead collection and outreach workflows at scale.
- Automated Scraping — Extract leads from Google Maps, Google Search, or custom URL lists.
- 4 Scraping Modes — Surface search, deep scrape, hybrid, and file-based pipelines.
- Data Management — Local SQLite storage for efficient processing and queue tracking.
- Webhook Integration — Push validated leads to Make.com, Zapier, or any HTTP endpoint.
- Outreach Pipeline — Native integration with Airtable, OpenAI, Hunter.io, and Instantly.
- Reliability & Error Handling — Break directives with automatic retry on API failures.
- Lead Deduplication — Built-in filters prevent duplicate records and skip leads without emails.
- Configurable Logging — Flexible log levels: DEBUG, INFO, ERROR.
- CI Pipeline — Automated linting (Ruff), type checking (Pyright), and tests on every push.
| Layer | Technology |
|---|---|
| Language | Python 3.14+ |
| Database | SQLite |
| Validation | Pydantic v2 |
| Linting | Ruff |
| Type Checking | Pyright |
| Testing | Pytest |
| Automation | Make.com |
| Integrations | Airtable · OpenAI · Hunter.io · Instantly |
```bash
git clone https://github.com/PyDevDeep/LeadFlow-Architecture.git
cd LeadFlow-Architecture
pip install ".[dev]"
```

```bash
cp .env.example .env
```

Edit `.env` with your credentials before proceeding.

```bash
python main.py init
```

All commands are executed from the project root directory.

Must be run once before any scraping pipeline.

```bash
python main.py init
```

Extracts local business data from Google Maps based on a search query.
```bash
python main.py maps -q "<your_query>"

# Example
python main.py maps -q "dental clinics in Kyiv"
```

Extracts basic snippets and URLs from Google organic search results.
```bash
python main.py search -q "<your_query>"

# Example
python main.py search -q "top digital marketing agencies"
```

Runs an organic search, then performs multi-threaded deep scraping on each discovered domain to extract contact details and metadata.
```bash
python main.py hybrid -q "<your_query>"

# Example
python main.py hybrid -q "software development outsourcing ukraine"
```

Reads URLs from a .txt file and runs a multi-threaded deep scrape on each target.
```bash
python main.py file -f <filepath>

# Example
python main.py file -f urls_list.txt
```

Processes the pending queue and pushes validated leads to the configured Webhook endpoint.

```bash
python main.py send
```

Run the test suite:

```bash
pytest tests/test_suite.py
```

All settings are managed via the `.env` file. Copy `.env.example` and fill in the values:
| Variable | Description |
|---|---|
| `DATABASE_PATH` | Path to the SQLite database file |
| `LOG_LEVEL` | Logging verbosity (DEBUG / INFO / ERROR) |
| `SERPER_API_KEY` | API key for Serper.dev (search & maps) |
| `MAKE_LEAD_KEY` | Secret key for Make.com Webhook authentication |
| `WEBHOOK_URL` | Destination endpoint for lead delivery |
| `WEBHOOK_BATCH_SIZE` | Number of leads sent per batch |
| `SCRAPER_TIMEOUT` | HTTP request timeout in seconds |
| `SCRAPER_RETRIES` | Number of retry attempts on request failure |
| `SCRAPER_MAX_WORKERS` | Thread pool size for parallel scraping |
| `SERPER_MAX_RESULTS` | Max results returned per Serper API call |
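Put together, a filled-in `.env` might look like this (every value below is a placeholder for illustration, not a real credential or a project default):

```ini
DATABASE_PATH=./leadflow.db
LOG_LEVEL=INFO
SERPER_API_KEY=your_serper_api_key
MAKE_LEAD_KEY=your_make_secret_key
WEBHOOK_URL=https://hook.make.com/your-webhook-id
WEBHOOK_BATCH_SIZE=10
SCRAPER_TIMEOUT=15
SCRAPER_RETRIES=3
SCRAPER_MAX_WORKERS=8
SERPER_MAX_RESULTS=20
```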
The project uses GitHub Actions for continuous integration. The pipeline runs on every push and pull request to main / master.
push / PR → Ruff (lint) → Pyright (type check) → Pytest
| Step | Tool | Purpose |
|---|---|---|
| Lint | Ruff | Code style and import checks |
| Type Check | Pyright (strict) | Static type safety across app/ |
| Tests | Pytest | Functional test suite with isolated env credentials |
CI config: .github/workflows/ci.yml
A ready-to-use blueprint is available in the /automation directory.
- Download `Make.json` or `outreach_pipeline.json` from `/automation`
- In Make.com → create a new scenario → Import Blueprint
- Connect each module marked with a red `!`:
  - Airtable (Production)
  - Hunter.io (Domain Search)
  - OpenAI (GPT-4o-mini)
  - Instantly (Lead Import)
- Replace all `YOUR_...` placeholders with your actual IDs (see tables below)
- Copy the generated Webhook URL from Module 1 → paste it into your `.env` as `WEBHOOK_URL`
| Variable | Description | Where to Find |
|---|---|---|
| `YOUR_BASE_ID` | Unique ID of your Airtable Base | Airtable URL / API docs |
| `YOUR_TABLE_ID` | Name or ID of the target table | Airtable table URL |
| `YOUR_INSTANTLY_CAMPAIGN_ID` | ID of your outreach campaign | Instantly.ai dashboard |
For Module 11 (Create a Record) to function correctly, your Airtable table must include columns matching these identifiers:
| Placeholder | Field Description |
|---|---|
| `YOUR_FIELD_ID_DOMAIN` | Company domain (e.g. example.com) |
| `YOUR_FIELD_ID_EMAIL` | Primary contact email found by Hunter |
| `YOUR_FIELD_ID_COMPANY_NAME` | Organization name |
| `YOUR_FIELD_ID_URL` | Full website URL |
| `YOUR_FIELD_ID_DB_ID` | Original ID from the lead source / webhook |
| `YOUR_FIELD_ID_PHONE` | Contact phone number |
| `YOUR_FIELD_ID_DESCRIPTION` | Company description used for AI context |
| `YOUR_FIELD_ID_SOURCE_METHOD` | Label indicating lead origin |
| `YOUR_FIELD_ID_AI_RESPONSE` | GPT-generated personalized opening line |
| `YOUR_FIELD_ID_STATUS` | Lead status (Ready / Done) |
- Error Handling — Break directives on OpenAI, Hunter, and Instantly modules; Make.com auto-retries on service failures.
- Lead Filtering — Skips leads already present in Airtable and leads without a valid email, reducing unnecessary API token usage.
| Check | Status |
|---|---|
| API keys / passwords in JSON | ✅ None found |
| `__IMTCONN__` connection fields | ✅ All set to `null` |
| Sensitive placeholders | ✅ Correctly masked |
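The `__IMTCONN__` check from the table can be reproduced locally. A rough sketch — the helper name and the toy blueprint fragment are illustrative assumptions; only the `__IMTCONN__` key itself comes from the table:

```python
import json  # for the real file: blueprint = json.load(open("automation/Make.json"))


def find_leaked_connections(blueprint) -> list:
    """Recursively collect any __IMTCONN__ values that are not null."""
    leaks = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "__IMTCONN__" and value is not None:
                    leaks.append(value)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(blueprint)
    return leaks


# Toy blueprint fragment: one sanitized module, one leaking a connection id
sample = {
    "modules": [
        {"name": "Airtable", "parameters": {"__IMTCONN__": None}},
        {"name": "OpenAI", "parameters": {"__IMTCONN__": 12345}},
    ]
}
print(find_leaked_connections(sample))  # [12345]
```

An empty result means the blueprint carries no bound connections and is safe to share.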
For a full component breakdown and data flow diagram, see architecture.md.
```
.
├── .github/
│   └── workflows/
│       └── ci.yml             # GitHub Actions CI pipeline
├── app/
│   ├── scraper/               # Scraping modules (client, parser, logic)
│   ├── sender/                # Webhook sending logic
│   ├── utils/                 # Utilities and logging
│   ├── config.py              # Environment configuration
│   └── database.py            # Database interaction layer
├── automation/                # Make.com blueprints and assets
│   ├── Make.json
│   ├── outreach_pipeline.json
│   └── Scenario_IMG.jpg
├── tests/
│   └── test_suite.py          # Pytest test suite
├── main.py                    # CLI entry point
├── pyproject.toml             # Project metadata and dependencies
├── architecture.md            # Data flow architecture and component docs
└── .env.example               # Environment variable template
```
LeadFlow is intentionally built around simplicity of deployment over distributed complexity.
SQLite over Redis or PostgreSQL:
- Zero infrastructure overhead — no separate server process to manage or monitor.
- The scraping pipeline is inherently sequential per session; concurrent write pressure is minimal.
- A single `.db` file is trivially portable, backupable, and inspectable without tooling.
- Redis would add operational complexity (persistence config, eviction policy, connection pooling) with no meaningful throughput gain at this scale.
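As an illustration of how little machinery the SQLite choice requires, a minimal pending-lead queue might look like this. The schema, table name, and status values are hypothetical sketches, not the project's actual `database.py`:

```python
import sqlite3

# In-memory DB for the sketch; the real app would open DATABASE_PATH instead
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE leads (
        id INTEGER PRIMARY KEY,
        domain TEXT UNIQUE,            -- UNIQUE doubles as a dedup guard
        email TEXT,
        status TEXT DEFAULT 'pending'  -- 'pending' -> 'sent'
    )
""")
conn.execute(
    "INSERT INTO leads (domain, email) VALUES (?, ?)",
    ("acme.com", "info@acme.com"),
)
conn.commit()

# A `send`-style command would pick up the pending queue in batches:
pending = conn.execute(
    "SELECT id, domain, email FROM leads WHERE status = 'pending'"
).fetchall()
print(pending)  # [(1, 'acme.com', 'info@acme.com')]
```

The whole queue lives in one file that can be inspected with the stock `sqlite3` CLI — no server to run.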
Python-side data normalization over Make.com:
- Make.com charges per operation. Pushing raw, unnormalized data and transforming it inside a scenario burns operations on every field mapping, filter, and iterator.
- Normalizing in Python before the Webhook call means Make.com receives a clean, flat payload — one HTTP module fires, one Airtable record is created. No intermediate transformations.
- Business logic stays in version-controlled code, not locked inside a visual no-code scenario that is harder to diff, test, or roll back.
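The normalize-before-webhook idea can be sketched as follows. The raw payload shape and the output field names are assumptions for illustration, not the project's actual schema:

```python
def normalize_lead(raw: dict) -> dict:
    """Flatten a raw scraped record into a clean payload for the webhook."""
    return {
        "company_name": (raw.get("title") or "").strip(),
        "domain": (raw.get("website") or "")
            .removeprefix("https://")
            .removeprefix("http://")
            .removeprefix("www.")
            .rstrip("/"),
        "phone": raw.get("phone") or "",
    }


raw = {"title": "  Acme Dental  ", "website": "https://www.acme.com/", "phone": None}
print(normalize_lead(raw))
# {'company_name': 'Acme Dental', 'domain': 'acme.com', 'phone': ''}
```

Because the payload arrives flat, the Make.com scenario maps fields directly with no iterator or transformer operations.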
| Dimension | Current State | Production Consideration |
|---|---|---|
| Concurrency | Multi-threaded scraping per run | No distributed task queue (Celery / RQ) — single-machine only |
| Database | SQLite | Not suitable for multi-process writes or horizontal scaling |
| Error Recovery | Make.com Break directives + retry | No dead-letter queue for leads that permanently fail |
| Observability | File-based logging | No structured log aggregation (Datadog, Loki, etc.) |
| Rate Limiting | Timeout config via `.env` | No adaptive back-off or proxy rotation built in |
| Auth | API keys in `.env` | Secrets manager (Vault, AWS SSM) recommended for team deployments |
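For teams that do need the adaptive back-off noted in the table, a standard exponential back-off wrapper is straightforward to bolt on. This is a generic sketch, not part of LeadFlow:

```python
import random
import time


def with_backoff(fn, retries: int = 3, base_delay: float = 1.0):
    """Retry fn with exponential back-off plus jitter on any exception."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)


# Demo: a call that fails twice, then succeeds on the third attempt
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # ok
```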
Bottom line: LeadFlow is optimized for solo operators and small teams running scheduled scraping jobs on a single machine. It is not designed for high-frequency, multi-tenant, or real-time production environments without the additions noted above.
Contributions are welcome!
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Commit your changes: `git commit -m 'Add some feature'`
- Push to the branch: `git push origin feature/your-feature`
- Open a Pull Request
This project is licensed under the MIT License.