**LeadFlow Architecture** is a professional lead generation tool that automates the full pipeline — from data scraping to CRM integration via Webhooks and Make.com.

Built for developers and marketing teams who need to streamline lead collection.

- **Reliability & Error Handling** — Break directives with automatic retry on API failures.
- **Lead Deduplication** — Built-in filters prevent duplicate records and skip leads without emails.

```
├── pyproject.toml     # Project metadata and dependencies
├── architecture.md    # Data flow architecture and component docs
└── .env.example       # Environment variable template
```

---
## 🏗 Why This Architecture?

LeadFlow is intentionally built around **simplicity of deployment** over distributed complexity.

**SQLite over Redis or PostgreSQL:**

- Zero infrastructure overhead — no separate server process to manage or monitor.
- The scraping pipeline is inherently sequential per session; concurrent write pressure is minimal.
- A single `.db` file is trivially portable, easy to back up, and inspectable without extra tooling (see the sketch below).
- Redis would add operational complexity (persistence config, eviction policy, connection pooling) with no meaningful throughput gain at this scale.

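To make the zero-infrastructure point concrete, here is a minimal sketch of such a single-file queue using only the Python standard library. The table and column names (`leads`, `status`) are illustrative assumptions, not LeadFlow's actual schema:

```python
import sqlite3

# Open (or create) the single portable .db file -- no server process involved.
conn = sqlite3.connect("leads.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS leads (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        company TEXT NOT NULL,
        domain  TEXT,
        email   TEXT,
        status  TEXT NOT NULL DEFAULT 'pending'  -- pending / sent / failed
    )
    """
)
conn.commit()
conn.close()
```

The resulting `leads.db` file can be copied, backed up, or opened with any SQLite client as-is.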
**Python-side data normalization over Make.com:**

- Make.com charges per **operation**. Pushing raw, unnormalized data and transforming it inside a scenario burns operations on every field mapping, filter, and iterator.
- Normalizing in Python before the Webhook call means Make.com receives a clean, flat payload — one HTTP module fires, one Airtable record is created. No intermediate transformations (sketched below).
- Business logic stays in version-controlled code, not locked inside a visual no-code scenario that is harder to diff, test, or roll back.

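A minimal sketch of what "normalize before the webhook call" means in practice. The field names, webhook URL, and the use of `requests` are assumptions for illustration, not LeadFlow's actual code:

```python
import requests  # assumed HTTP client; any equivalent works

# Illustrative raw record as a scraper might return it.
raw = {
    "title": "  Acme GmbH ",
    "website": "https://www.acme.io/contact",
    "phone": "+49 (0)30 1234-567",
}

payload = {
    "company": raw["title"].strip(),
    # Reduce a full URL to a bare domain so Make.com's filter stays a single step.
    "domain": raw["website"].split("//")[-1].split("/")[0].removeprefix("www."),
    # Keep digits and the leading "+" only.
    "phone": "".join(ch for ch in raw["phone"] if ch.isdigit() or ch == "+"),
}

# One flat JSON object -> one HTTP module -> one Airtable record on the Make.com side.
requests.post("https://hook.make.com/your-webhook-id", json=payload, timeout=10)
```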
---

## ⚖️ Trade-offs & Production Readiness

| Dimension | Current State | Production Consideration |
|---|---|---|
| **Concurrency** | Multi-threaded scraping per run | No distributed task queue (Celery / RQ) — single-machine only |
| **Database** | SQLite | Not suitable for multi-process writes or horizontal scaling |
| **Error Recovery** | Make.com Break directives + retry | No dead-letter queue for leads that permanently fail |
| **Rate Limiting** | Timeout config via `.env` | No adaptive back-off or proxy rotation built in |
| **Auth** | API keys in `.env` | Secrets manager (Vault, AWS SSM) recommended for team deployments |

> **Bottom line:** LeadFlow is optimized for **solo operators and small teams** running scheduled scraping jobs on a single machine. It is not designed for high-frequency, multi-tenant, or real-time production environments without the additions noted above.

---

## 🔄 Data Flow

```mermaid
flowchart TD
    %% Local Python Layer
    subgraph "Python Pipeline (Local)"
        Consumer -->|HTTP POST Batch| Webhook((Make.com Webhook))
        Consumer -.->|UPDATE status| DB
    end

    %% Cloud Integration Layer
    subgraph "Make.com (SSOT & Enrichment)"
        Webhook --> Iterator[Iterator]
        Iterator --> Filter{Domain valid?}
        Filter -- Yes --> Airtable[(Airtable SSOT)]
        Airtable --> Hunter((Hunter.io API))
        Hunter --> OpenAI(("OpenAI: First Line"))
        OpenAI --> Instantly((Instantly.ai))
    end
```
---
## 🧩 Component Breakdown

### 1. ScrapeManager

The central orchestrator. Determines the appropriate Serper API endpoint (`Maps`, `Search`, or `Scrape`) based on the selected CLI execution mode and dispatches work accordingly.

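A minimal sketch of that mode-to-endpoint dispatch. The mode names and endpoint URLs are assumptions for illustration, not LeadFlow's verbatim code:

```python
# Hypothetical mode-to-endpoint map; actual names and URLs may differ.
SERPER_ENDPOINTS = {
    "maps": "https://google.serper.dev/maps",
    "search": "https://google.serper.dev/search",
    "scrape": "https://scrape.serper.dev",
}

def resolve_endpoint(mode: str) -> str:
    """Return the Serper API endpoint for the selected CLI execution mode."""
    try:
        return SERPER_ENDPOINTS[mode]
    except KeyError:
        raise ValueError(f"Unknown execution mode: {mode!r}") from None
```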
### 2. Pydantic Validator & Regex

Stateless validation layer that normalizes raw API responses into clean, typed records — stripping noise, extracting domains, and formatting phone numbers before any data touches the database.

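A minimal sketch of such a validator, assuming Pydantic v2; the field names and rules are illustrative, not LeadFlow's actual model:

```python
import re
from pydantic import BaseModel, field_validator

class Lead(BaseModel):
    company: str
    website: str = ""
    phone: str = ""

    @field_validator("company")
    @classmethod
    def strip_noise(cls, v: str) -> str:
        return v.strip()

    @field_validator("website")
    @classmethod
    def extract_domain(cls, v: str) -> str:
        # Reduce "https://www.acme.io/contact" to "acme.io".
        match = re.search(r"(?:https?://)?(?:www\.)?([^/]+)", v)
        return match.group(1) if match else ""

    @field_validator("phone")
    @classmethod
    def normalize_phone(cls, v: str) -> str:
        # Keep digits and a leading "+" only.
        return re.sub(r"[^\d+]", "", v)

lead = Lead(company="  Acme GmbH ", website="https://www.acme.io/contact", phone="+49 30 1234-567")
# lead.website == "acme.io", lead.phone == "+49301234567"
```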
### 3. SQLite — Persistent Queue

Acts as a local buffer between the scraping and delivery layers. Ensures no leads are lost in the event of network failures, application crashes, or Make.com rate limiting. All records persist with a `status` field (`pending` / `sent` / `failed`).

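A minimal sketch of the read/update cycle this implies, reusing the illustrative schema from the earlier sketch:

```python
import sqlite3

def fetch_pending(conn: sqlite3.Connection, batch_size: int = 50) -> list[tuple]:
    """Read a batch of undelivered leads from the queue."""
    return conn.execute(
        "SELECT id, company, domain, email FROM leads WHERE status = 'pending' LIMIT ?",
        (batch_size,),
    ).fetchall()

def mark_status(conn: sqlite3.Connection, lead_id: int, status: str) -> None:
    """Record the delivery outcome: 'sent' or 'failed'."""
    conn.execute("UPDATE leads SET status = ? WHERE id = ?", (status, lead_id))
    conn.commit()
```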
### 4. Consumer Worker

Reads the `pending` queue in configurable batches and delivers payloads to the cloud webhook. Implements **exponential backoff** for resilient handling of transient failures.

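A minimal sketch of exponential backoff around webhook delivery; the helper name and the use of `requests` are assumptions, not LeadFlow's actual implementation:

```python
import time
import requests  # assumed HTTP client

def deliver_with_backoff(url: str, payload: dict, max_retries: int = 5) -> bool:
    """POST a batch to the webhook, doubling the wait after each transient failure."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            response = requests.post(url, json=payload, timeout=10)
            if response.ok:
                return True
        except requests.RequestException:
            pass  # network error: treat as transient and retry
        time.sleep(delay)
        delay *= 2  # 1s, 2s, 4s, 8s, ...
    return False  # caller marks the lead 'failed'
```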
### 5. Make.com — SSOT & Enrichment

Serves as the business logic orchestrator in the cloud:

- **Deduplication** — filters leads already present in Airtable.
- **Email Enrichment** — queries Hunter.io by domain to retrieve verified contact emails.
- **AI Personalization** — passes company context to OpenAI to generate a tailored cold outreach opening line.
- **Campaign Injection** — pushes the enriched, personalized lead into Instantly.ai for outreach execution.

---

## 📦 Layer Summary

| Layer | Technology | Responsibility |
|---|---|---|
| Input | CLI / `.txt` file | Query or URL list ingestion |