Lead Intelligence Copilot

An event-driven, explainable Lead Intelligence system for wealth management advisors. Detects high-value prospects from financial events, scores them with a transparent weighted model, discovers warm-intro paths through a relationship graph, and generates advisor-ready outreach briefs — all accessible via a Claude-powered MCP chatbot and a Streamlit dashboard.

Core flow: Raw Event → Signal → Feature → Lead Score → Outreach


Screenshots

Claude Desktop (MCP Chatbot)

- Top 5 Leads · Score Breakdown
- Warm Intro Path · Outreach Brief
- Lead Comparison · IPO Event Search

Streamlit Dashboard

- Lead Scoring & Drill-down
- Relationship Graph & Outreach

1. Why This Design

Most lead scoring systems are black-box classifiers trained on click-through data. That approach fails in wealth management because:

  1. Conversion cycles are long (months to years) — supervised labels are sparse and noisy.
  2. Advisors need explanations, not just rankings. A $50M AUM pitch requires a human-readable "why now."
  3. The signal is the event, not the behavior. An IPO or a liquidity event is worth 10,000 website visits.

So this system is built as a decision system, not a model:

  • Deterministic, explainable scoring — weighted components, no opaque ML.
  • Signal layer — normalizes heterogeneous events into a common schema.
  • Relationship graph — turns cold leads into warm intros.
  • Claude as a copilot, not a classifier — it orchestrates tools and explains reasoning in natural language.

2. Architecture

```
┌─────────────────────┐
│  Data Ingestion     │  SEC filings, press releases, LinkedIn,
│  (src/ingestion.py) │  internal CRM, referrals
└──────────┬──────────┘
           │ raw events
           ▼
┌─────────────────────┐
│  Signal Processing  │  normalize → Signal schema
│  (src/signals.py)   │  (event_type, recency, est_liquidity, geo, confidence)
└──────────┬──────────┘
           │ signals
           ▼
┌─────────────────────┐
│  Feature Engineering│  per-person feature vector (all 0–1 normalized)
│  (src/features.py)  │  (recency, liquidity, net_worth, relationship, …)
└──────────┬──────────┘
           │ features
           ▼
┌─────────────────────┐
│  Scoring Engine     │  explainable weighted score
│  (src/scoring.py)   │  → lead_score, priority_rank, reason[]
└──────────┬──────────┘
           │ scored leads
           ▼
┌─────────────────────┐
│  Relationship Graph │  warm-intro path discovery (Dijkstra)
│  (src/graph.py)     │  (same company / school / LinkedIn / referral)
└──────────┬──────────┘
           │ enriched leads
           ▼
┌─────────────────────┐
│  Recommendation     │  advisor assignment + outreach brief
│  (src/recommendation.py)
└──────────┬──────────┘
           │
     ┌─────┴─────┐
     ▼           ▼
┌──────────┐ ┌──────────────┐
│ Streamlit│ │ MCP Server   │
│ Dashboard│ │ + Claude     │
│ (app/)   │ │ Desktop      │
└──────────┘ └──────────────┘
```

Both interfaces share the exact same backend (src/pipeline.py). The numbers you see in Streamlit are identical to what Claude returns through MCP — single source of truth.


3. Tech Stack

| Layer | Technology |
|---|---|
| Data schemas | Pydantic v2 |
| Scoring engine | Pure Python (weighted sum + reason generation) |
| Relationship graph | NetworkX (Dijkstra shortest path) |
| Feature engineering | NumPy, math (log-scale, exponential decay) |
| Dashboard | Streamlit + Plotly |
| Chatbot | MCP (FastMCP) + Claude Desktop |
| Data | Pandas, JSON (synthetic mock dataset) |
| Tests | pytest |

4. Quick Start

Prerequisites

  • Python 3.10+
  • (Optional) Claude Desktop app for the chatbot interface

Installation

```bash
git clone https://github.com/xialGuri/Lead-Intelligence-System.git
cd Lead-Intelligence-System

pip install -r requirements.txt
```

Generate mock data

```bash
PYTHONPATH=. python -m data.generate_mock_data
```

This creates data/persons.json, data/events.json, and data/relationships.json — 11 persons, 9 events, 7 relationships.

Run the Streamlit dashboard

```bash
streamlit run app/streamlit_app.py
```

Opens at http://localhost:8501.

Run the console demo

```bash
PYTHONPATH=. python -m demo.demo_scenario
```

Prints the full pipeline walkthrough: Top 5 leads, score breakdown, warm-intro path, and outreach brief.

Run tests

```bash
PYTHONPATH=. python -m pytest tests/ -v
```
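Invariant tests for a scorer like this might look as follows (hypothetical examples using the weights from the scoring model, not the actual contents of `tests/test_scoring.py`):

```python
import math

# Weights from the scoring model (section 8)
WEIGHTS = {
    "liquidity_score": 0.30,
    "recency_score": 0.25,
    "relationship_score": 0.20,
    "signal_confidence": 0.15,
    "seniority_score": 0.10,
}


def test_weights_sum_to_one():
    # Keeps lead_score in [0, 1] given 0-1 normalized features
    assert math.isclose(sum(WEIGHTS.values()), 1.0)


def test_score_is_bounded():
    # Even with every feature maxed out, the score cannot exceed 1.0
    max_score = sum(w * 1.0 for w in WEIGHTS.values())
    assert 0.0 <= max_score <= 1.0
```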

5. Streamlit Dashboard

The dashboard is a transparency layer — it lets stakeholders (compliance, management, data scientists) visually verify the pipeline's outputs.

Features

| Section | What it shows |
|---|---|
| Top-K leads table | Ranked leads with score progress bars, event type, warm-intro status |
| Score breakdown bar chart | Per-component contribution (liquidity, recency, relationship, confidence, seniority) |
| Feature radar chart | Normalized 0–1 feature vector shape per lead |
| Relationship graph | NetworkX spring layout with warm-intro path highlighted in red |
| Outreach brief panel | Headline, why-now, talking points, suggested channel, draft message |
| Pipeline inspector | Expandable tabs showing raw events → signals → features at each stage |

Interactive controls (sidebar)

  • As-of date — change the reference date to see how recency decay affects scores over time
  • Top-K slider — display 3 to 10 leads

6. MCP + Claude Desktop (Chatbot)

The MCP server exposes the same pipeline as a set of tools that Claude Desktop can call. This turns Claude into an advisor copilot with zero additional UI development.

Setup

  1. Make sure mock data is generated (see Quick Start).

  2. Copy the config to Claude Desktop:

     ```bash
     cp mcp_server/claude_desktop_config.json \
        "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
     ```

  3. Restart Claude Desktop (⌘Q → reopen).

  4. Verify: click the 🔌 icon in the input bar → lead-intelligence should show as running with 7 tools.

Example conversation

```
You:    Show me this week's top 5 leads.
Claude: [calls top_leads_this_week] → ranked list with scores and reasons

You:    Why is #1 ranked so high?
Claude: [calls explain_lead_score] → component breakdown

You:    Find a warm intro path to p_001.
Claude: [calls find_warm_intro] → Sarah Chen → Michael Torres (strength 1.0)

You:    Draft an outreach brief for p_001.
Claude: [calls generate_outreach_brief] → full brief with talking points

You:    Compare Michael Torres and Jennifer Walsh.
Claude: [calls compare_leads] → side-by-side with recommendation
```

7. MCP Tools Reference

Information retrieval

| Tool | Input | What it does |
|---|---|---|
| `top_leads_this_week(k)` | `k: int = 5` | Returns top-k scored leads |
| `search_events(event_type, days)` | `event_type: str`, `days: int = 30` | Filters raw events by type and recency |
| `score_lead(person_id)` | `person_id: str` | Returns full ScoredLead record for one person |

Decision support

| Tool | Input | What it does |
|---|---|---|
| `explain_lead_score(person_id)` | `person_id: str` | Per-component score breakdown + human-readable reasons |
| `find_warm_intro(person_id)` | `person_id: str` | Best warm-intro path from any advisor (Dijkstra, max 3 hops) |
| `generate_outreach_brief(person_id)` | `person_id: str` | Full brief: headline, why-now, talking points, draft message |
| `compare_leads(person_ids)` | `person_ids: list[str]` | Side-by-side comparison with score contributions |

Resources (read-only, URI-based)

| Resource URI | Returns |
|---|---|
| `lead://profile/{person_id}` | Person profile (name, company, schools, net worth) |
| `lead://timeline/{person_id}` | All events for this person, most recent first |
| `lead://graph/{person_id}` | Immediate neighbors in the relationship graph |

8. Scoring Model (Explainable)

```
lead_score = 0.30 × liquidity_score       # Is there money in motion?
           + 0.25 × recency_score         # How fresh is the signal?
           + 0.20 × relationship_score    # Can we get a warm intro?
           + 0.15 × signal_confidence     # How much do we trust the source?
           + 0.10 × seniority_score       # Tiebreaker — already captured upstream
```
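In plain Python this weighted sum reduces to a few lines (a minimal sketch — the dict-based interface is illustrative, not the actual `src/scoring.py` API):

```python
WEIGHTS = {
    "liquidity_score": 0.30,
    "recency_score": 0.25,
    "relationship_score": 0.20,
    "signal_confidence": 0.15,
    "seniority_score": 0.10,
}


def score_lead(features: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Weighted sum over 0-1 features; returns (lead_score, per-component contributions)."""
    contributions = {name: round(w * features[name], 3) for name, w in WEIGHTS.items()}
    lead_score = round(sum(w * features[name] for name, w in WEIGHTS.items()), 3)
    return lead_score, contributions
```

Because every feature is normalized to [0, 1] and the weights sum to 1.0, the lead score is itself bounded to [0, 1], and each contribution can be reported verbatim in the score breakdown.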

Why these weights?

| Weight | Component | Rationale |
|---|---|---|
| 30% | Liquidity | No money in motion → no need for an advisor. This is the primary action trigger. |
| 25% | Recency | Advisors who engage within 2 weeks of a liquidity event win the relationship ~3× more often. Exponential decay with 30-day half-life. |
| 20% | Relationship | Warm intros convert ~3× better than cold outreach. Directly encodes conversion economics. |
| 15% | Confidence | SEC filings (0.98) are more trustworthy than LinkedIn (0.65). Acts as a discount factor on noisy signals. |
| 10% | Seniority | Lowest weight because seniority is already correlated with liquidity and relationship — higher weight would double-count. |

Key design decisions

  • No ML by design. Labels are sparse (conversions take months). Advisors and compliance need to audit every ranking. A weighted sum they can recompute on a whiteboard beats a black-box with higher AUC.
  • Weights are business-owned hyperparameters, not learned parameters. Advisor teams can tune them directly.
  • Future evolution: collect advisor feedback (accept/dismiss) → retrain weights via logistic regression per segment, while keeping the weighted-sum structure for explainability.

Feature normalization

| Feature | Method | Why |
|---|---|---|
| Recency | `exp(-ln(2) × days / 30)` | Exponential decay captures the "cold window" — first 2 weeks matter most |
| Liquidity | `log10(USD) / 8`, capped at 1.0 | Log-scale because liquidity spans $100K–$1B (5+ orders of magnitude) |
| Seniority | Ordinal mapping (C-suite=1.0, VP=0.65, …) | Simple, auditable, no learned embeddings |
| Relationship | Max edge strength to any advisor | Promotes the single strongest connection |
| Confidence | Source-based lookup (SEC=0.98, LinkedIn=0.65) | Calibrated from historical source precision |
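The recency and liquidity formulas can be sketched directly from the table (a hedged sketch, not the actual `src/features.py` code; the seniority tiers below C-suite and VP are assumed values for illustration):

```python
import math


def recency_score(days_since_event: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 on day 0, halving every half_life_days (0.5 on day 30)."""
    return math.exp(-math.log(2) * days_since_event / half_life_days)


def liquidity_score(est_liquidity_usd: float) -> float:
    """log10 scale capped at 1.0 — saturates at $100M (10^8 USD)."""
    if est_liquidity_usd <= 0:
        return 0.0
    return min(math.log10(est_liquidity_usd) / 8.0, 1.0)


# C-suite and VP values come from the table; lower tiers are assumptions.
SENIORITY = {"c_suite": 1.0, "vp": 0.65, "director": 0.45, "other": 0.2}
```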

Example output

```json
{
  "lead_score": 0.965,
  "priority_rank": 1,
  "reasons": [
    "Company IPO 12 days ago → est. $45M liquidity event",
    "Fresh signal: referral only 1 day ago — inside the 2-week engagement window",
    "Strong warm-intro path available (connection strength 1.00)",
    "C-suite at Nimbus Robotics"
  ],
  "contributions": {
    "liquidity_score": 0.287,
    "recency_score": 0.244,
    "relationship_score": 0.200,
    "signal_confidence": 0.134,
    "seniority_score": 0.100
  }
}
```

9. Relationship Graph

Built with NetworkX. Edges represent real-world connections between advisors and prospects.

Edge types

| Type | Example | Typical strength |
|---|---|---|
| `referral` | Direct referral from advisor | 0.9–1.0 |
| `same_company` | Worked together at Goldman Sachs | 0.8–0.95 |
| `same_school` | Stanford alumni network | 0.6–0.7 |
| `linkedin` | Connected on LinkedIn | 0.3–0.5 |

Warm-intro discovery algorithm

  • Convert edge strength → distance via -log(strength)
  • Run Dijkstra from every advisor to the target lead
  • Select the path that maximizes the product of edge strengths
  • Limit to 3 hops (beyond that, "warm" is no longer warm)
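The steps above can be sketched with NetworkX (function name and graph attributes are assumptions, not the actual `src/graph.py` API). The `-log(strength)` transform makes Dijkstra's additive shortest path equivalent to maximizing the product of edge strengths:

```python
import math

import networkx as nx


def find_warm_intro(g: nx.Graph, advisors: list[str], target: str, max_hops: int = 3):
    """Best warm-intro path: Dijkstra on -log(strength) maximizes the strength product."""
    best_path, best_strength = None, 0.0
    for advisor in advisors:
        try:
            path = nx.dijkstra_path(
                g, advisor, target,
                # Edge strengths are in (0, 1], so -log(strength) is a valid
                # non-negative Dijkstra weight.
                weight=lambda u, v, d: -math.log(d["strength"]),
            )
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            continue
        if len(path) - 1 > max_hops:  # beyond 3 hops, "warm" is no longer warm
            continue
        strength = math.prod(g[u][v]["strength"] for u, v in zip(path, path[1:]))
        if strength > best_strength:
            best_path, best_strength = path, strength
    return best_path, best_strength
```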

Why warm leads are promoted

Warm intros convert ~3× better than cold outreach. Rather than using a separate heuristic, relationship_score is directly in the weighted sum (20%), so warm leads rise in the ranking automatically. This is more honest than post-hoc re-ranking.


10. Repository Layout

```
lead-intelligence-copilot/
├── README.md
├── requirements.txt
├── run_mcp_server.sh              # Claude Desktop launcher script
│
├── app/
│   └── streamlit_app.py           # Streamlit dashboard
│
├── data/
│   ├── generate_mock_data.py      # Synthetic dataset generator
│   ├── persons.json               # 11 persons (3 advisors + 8 prospects)
│   ├── events.json                # 9 financial events
│   └── relationships.json         # 7 relationship edges
│
├── src/
│   ├── __init__.py
│   ├── schemas.py                 # Pydantic models (Event, Signal, Feature, Lead)
│   ├── ingestion.py               # Data loaders
│   ├── signals.py                 # Event → Signal (liquidity imputation, confidence)
│   ├── features.py                # Signal → Feature (decay, log-scale, normalization)
│   ├── scoring.py                 # Explainable weighted scoring + reason generation
│   ├── graph.py                   # Relationship graph + warm-intro Dijkstra
│   ├── recommendation.py          # Outreach brief generation
│   └── pipeline.py                # End-to-end orchestration
│
├── mcp_server/
│   ├── __init__.py
│   ├── server.py                  # FastMCP server (7 tools + 3 resources)
│   └── claude_desktop_config.json # Config for Claude Desktop
│
├── demo/
│   └── demo_scenario.py           # Console walkthrough for interviews
│
└── tests/
    └── test_scoring.py            # 5 invariant tests
```

11. Demo Scenario (Interview)

demo/demo_scenario.py walks through the full pipeline in 6 steps:

| Step | What happens | What it proves |
|---|---|---|
| 0. Pipeline run | Load 11 persons, 9 events, 7 relationships | Data ingestion works |
| 1. Scoring weights | Print the 5-component weight table | Weights are transparent and sum to 1.0 |
| 2. Top 5 leads | Ranked list with scores and reasons | End-to-end scoring works |
| 3. Explain #1 | Per-component breakdown for Michael Torres | Explainability is real, not just a label |
| 4. Warm intro | Path: Sarah Chen → Michael Torres (strength 1.0) | Graph algorithm finds the strongest path |
| 5. Outreach brief | Headline, why-now, talking points, draft message | System produces actionable output, not just scores |
| 6. Interview talking points | Why no ML, why signal layer, where Claude fits | Design rationale |

12. Design Principles

| Principle | Implementation |
|---|---|
| Explainability over accuracy | Weighted sum with named components. Every score decomposes into 5 auditable numbers. |
| Signal ≠ Event | Raw events are noisy and source-specific. Signals are normalized, decayed, and confidence-weighted. |
| Warm > Cold | Relationship score is in the weighted sum (20%), so warm leads rise automatically. |
| Claude is an orchestrator, not an oracle | All numerics are deterministic Python. Claude selects tools and explains results. |
| Single source of truth | Both Streamlit and MCP call the same pipeline.py. Numbers never diverge. |
| Business-owned weights | Scoring weights are configurable hyperparameters, not learned. The business tunes them. |

13. Future Improvements

  • Feedback loop: Collect advisor accept/dismiss signals → retrain segment-specific weights via logistic regression / contextual bandits.
  • More signal sources: Form 4 insider sales, probate filings, 13F institutional holdings.
  • Graph scale: Replace NetworkX with Neo4j / Neptune when the graph exceeds ~100K nodes.
  • Real-time ingestion: Replace JSON file loads with a streaming pipeline (Kafka / Pub/Sub) for live event detection.
  • Multi-tenant: Per-firm advisor coverage maps, custom weight profiles.
