🛡️ PhishGuard Pro: Advanced Hybrid AI Scam & Phishing Detector

title	PhishGuard Pro
emoji	🛡️
colorFrom	red
colorTo	gray
sdk	gradio
app_file	app.py
pinned	false
license	mit

▶️ Live Interactive Demo on Hugging Face Spaces

📌 Project Overview

PhishGuard Pro is an enterprise-grade AI tool designed to detect, analyze, and explain phishing emails, SMS scams (Smishing), and financial fraud attempts.

It leverages a Hybrid AI Architecture by combining fast, accurate Sequence Classification (BERT) with generative explainability (RAG + LLM) to not only flag malicious content but also provide actionable cybersecurity advice in English.
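As a rough illustration of the two-stage idea, the sketch below runs a cheap classifier first and only calls the slower explanation step when a message is flagged. The names `Verdict`, `analyze`, `toy_classify`, and `toy_explain` are hypothetical stand-ins, and gating the second stage on a threshold is an assumption; this README does not say whether `app.py` gates the LLM call this way.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    label: str                  # "phishing" or "benign"
    score: float                # stage-1 probability that the message is phishing
    explanation: Optional[str]  # filled in only when stage 2 runs


def analyze(
    message: str,
    classify: Callable[[str], float],
    explain: Callable[[str], str],
    threshold: float = 0.5,     # assumed cut-off, not taken from the project
) -> Verdict:
    """Stage 1: fast classification. Stage 2: slow explanation, only if flagged."""
    score = classify(message)
    if score < threshold:
        return Verdict("benign", score, None)
    return Verdict("phishing", score, explain(message))


# Toy stand-ins so the sketch runs without downloading any model.
def toy_classify(message: str) -> float:
    suspicious = ("verify your account", "urgent", "click here")
    hits = sum(phrase in message.lower() for phrase in suspicious)
    return min(1.0, hits / 2)


def toy_explain(message: str) -> str:
    return "Do not click links; confirm with your bank through an official channel."


if __name__ == "__main__":
    print(analyze("URGENT: click here to verify your account", toy_classify, toy_explain))
    print(analyze("Lunch at noon?", toy_classify, toy_explain))
```

In the real app, `classify` would wrap the BERT classifier and `explain` would wrap the RAG + LLM step.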

This project was developed as a robust portfolio piece demonstrating advanced AI Engineering skills in Fintech & Cybersecurity.

⚠️ Infrastructure & Performance Note (For Reviewers)

Demonstration Tier Restriction: For portfolio demonstration purposes, this live demo hosts a 7-billion-parameter LLM (Zephyr-7B-beta) on a free CPU-only tier (2 vCPUs).

The BERT classifier and Regex IoC engine return results almost immediately, but the final AI Incident Response generation may take 5-10 minutes because the free tier has no GPU acceleration. On GPU hardware such as NVIDIA A100s, generation is expected to take seconds rather than minutes.

🛠️ What I Built vs. What is Ready-Made

To maintain transparency and highlight the specific engineering effort, here is a breakdown of my custom implementation versus the open-source tools utilized:

⚙️ What I Built (My Core AI Engineering Contribution)

Hybrid AI Pipeline Architecture: Architected a dual-stage inference pipeline, fusing fast BERT-based sequence classification with LLM-powered context reasoning to balance real-time latency with deep analytical accuracy.
Specialized Financial RAG Engineering: Curated and embedded a high-fidelity vector knowledge base focused on sophisticated attack vectors (e.g., Pig Butchering, CEO Fraud / BEC, advanced Smishing), enabling the AI to counter complex social engineering tactics.
Automated IoC Forensics Extraction: Engineered a deterministic Threat Intelligence layer utilizing Regex to parse raw inputs, instantly isolating Indicators of Compromise (malicious domains, burner emails, spoofed numbers) for immediate forensic visibility.
Guardrailed Prompt Design: Implemented strict, constraint-based prompts designed to reduce LLM hallucination and steer the model toward standardized, actionable Incident Response Plans.
Enterprise-Grade Analytical Dashboard: Developed a dynamic, responsive security terminal utilizing Gradio and Plotly to visually synthesize classification metrics, threat probabilities (Interactive Risk Gauge), and LLM reasoning into an intuitive analyst dashboard.
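The IoC extraction layer described above can be sketched with a few regular expressions. This is a minimal illustration, not the patterns used in `app.py`: the three regexes and the `extract_iocs` helper are assumptions, and the sample domains use the reserved `.example` TLD.

```python
import re

# Illustrative patterns only; real-world IoC extraction needs broader coverage.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def _unique(items: list[str]) -> list[str]:
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return URLs, email addresses, and phone-like numbers found in text."""
    # Trim sentence punctuation that often trails a pasted URL.
    urls = [u.rstrip(".,;:!?") for u in URL_RE.findall(text)]
    emails = EMAIL_RE.findall(text)
    phones = [p.strip() for p in PHONE_RE.findall(text)]
    return {
        "urls": _unique(urls),
        "emails": _unique(emails),
        "phones": _unique(phones),
    }


if __name__ == "__main__":
    sample = (
        "Your account is locked. Visit http://secure-paypa1.example/login "
        "or email help@paypa1-support.example. Call +1 (555) 010-9999."
    )
    for kind, values in extract_iocs(sample).items():
        print(kind, values)
```

Because the matching is purely pattern-based, it is deterministic and fast, but it will also pick up benign URLs and numbers; deciding whether an extracted item is actually malicious is left to the other stages.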

📦 Ready-Made Open-Source Models (The Giants I Stand On)

I integrated freely available open-source models to achieve strong accuracy at zero deployment cost:

Phishing Classifier: Auguzcht/securisense-phishing-detection (Fine-tuned BERT-base).
Vector Embeddings: sentence-transformers/all-MiniLM-L6-v2 (Fast deployment embeddings).
Reasoning Engine (LLM): HuggingFaceH4/zephyr-7b-beta (Highly capable instruction-tuned 7B model).
Orchestration: LangChain (Vector DB bridging) and FAISS (In-memory similarity search).
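To show what the retrieval step does conceptually, here is a dependency-free toy. It swaps in bag-of-words vectors and brute-force cosine similarity for the MiniLM embeddings and the FAISS index, so it is a stand-in technique rather than the project's code. The `KNOWLEDGE_BASE` entries, `embed`, `cosine`, and `retrieve` are all hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative entries, not the project's curated knowledge base.
KNOWLEDGE_BASE = [
    "Pig butchering: a long-running scam that builds trust before pushing a fake crypto investment.",
    "CEO fraud (BEC): an attacker impersonates an executive and requests an urgent wire transfer.",
    "Smishing: an SMS that links to a fake bank login page to steal credentials.",
]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (the app uses sentence-transformers instead)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(retrieve("The CEO needs an urgent wire transfer today", KNOWLEDGE_BASE))
```

The retrieved passages are what would be placed into the LLM prompt as context. Dense embeddings match paraphrases that word counts miss, which is the reason to use a sentence-embedding model in practice.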

🚀 How to Run Locally

Clone the repository and install dependencies:
```
pip install -r requirements.txt
```
Run the Gradio app:
```
python app.py
```

Note: Because this application loads a 7B-parameter LLM, it needs significant memory (RAM/VRAM) to run well locally. On the free Hugging Face Spaces CPU tier it fits within the available hardware limits, but LLM generation is slow (see the performance note above).

🔮 Future Roadmap (Enterprise Scaling)

While the current architecture serves as a highly effective Minimum Viable Product (MVP), transitioning this to a production-grade enterprise deployment would involve the following architectural upgrades:

Model Fine-Tuning (Data-Centric Optimization)
- What: Fine-tuning the base Sequence Classification model (e.g., utilizing larger BERT variants) specifically on datasets containing high-volume contextual Smishing (SMS phishing) and WhatsApp fraud.
- Why: Scammers rely heavily on social engineering formatted specifically for mobile platforms. Fine-tuning on such data should improve precision against novel telecom fraud patterns.
Live Threat Intelligence Integration (Dynamic Vector DB)
- What: Migrating the static in-memory RAG store to a live, distributed Vector Database (such as Pinecone or Milvus) connected to automated OSINT (Open-Source Intelligence) threat feeds.
- Why: Scam narratives evolve daily. A dynamic Vector DB ensures the AI's contextual knowledge base updates in real-time without requiring application rebuilds or downtime.
Active URL Sandboxing & API Verification
- What: Automatically routing the extracted Indicators of Compromise (IoCs) through professional threat aggregation APIs like VirusTotal or Google Safe Browsing.
- Why: The current system focuses on behavioral and linguistic analysis. Combining those AI heuristics with deterministic IP/URL reputation checks adds an independent second layer of defense.
Autonomous AI Agents (Tool-Calling Integration)
- What: Upgrading the passive RAG pipeline into an active Autonomous Agent framework (via LangChain Agents) equipped with tools such as a SandboxBrowserTool and DomainLookupTool.
- Why: Instead of only analyzing the text of a message, an Agent can investigate it. If an email contains a link, the Agent opens the link in a sandbox, observes the webpage (e.g., detecting a cloned PayPal login screen), checks the domain registration date, and synthesizes a forensic report. This kind of active investigation is the natural next step beyond static text analysis.
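For the URL verification item in the roadmap above, one possible way to combine the model score with a deterministic reputation check is sketched below. Everything here is hypothetical: `fuse_risk` and `is_known_bad` are illustrative names, the override rule is an assumption, and no real VirusTotal or Google Safe Browsing call is shown. In a real deployment `is_known_bad` would wrap one of those services.

```python
from typing import Callable, Iterable


def fuse_risk(
    model_score: float,
    urls: Iterable[str],
    is_known_bad: Callable[[str], bool],
) -> dict:
    """Combine a classifier probability with per-URL reputation verdicts."""
    flagged = [u for u in urls if is_known_bad(u)]
    # Assumed policy: a confirmed bad URL overrides the model and forces
    # maximum risk; otherwise the model score is passed through unchanged.
    risk = 1.0 if flagged else model_score
    return {"risk": risk, "flagged_urls": flagged}


if __name__ == "__main__":
    # Stub blocklist standing in for an external reputation service.
    blocklist = {"http://secure-paypa1.example/login"}
    result = fuse_risk(
        model_score=0.35,
        urls=["http://secure-paypa1.example/login", "https://example.org/"],
        is_known_bad=lambda url: url in blocklist,
    )
    print(result)
```

Passing the lookup in as a callable keeps the fusion logic testable offline and lets the reputation provider be swapped without touching the rest of the pipeline.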

📜 Legal Disclaimer

This tool is for educational and advisory purposes only. Complex fraud schemes evolve rapidly. Always rely on your bank or other official channels for final verification.
