majd102-p/PhishGuardPro


title: PhishGuard Pro
emoji: 🛡️
colorFrom: red
colorTo: gray
sdk: gradio
app_file: app.py
pinned: false
license: mit

🛡️ PhishGuard Pro: Advanced Hybrid AI Scam & Phishing Detector

▶️ Live Interactive Demo on Hugging Face Spaces

📌 Project Overview

PhishGuard Pro is an enterprise-grade AI tool designed to detect, analyze, and explain phishing emails, SMS scams (Smishing), and financial fraud attempts.

It leverages a Hybrid AI Architecture by combining fast, accurate Sequence Classification (BERT) with generative explainability (RAG + LLM) to not only flag malicious content but also provide actionable cybersecurity advice in English.

This project was developed as a robust portfolio piece demonstrating advanced AI Engineering skills in Fintech & Cybersecurity.


⚠️ Infrastructure & Performance Note (For Reviewers)

Demonstration Tier Restriction: This live demo runs a 7-billion-parameter LLM (Zephyr-7B-beta) on a free, CPU-only tier (2 vCPUs) for portfolio demonstration purposes.

The BERT classifier and the regex IoC engine return almost instantly, but generating the final AI Incident Response may take 5-10 minutes because the free tier has no GPU acceleration. In a production cloud environment with NVIDIA A100 GPUs, inference latency is expected to drop to the sub-second range.


🛠️ What I Built vs. What is Ready-Made

To maintain transparency and highlight the specific engineering effort, here is a breakdown of my custom implementation versus the open-source tools utilized:

⚙️ What I Built (My Core AI Engineering Contribution)

  1. Hybrid AI Pipeline Architecture: Architected a dual-stage inference pipeline, fusing fast BERT-based sequence classification with LLM-powered context reasoning to balance real-time latency with deep analytical accuracy.
  2. Specialized Financial RAG Engineering: Curated and embedded a high-fidelity vector knowledge base focused on sophisticated attack vectors (e.g., Pig Butchering, CEO Fraud / BEC, advanced Smishing), enabling the AI to counter complex social engineering tactics.
  3. Automated IoC Forensics Extraction: Engineered a deterministic Threat Intelligence layer utilizing Regex to parse raw inputs, instantly isolating Indicators of Compromise (malicious domains, burner emails, spoofed numbers) for immediate forensic visibility.
  4. Guardrailed Prompt Design: Implemented strict, constraint-based prompt architectures that systematically mitigate LLM hallucination and force the generation of standardized, actionable Incident Response Plans.
  5. Enterprise-Grade Analytical Dashboard: Developed a dynamic, responsive security terminal utilizing Gradio and Plotly to visually synthesize classification metrics, threat probabilities (Interactive Risk Gauge), and LLM reasoning into an intuitive analyst dashboard.
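
As an illustration of the regex-based IoC layer described in item 3, here is a minimal sketch using only Python's standard `re` module. The patterns and the `extract_iocs` name are assumptions made for this example; the project's actual expressions are not reproduced in this README and are likely more thorough.

```python
import re

# Illustrative patterns only (assumptions, not the project's real regexes).
URL_RE = re.compile(r"https?://[^\s<>()\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_iocs(text: str) -> dict:
    """Return de-duplicated URLs, email addresses, and phone numbers found in text."""

    def unique(matches):
        # dict.fromkeys drops repeats while keeping first-seen order.
        return list(dict.fromkeys(m.strip() for m in matches))

    return {
        # Trailing sentence punctuation is not part of the URL itself.
        "urls": unique(u.rstrip(".,;:!?") for u in URL_RE.findall(text)),
        "emails": unique(EMAIL_RE.findall(text)),
        "phones": unique(PHONE_RE.findall(text)),
    }


if __name__ == "__main__":
    sample = (
        "Verify at http://paypa1-secure.example/login. "
        "Email support@paypa1-secure.example or call +1 (555) 010-9999."
    )
    print(extract_iocs(sample))
```

Because this layer is deterministic, it can run before any model is loaded and its output can be shown to the analyst even while the slower LLM stage is still generating.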

📦 Ready-Made Open-Source Models (The Giants I Stand On)

I integrated freely available pretrained models to get strong accuracy with zero deployment cost:

  • Phishing Classifier: Auguzcht/securisense-phishing-detection (Fine-tuned BERT-base).
  • Vector Embeddings: sentence-transformers/all-MiniLM-L6-v2 (Fast deployment embeddings).
  • Reasoning Engine (LLM): HuggingFaceH4/zephyr-7b-beta (Highly capable instruction-tuned 7B model).
  • Orchestration: LangChain (Vector DB bridging) and FAISS (In-memory similarity search).
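
The components listed above could be wired together roughly as follows. This is a sketch of the two-stage flow under stated assumptions, not the code in `app.py`: the `analyze` function, the `Analysis` dataclass, and the prompt wording are hypothetical, and the three model stages are passed in as plain callables so the example runs without downloading any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical guardrail prompt; the project's real prompt is not shown here.
PROMPT_TEMPLATE = """You are a cybersecurity analyst.
Use ONLY the context below. If the context is insufficient, say so.

Context:
{context}

Message:
{message}

Classifier verdict: {label} ({score:.0%} confidence)

Respond with exactly three sections: Threat Summary, Red Flags, Recommended Actions."""


@dataclass
class Analysis:
    label: str
    score: float
    report: str


def analyze(
    message: str,
    classify: Callable[[str], Tuple[str, float]],
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> Analysis:
    # Stage 1: fast sequence classification (a BERT classifier in the real app).
    label, score = classify(message)
    # Stage 2: retrieve related scam patterns, then ask the LLM for a grounded report.
    context = "\n".join(f"- {doc}" for doc in retrieve(message, top_k))
    prompt = PROMPT_TEMPLATE.format(
        context=context, message=message, label=label, score=score
    )
    return Analysis(label=label, score=score, report=generate(prompt))


if __name__ == "__main__":
    # Stub stages stand in for the real models.
    result = analyze(
        "Urgent: wire $40,000 today. - The CEO",
        classify=lambda m: ("phishing", 0.97),
        retrieve=lambda m, k: ["CEO fraud / BEC: urgent wire requests from executives"][:k],
        generate=lambda p: "Threat Summary: likely BEC attempt.",
    )
    print(result)
```

In a real deployment, `classify` might wrap a Transformers text-classification pipeline, `retrieve` a FAISS similarity search over the embedded knowledge base, and `generate` the Zephyr model. Injecting them as callables keeps the orchestration testable on a machine with no GPU.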

🚀 How to Run Locally

  1. Clone the repository and install dependencies:
    pip install -r requirements.txt
  2. Run the Gradio app:
    python app.py

Note: The 7B LLM needs significant memory (RAM/VRAM) to run well locally. On the free Hugging Face Spaces tier it fits within the available hardware limits, but response generation is slow without a GPU.


🔮 Future Roadmap (Enterprise Scaling)

While the current architecture serves as a highly effective Minimum Viable Product (MVP), transitioning this to a production-grade enterprise deployment would involve the following architectural upgrades:

  1. Model Fine-Tuning (Data-Centric Optimization)

    • What: Fine-tuning the base Sequence Classification model (e.g., utilizing larger BERT variants) specifically on datasets containing high-volume contextual Smishing (SMS phishing) and WhatsApp fraud.
    • Why: Scammers rely heavily on social engineering formatted specifically for mobile platforms. Fine-tuning on such data should improve precision against novel telecom fraud patterns.
  2. Live Threat Intelligence Integration (Dynamic Vector DB)

    • What: Migrating the static in-memory RAG store to a live, distributed Vector Database (such as Pinecone or Milvus) connected to automated OSINT (Open-Source Intelligence) threat feeds.
    • Why: Scam narratives evolve daily. A dynamic Vector DB ensures the AI's contextual knowledge base updates in real-time without requiring application rebuilds or downtime.
  3. Active URL Sandboxing & API Verification

    • What: Automatically routing the extracted Indicators of Compromise (IoCs) through professional threat aggregation APIs like VirusTotal or Google Safe Browsing.
    • Why: The current system focuses on behavioral and linguistic analysis. Combining those AI heuristics with deterministic IP/URL reputation checks adds an independent layer of defense, so a miss in one layer can be caught by the other.
  4. Autonomous AI Agents (Tool-Calling Integration)

    • What: Upgrading the passive RAG pipeline into an active Autonomous Agent framework (via LangChain Agents) equipped with tools such as a SandboxBrowserTool and DomainLookupTool.
    • Why: Instead of merely analyzing the text of a message, an Agent can investigate it. If an email contains a link, the Agent opens the link in a sandbox, observes the webpage (e.g., detecting a cloned PayPal login screen), checks the domain registration date, and synthesizes a forensic report. This kind of active investigation is where AI-assisted cybersecurity tooling is heading.
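
To make roadmap item 3 more concrete, the sketch below shows one possible way to combine the classifier's probability with a deterministic reputation check on extracted IoCs. Everything here is an assumption for illustration: the `combined_risk` function, the override policy, and the `is_known_bad` callable, which in practice might wrap a reputation service such as VirusTotal or Google Safe Browsing but is stubbed with a local blocklist so the example runs offline.

```python
from typing import Callable, Iterable


def combined_risk(
    model_score: float,
    iocs: Iterable[str],
    is_known_bad: Callable[[str], bool],
) -> dict:
    """Merge a model probability with reputation verdicts for extracted IoCs."""
    flagged = [ioc for ioc in iocs if is_known_bad(ioc)]
    # Hypothetical policy: a confirmed-bad indicator overrides the linguistic
    # score; otherwise the model's probability is reported unchanged.
    risk = 1.0 if flagged else model_score
    return {"risk": risk, "flagged": flagged}


if __name__ == "__main__":
    # Stub reputation source: a local set standing in for a remote lookup.
    blocklist = {"paypa1-secure.example"}
    print(
        combined_risk(
            0.40,
            ["paypa1-secure.example", "example.org"],
            blocklist.__contains__,
        )
    )
```

The override rule is deliberately simple. A weighted blend or a per-source confidence would be equally valid choices, depending on how much the reputation feed is trusted.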

📜 Legal Disclaimer

This tool is for educational and advisory purposes only. Complex fraud schemes evolve rapidly. Always rely on authorized bank or official channels for final verification.

About

Hybrid AI threat intelligence system powered by a BERT classifier and a Zephyr-7B RAG pipeline for phishing forensics and financial fraud detection.
