thechiranjeevvyas/Kavach


KAVACH: Privacy-First AI Interaction

"Your Personal Redaction Shield Before Talking to AI"


Project Overview

KAVACH is a privacy-centric text redaction and sanitization system built to shield sensitive user data from exposure to Large Language Models (LLMs).

It acts as an intermediary: it identifies and redacts Personally Identifiable Information (PII) in your text before the text reaches an AI model, so you can interact with AI without compromising your privacy.

Built on a multi-layered AI stack, KAVACH reports over 89% accuracy in PII detection.


Features

  • Intelligent PII Redaction
    Detects and replaces sensitive entities like names, phone numbers, addresses, emails, and more.

  • Support for Indian PII
    Recognizes Aadhaar, PAN, and other India-specific identifiers.

  • Contextual Data Protection
    Uses NER models to identify sensitive data even without clear patterns (like uncommon names or cities).

  • Text Sensitivity Classification
    Labels input as confidential, personal, or public using LLM-based text classification.

  • Seamless LLM Integration
    Ensures only redacted and safe text is passed to AI models.


How It Works: Layered Protection System

KAVACH uses a 3-layer defense strategy for robust privacy:

1. Pattern-Based Scanning

Utilizes Presidio to scan and tag PII with recognizable formats:

  • Phone → [PHONE_NUMBER]
  • Email → [EMAIL_ADDRESS]
  • Aadhaar, PAN → [ID_NUMBER]
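To make the idea of pattern-based tagging concrete, here is a minimal sketch that uses only Python's standard `re` module. It is a stand-in for illustration, not Presidio and not the project's actual recognizers: the regular expressions, the `PATTERNS` list, and the `redact_patterns` function are all assumptions, and the phone pattern only handles numbers written with a leading `+`.

```python
import re

# Illustrative patterns only (assumptions, not Presidio's built-in recognizers).
# Order matters: each pattern runs on the output of the previous one.
PATTERNS = [
    ("EMAIL_ADDRESS", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    # Aadhaar-like: 12 digits in groups of 4, not preceded by "+" or another digit.
    ("ID_NUMBER", re.compile(r"(?<![+\d])\d{4}\s?\d{4}\s?\d{4}\b")),
    # PAN-like: 5 uppercase letters, 4 digits, 1 uppercase letter.
    ("ID_NUMBER", re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")),
    # Phone in international format with a leading "+".
    ("PHONE_NUMBER", re.compile(r"\+\d[\d\- ]{7,14}\d")),
]


def redact_patterns(text):
    """Replace every pattern match with its [TAG] placeholder."""
    for tag, pattern in PATTERNS:
        text = pattern.sub(f"[{tag}]", text)
    return text


sample = ("Call +919876543210 or mail vikram.singh@examplebank.com. "
          "Aadhaar 9876 5432 1098, PAN ABCDE1234F.")
print(redact_patterns(sample))
```

Regexes this loose will miss many real formats and can produce false positives, which is one reason to rely on a dedicated library with validation and context checks.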

2. Contextual Detection (NER)

Employs ai4bharat/IndicNER from HuggingFace for Named Entity Recognition:

  • Detects names, locations, organizations, etc. without pattern reliance.
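Once an NER model has returned entity spans, redaction comes down to replacing character ranges. The sketch below assumes spans shaped like the aggregated output of a HuggingFace token-classification pipeline (dicts with `entity_group`, `start`, and `end`). The `redact_entities` helper, the label names, and the hand-written spans are hypothetical; no model is loaded here.

```python
def redact_entities(text, entities):
    """Replace each detected span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written spans standing in for model output (assumed labels: PER, ORG).
sample = "Anjali Sharma visited Apollo Hospital."
spans = [
    {"entity_group": "PER", "start": 0, "end": 13},
    {"entity_group": "ORG", "start": 22, "end": 37},
]
print(redact_entities(sample, spans))
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution.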

3. Sensitivity Classification

Leverages facebook/bart-large-mnli to assign sensitivity tags:

  • Confidential
  • Personal
  • Public

Only after these steps is the sanitized text released for AI use.
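For the third layer, a zero-shot classifier typically returns candidate labels with scores, and the application still has to decide which tag to apply. The sketch below assumes a result dict with `labels` and `scores` lists, as a HuggingFace zero-shot-classification pipeline produces. The `pick_sensitivity` helper, the 0.5 threshold, and the choice to fall back to the strictest label are assumptions for illustration, not the project's documented behavior.

```python
LABELS = ("confidential", "personal", "public")


def pick_sensitivity(result, min_score=0.5, fallback="confidential"):
    """Return the top-scoring label, or the strictest label when unsure."""
    # Pair labels with scores and take the highest; do not rely on input order.
    label, score = max(zip(result["labels"], result["scores"]), key=lambda p: p[1])
    if label not in LABELS or score < min_score:
        return fallback
    return label


# Hand-written scores standing in for classifier output.
confident = {"labels": ["personal", "public", "confidential"], "scores": [0.71, 0.20, 0.09]}
unsure = {"labels": ["public", "personal", "confidential"], "scores": [0.40, 0.35, 0.25]}
print(pick_sensitivity(confident), pick_sensitivity(unsure))
```

Falling back to the strictest tag means a low-confidence prediction errs on the side of privacy.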


Local Setup & Run

Want to try it out? Here's how you can run it locally:

Prerequisites

  • Python 3.8+
  • pip (Python package installer)
  • Git

Setup Steps

Step 1: Clone the Repository

git clone https://github.com/thechiranjeevvyas/Kavach.git
cd Kavach

Step 2: Create and Activate a Virtual Environment

Create a virtual environment:

python -m venv venv

Activate it:

  • macOS / Linux:
source venv/bin/activate
  • Windows (Command Prompt):
.\venv\Scripts\activate
  • Windows (PowerShell):
.\venv\Scripts\Activate.ps1

You'll see (venv) in your terminal prompt when activated.

Step 3: Install Dependencies

Install required Python packages:

pip install -r requirements.txt

Sample requirements.txt content:

streamlit
presidio-analyzer
python-dotenv
transformers
torch
groq
sentencepiece
accelerate

Step 4: Set Up Groq API Key

To connect with the LLM securely:

Create a .streamlit folder:

mkdir .streamlit

Inside it, create secrets.toml:

GROQ_API_KEY = "your_groq_api_key_here"

Replace the placeholder with your real API key from the Groq Console.

Step 5: Run the Streamlit App

streamlit run main2.py

The app opens in your browser at: http://localhost:8501


Testing Your Application

Comprehensive Input

Urgent internal memo: This document contains highly confidential information. Patient Anjali Sharma (DOB: 15/03/1988) visited Apollo Hospital on 2024-06-20 for follow-up. Her unique patient ID is PX7890123. The physician Dr. Rajesh Kumar (Mobile: +919876543210) noted her AADHAAR number 9876 5432 1098. She works for TechSolutions India. Employee ID E12ABU5678 is assigned to Mr. Vikram Singh, a senior analyst at State Bank of India. His PAN is ABCDE1234F and email is vikram.singh@examplebank.com. We also received a query from a Ministry of Defense official regarding vehicle registration DL01CD1234 for a new project located near the Air Force Station, Hindon. Please ensure all PII and sensitive project details are redacted before sharing any summaries. Contact our legal department at legal@techsolutions.com for further clarification. Voting ID of Ms. Priya Patel: ABC1234567. Passport No. K1234567.

Short Input: Personal Info

Could you please help me confirm my identity? My full name is Sarah Miller, and my private phone number is +1-202-555-0100. Thanks.

Short Input: Government Sensitive Info

Please redact details from this secure document. The official government ID for the operation is GVT-SEC-98765, linked to agent E78XYZ4321.
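After pasting one of these inputs into the app, you can check the result by confirming that none of the known sensitive values survive in the output. The sketch below is one hypothetical way to do that. The `find_leaks` helper and the example redacted string are assumptions, not part of the repository.

```python
def find_leaks(redacted_text, known_values):
    """Return the known sensitive values that still appear in the redacted text."""
    haystack = redacted_text.lower()
    return [value for value in known_values if value.lower() in haystack]


# Values taken from the short personal-info input above.
known = ["Sarah Miller", "+1-202-555-0100"]

# Hypothetical outputs: one fully redacted, one where the name slipped through.
clean = "My full name is [PER], and my private phone number is [PHONE_NUMBER]."
leaky = "My full name is Sarah Miller, and my private phone number is [PHONE_NUMBER]."

print(find_leaks(clean, known))
print(find_leaks(leaky, known))
```

An empty list means every listed value was removed. It does not prove that the text contains no other PII.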

Contributing

  • Fork the repository
  • Create a new branch
  • Commit changes
  • Submit a pull request

For significant changes, open an issue to start a discussion.

