Self-hosted LLM firewall with on-device GPU threat detection.
Analyze every message for prompt injection, jailbreaks, PII leakage, and semantic evasion attacks using a real 21B-parameter model — not regex, not keyword matching.
Quick Start • Connect Clients • Configuration • Deployment • Security
Most LLM security tools rely on pattern matching or cloud-hosted classifiers. PooGuard runs a 21-billion parameter MoE safeguard model (3.6B active) directly on your GPU — every request scored locally, nothing leaves your infrastructure. Deploy it as a drop-in OpenAI-compatible proxy: point SillyTavern, Open WebUI, Chatbox, or any client at PooGuard and every request is analyzed, scored, and logged with zero changes to your setup.
- ML-Powered Classification — Per-category confidence scores for prompt injection, jailbreak, and PII threats. Calibrated against a 360-example benchmark dataset with F1 scores of 0.79 / 0.65 / 0.89.
- Semantic Evasion Detection — 182 attack pattern embeddings across 28 categories catch obfuscated and novel attacks that keyword filters miss.
- Input Deobfuscation — Decodes base64, hex, URL encoding, Unicode homoglyphs, l33tspeak, zero-width characters, and whitespace insertion before analysis.
- Egress Monitoring — Scans every LLM response for leaked secrets, PII, and system prompt disclosure. Secrets are redacted automatically.
- Session Tracking — Cumulative threat scoring with 30-minute half-life decay detects slow-burn attacks spread across multiple messages.
- Secret Masking — API keys, AWS credentials, GitHub tokens, and JWTs are auto-redacted in logs. Every admin action is recorded in an immutable audit trail.
- OAI-Compatible Proxy — Drop-in replacement for any OpenAI base URL. Authenticate with API keys, and PooGuard transparently analyzes, blocks, or forwards every request.
- Real-Time Dashboard — Live WebSocket feed with per-category threat scores, timeline charts, analytics, and hourly distribution views.
- Configurable Presets — Three calibrated profiles: High Security, Balanced (default), and Low Friction. Or set custom thresholds per category.
- Alert System — Six alert types (threshold, rate, session_threat, access_pattern, config_change, repeat_block) with real-time notifications.
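The session-tracking decay in the feature list can be sketched as exponential half-life weighting. A minimal illustration, assuming the 30-minute half-life stated above; the scoring function itself is a hypothetical simplification, not PooGuard's actual implementation:

```python
import math

HALF_LIFE_S = 30 * 60  # 30-minute half-life, per the feature list

def cumulative_score(events, now):
    """Sum per-message threat scores, halving each score's weight
    for every 30 minutes of age. events = [(timestamp_s, score), ...]"""
    return sum(
        score * 0.5 ** ((now - ts) / HALF_LIFE_S)
        for ts, score in events
    )

# A score from 30 minutes ago counts half; from 60 minutes ago, a quarter.
events = [(0, 0.8), (1800, 0.8)]
print(round(cumulative_score(events, now=3600), 2))  # → 0.6
```

With this shape, a slow-burn attacker who spaces probing messages an hour apart never accumulates much score, while a burst of borderline messages within one half-life does.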
Note
Requires an NVIDIA GPU with 16 GB+ VRAM (RTX 4080 or better). First run downloads the ~13 GB model — cached in a Docker volume for subsequent starts.
```bash
git clone https://github.com/tacos8me/PooGuard.git
cd PooGuard
cp .env.example .env
# Edit .env — set at minimum: HF_TOKEN, JWT_SECRET, DB_PASSWORD
docker compose up
```

| Service | URL |
|---|---|
| Dashboard | http://localhost:3000 |
| API | http://localhost:3001 |
| Proxy | http://localhost:3001/v1 |
Default login: admin@pooguard.local with a randomly generated password (printed to console on first seed, or set ADMIN_PASSWORD env var).
- Docker and Docker Compose v2 (install guide)
- NVIDIA Container Toolkit for GPU passthrough (install guide)
- HuggingFace account — accept the model license before first run
```
Request ➜ Auth ➜ Extract ➜ Normalize ➜ Classify ➜ Evaluate ➜ Forward ➜ Upstream LLM
                                                      │                      │
                                               Block Response                │
                                                      │                      │
                                                      ▼                      ▼
                                                   Client ◀─────────── Egress Scan
```
Detailed pipeline steps
- Authentication — `/v1/chat/completions` accepts JWT tokens or PooGuard API keys (`sk-pg-*`). Credentials validated via SHA-256 hash lookup.
- Rate Limiting — User-based tiered limits (admin: 100/min, API key: 60/min, viewer: 30/min, anonymous: 15/min) using Redis-backed sliding windows.
- Text Extraction — User messages extracted from the OpenAI-format `messages` array, including multi-part content.
- Input Normalization — Multi-layer deobfuscation: invisible Unicode stripping, NFKC normalization, homoglyph replacement, whitespace collapse, iterative decoding (base64, hex, URL, l33t, ROT13).
- Threat Classification — Safeguard model runs inference on normalized text, returning per-category scores.
- Semantic Similarity — Input embedding compared against 182 attack pattern embeddings across 28 categories.
- Threshold Evaluation — Calibrated scores compared against configurable thresholds. Each category independently triggers block, flag, or allow.
- Forward or Block — Safe requests forwarded to upstream LLM. Both streaming (SSE) and non-streaming supported.
- Egress Monitoring — Response body scanned for leaked secrets, PII, and sensitive data before delivery.
- Event Broadcast — Logged to PostgreSQL, published to Redis, dashboard updated via WebSocket in real time.
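The normalization step above can be illustrated with a stripped-down sketch. Real deobfuscation in PooGuard is iterative and multi-layer; the homoglyph map here is a tiny hypothetical subset and the base64 pass is single-shot, so treat this as a shape of the technique, not the actual code:

```python
import base64
import re
import unicodedata

HOMOGLYPHS = {"а": "a", "е": "e", "о": "o"}  # Cyrillic lookalikes (tiny sample)

def normalize(text: str) -> str:
    # Strip zero-width characters often used to split trigger words
    text = re.sub(r"[\u200b\u200c\u200d\ufeff]", "", text)
    # NFKC folds fullwidth forms and other compatibility characters
    text = unicodedata.normalize("NFKC", text)
    # Replace known homoglyphs with their ASCII counterparts
    text = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
    # Collapse inserted whitespace
    text = re.sub(r"\s+", " ", text).strip()
    # Opportunistically decode long base64-looking tokens
    def try_b64(m):
        try:
            return base64.b64decode(m.group(0), validate=True).decode("utf-8")
        except Exception:
            return m.group(0)
    return re.sub(r"[A-Za-z0-9+/]{16,}={0,2}", try_b64, text)

print(normalize("ign\u200bore   me"))  # → "ignore me"
```

After this pass, the classifier sees the attacker's intended text rather than its obfuscated surface form.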
PooGuard exposes an OpenAI-compatible proxy at /v1. Any client that supports a custom base URL works out of the box.
Create an API key: Log in to the dashboard, go to Settings > API Keys, and create a key (sk-pg-<hex>). Copy it immediately — shown only once.
```bash
# curl
curl http://localhost:3001/v1/chat/completions \
  -H "Authorization: Bearer sk-pg-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "any-model-name",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

```bash
# Any OAI-compatible client
export OPENAI_BASE_URL=http://your-server:3001/v1
export OPENAI_API_KEY=sk-pg-YOUR_KEY
```

SillyTavern / Open WebUI setup
SillyTavern:
- Open Settings > API Connections
- Set API type to Chat Completion (OpenAI)
- Set base URL to
http://your-server:3001/v1 - Paste your
sk-pg-API key - Pick any model — PooGuard forwards to your upstream
Open WebUI:
- Go to Settings > Connections
- Add an OpenAI-compatible connection
- Set URL to
http://your-server:3001/v1 - Paste your API key
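For scripted access without an SDK, a stdlib-only Python sketch that assembles the same request the curl example sends. The server URL and key are placeholders, and `build_chat_request` is a hypothetical helper, not part of PooGuard:

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, content, model="any-model-name"):
    """Assemble an OpenAI-style chat completion request for the proxy."""
    url = f"{base_url}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode()
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_chat_request("http://your-server:3001/v1", "sk-pg-YOUR_KEY", "Hello!")
# With a live proxy, send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Because the proxy is OpenAI-compatible, the official `openai` client works the same way: point its base URL at `/v1` and pass the `sk-pg-` key.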
| Variable | Required | Default | Description |
|---|---|---|---|
| `HF_TOKEN` | Yes | — | HuggingFace token for model download |
| `JWT_SECRET` | Yes | — | Token signing key (min 32 chars) |
| `DB_PASSWORD` | Yes | — | PostgreSQL password |
| `REDIS_PASSWORD` | No | `redis-dev-password` | Redis authentication password |
| `MODEL_NAME` | No | `openai/gpt-oss-safeguard-20b` | HuggingFace model name |
| `SAFEGUARD_MODEL_SIZE` | No | `20b` | Model variant: `20b` or `120b` |
| `MODEL_SERVICE_API_KEY` | No | — | Backend-to-model-service auth key |
| `ALLOWED_ORIGINS` | No | `localhost:3000,5173` | CORS allowed origins |
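A minimal `.env` sketch covering the required variables; every value here is a placeholder, so generate your own secrets:

```env
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
JWT_SECRET=replace-with-a-random-string-of-32-plus-chars
DB_PASSWORD=replace-me
# Optional overrides
SAFEGUARD_MODEL_SIZE=20b
ALLOWED_ORIGINS=localhost:3000,5173
```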
Tune detection sensitivity from the dashboard Settings page:
| Preset | Prompt Injection | Jailbreak | PII | Semantic | Use Case |
|---|---|---|---|---|---|
| High Security | 0.40 | 0.40 | 0.50 | 0.28 | Maximize detection, accept more false positives |
| Balanced | 0.70 | 0.70 | 0.70 | 0.42 | Best F1 score (default) |
| Low Friction | 0.90 | 0.90 | 0.90 | 0.50 | Minimize false positives |
Tip
The model produces bimodal scores (near 0.0 or 0.8–0.95), so small threshold changes in the middle range have little practical effect. Lower thresholds = more aggressive blocking.
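The per-category evaluation can be sketched as a comparison against the active preset. Thresholds below are the Balanced row from the table; the decision function and its `flag_margin` are hypothetical simplifications of whatever logic PooGuard actually uses:

```python
BALANCED = {"prompt_injection": 0.70, "jailbreak": 0.70,
            "pii": 0.70, "semantic": 0.42}

def evaluate(scores, thresholds=BALANCED, flag_margin=0.1):
    """Return (decision, triggered_categories): block, flag, or allow."""
    blocked = [c for c, s in scores.items() if s >= thresholds[c]]
    flagged = [c for c, s in scores.items()
               if thresholds[c] - flag_margin <= s < thresholds[c]]
    if blocked:
        return "block", blocked   # any category over threshold blocks
    if flagged:
        return "flag", flagged    # near-threshold scores get flagged
    return "allow", []

print(evaluate({"prompt_injection": 0.91, "jailbreak": 0.05,
                "pii": 0.02, "semantic": 0.10}))
# → ('block', ['prompt_injection'])
```

Note how each category triggers independently, matching the "each category independently triggers block, flag, or allow" behavior described in the pipeline.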
```bash
cd backend && npx jest --no-coverage             # 449 tests
cd model-service && python -m pytest tests/ -v   # 216 tests
cd frontend && npx vitest run                    # 31 tests
```

| Suite | Framework | Tests | Scope |
|---|---|---|---|
| Backend | Jest + Supertest | 449 | Routes, middleware, services, utilities |
| Model Service | pytest | 216 | Inference, normalization, semantic similarity, API |
| Frontend | Vitest | 31 | Components, auth flows, settings |
Benchmarked on RTX 5090 (32 GB) with the 20B model in MXFP4 quantization, 360-example dataset:
| Metric | Latency |
|---|---|
| Mean | 2.6s |
| Median | 2.8s |
| P95 | 3.6s |
| Clean inputs (avg) | 1.9s |
| Threat inputs (avg) | 3.1s |
Early-exit stopping cuts clean-input latency nearly in half — most production traffic is clean. Streaming proxy requests begin forwarding immediately; analysis runs in parallel.
Running with Docker (recommended)
```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
```

Mounts source directories for hot reload — backend with nodemon, frontend with Vite HMR, model service with live `main.py` mounting. Dev mode also exposes PostgreSQL (5432), Redis (6379), and model-service (8000) for direct access.
Running services individually
```bash
# Backend
cd backend && npm install && npm run migrate && npm run dev

# Frontend
cd frontend && npm install && npm run dev

# Model service (requires GPU)
cd model-service && pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000
```

Tech stack
| Layer | Technology | Purpose |
|---|---|---|
| ML Inference | PyTorch (CUDA 12.8), Transformers | GPU model inference with MXFP4 quantization |
| Embeddings | sentence-transformers | Semantic similarity attack detection |
| Model API | Python 3.12, FastAPI, Uvicorn | Threat classification service |
| Backend | Node.js, Express 4 | REST API, WebSocket, LLM proxy |
| Auth | jsonwebtoken, bcrypt | JWT + SHA-256 hashed API keys |
| Database | PostgreSQL 16, Knex.js | Request logs, config, audit trail, alerts |
| Cache / PubSub | Redis 7 | Rate limiting, cache, real-time event bus |
| Real-time | Socket.IO 4 | WebSocket events to dashboard |
| Frontend | React 18, Vite 5, TailwindCSS 3 | Monitoring dashboard |
| Charts | Recharts 2 | Analytics visualizations |
| Security | Helmet, CORS, CSRF | HTTP hardening |
| Infrastructure | Docker Compose | Multi-service orchestration with GPU passthrough |
Contributions welcome. Please open an issue first to discuss what you'd like to change.
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-change`)
- Run the test suites before submitting
- Open a pull request
