On-Premise AI Infrastructure for Intel Arc GPUs
NetAI Stack SE is built for organizations that cannot afford cloud data leakage. Law firms, SMEs, and compliance-driven businesses run this stack entirely on-premise:
- No cloud inference: All LLM queries execute locally on your Intel Arc Pro B50.
- No telemetry: No data leaves your network unless you explicitly configure external integrations.
- GDPR-ready: Patient/client data, legal documents, and internal knowledge bases remain under your physical control. Enhanced with Microsoft Presidio for automatic PII redaction (German + English).
- EU AI Act compliant: Full transparency documentation and prompt injection protection via Mezzo-Prompt-Guard-v2-Base on isolated iGPU.
- Dual-GPU Partitioning: Compliance services (PII-Guard, Security-Guard) offloaded to Alder Lake iGPU to preserve Battlemage dGPU VRAM for the main LLM.
The stack includes automated PII detection and redaction:
- PII-Guard Service: Intercepts all web search queries before they reach SearXNG.
- Microsoft Presidio: Detects names, locations, IBANs, phone numbers, emails, and more. Optimized for German and English.
- Endpoints: `/redact` (standalone redaction), `/search` (redact + forward to SearXNG), `/search/redacted` (debug/compliance view).
- Automatic Redaction: Replaces PII with generic placeholders (e.g., `[REDACTED_NAME]`, `[REDACTED_LOCATION]`).
- Data Minimization: GDPR Article 5 compliance headers on proxied responses (`X-PIGuard-Redacted`, `X-PIGuard-Compliance`).
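The placeholder replacement described above can be illustrated with a small stdlib sketch. It is not the PII-Guard implementation: the real service relies on Microsoft Presidio recognizers, while the two regexes, the `PATTERNS` table, and the `redact` helper below are hypothetical stand-ins chosen only to show the `[REDACTED_<TYPE>]` substitution and the redaction count.

```python
import re

# Hypothetical patterns for illustration only; the actual service uses
# Microsoft Presidio recognizers rather than these two regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,4})?\b"),
}


def redact(text: str) -> tuple[str, int]:
    """Replace each match with [REDACTED_<TYPE>] and count the replacements."""
    total = 0
    for label, pattern in PATTERNS.items():
        text, replaced = pattern.subn(f"[REDACTED_{label}]", text)
        total += replaced
    return text, total
```

The returned count corresponds loosely to what the `X-PIGuard-Redacted` header reports; how the real service computes that value is not specified here.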
The stack includes AI safety measures for professional use:
- Security-Guard Service: Filters all inference requests for prompt injection before they reach the Cascade LLM.
- Mezzo-Prompt-Guard-v2-Base (IQ4_XS): Highly specialized safety model (~450MB VRAM) on the integrated iGPU.
- Fallback: Heuristic regex-based classifier when llama-server is unavailable.
- Prompt Injection Detection: Blocks jailbreak attempts, system prompt extraction, malicious code.
- Human-in-the-Loop: All blocked requests are logged for review with transparency metadata.
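A regex-based fallback of the kind mentioned above might look like the following sketch. The rule set, the category names, and the fixed scores are assumptions for illustration; only the result shape (`safe`, `category`, `score`) follows the `/classify` response documented for Security-Guard.

```python
import re

# Hypothetical rules; the patterns used by the real fallback are not documented here.
RULES = [
    ("jailbreak", re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I)),
    ("prompt_extraction", re.compile(r"(reveal|print|show).{0,40}system prompt", re.I)),
]


def classify(prompt: str) -> dict:
    """Return a verdict dict; the first matching rule marks the prompt unsafe."""
    for category, pattern in RULES:
        if pattern.search(prompt):
            return {"safe": False, "category": category, "score": 1.0}
    return {"safe": True, "category": "benign", "score": 0.0}
```

A heuristic like this trades recall for availability: it keeps filtering alive when the model server is down, but it should be expected to miss paraphrased attacks.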
User → Caddy (:443) → LibreChat (:3080/chat) → Security-Guard (:8778) → Cascade LLM (:3000) ─┬─ small (low complexity) → Auxiliary LFM2.5-VL-1.6B (:8082)
├─ large (high complexity) → Inference Qwen3.6-35B (:8080)
└─ confidence < 0.7 → reroute to Inference (fallback)
→ SearXNG (:8080) via PII-Guard (:8777)
→ Hermes Agent (:9119 dashboard, :8642 API) ───→ Cascade LLM (:3000) [direct, bypasses Security-Guard]
→ Hermes WebUI (:8787)
→ Beszel (:8090 monitoring)
→ SuperTonic TTS (:8800)
→ Parakeet STT (:5092)
→ Auth-Validator (:8081) → LibreChat (:3080) [forward_auth backend]
| Service | Container | Port(s) | Purpose |
|---|---|---|---|
| Inference | `netai-inference` | `8080` | llama.cpp with Intel SYCL, hosts Qwen3.6-35B (large model) |
| Auxiliary | `netai-auxiliary` | `8082` | llama.cpp with Intel SYCL, hosts LFM2.5-VL-1.6B (small multimodal model) |
| Cascade LLM | `netai-cascade-llm` | `3000` | Routes requests by complexity + confidence between small/large models |
| Frontend | `netai-librechat` | `3080` | LibreChat (chat interface, RAG, multi-model, MCP) |
| Agent | `netai-hermes` | `8642`, `9119` | Hermes Agent (Telegram bot + dashboard) |
| Hermes WebUI | `netai-hermes-webui` | `8787` | Full web interface for Hermes |
| Reverse Proxy | `netai-caddy` | `80`, `443` | TLS termination, path-based routing, URL rewriting (replace-response module) |
| Web Search | `netai-searxng` | `8080` | Privacy-respecting meta search engine |
| PII-Guard | `netai-pii-guard` | `8777` | GDPR PII redaction proxy between LibreChat and SearXNG |
| Security-Guard | `netai-security-guard` | `8778` | Mezzo Prompt Guard (+ internal llama-server `:8779`) on iGPU |
| TTS | `netai-tts` | `8800` | SuperTonic TTS (OpenAI-compatible) |
| STT | `netai-stt` | `5092` | Parakeet TDT (OpenAI-compatible) |
| Knowledge Graph | `netai-lightrag` | `8020` | LightRAG: graph-enhanced RAG with entity extraction, hybrid search (local/global/naive) |
| Monitoring | `netai-beszel` | `8090` | Lightweight system monitoring (CPU, RAM, disk, Docker, GPU) |
| Auth-Validator | `netai-auth-validator` | `8081` | Validates LibreChat sessions for Caddy forward_auth |
The Cascade LLM (netai-cascade-llm, port 3000) is the central routing layer between all clients (LibreChat via Security-Guard, Hermes Agent directly) and the two inference backends.
Request → Cascade LLM
│
├─ has_image?
│ ├─ Yes + complexity > 0.5 → Large Multimodal (Qwen3.6-35B)
│ └─ Yes + complexity ≤ 0.5 → Small Multimodal (LFM2.5-VL-1.6B)
│
└─ text-only?
├─ complexity > 0.5 → Large Text (Qwen3.6-35B)
└─ complexity ≤ 0.5 → Small Model + confidence check
│
├─ confidence ≥ 0.7 → keep small response
└─ confidence < 0.7 → reroute to Large Text
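The decision tree above can be read as a small pure function. This is a sketch under assumptions: the function name `pick_backend` and the returned labels are hypothetical, and the real gateway additionally performs the confidence check for small text-only responses.

```python
ROUTER_THRESHOLD = 0.5  # complexity cutoff, matching the documented default


def pick_backend(has_image: bool, complexity: float) -> str:
    """Mirror the routing tree: image vs. text-only, then the complexity cutoff."""
    if complexity > ROUTER_THRESHOLD:
        return "large_multimodal" if has_image else "large_text"
    # At or below the cutoff, the small model is tried first.
    return "small_multimodal" if has_image else "small_text_with_confidence_check"
```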
evaluate_complexity() computes a score from 0.0–1.0:
| Factor | Weight | Details |
|---|---|---|
| Message length | 50% | min(chars / 1000, 1.0) |
| Keywords | 50% | analyze deeply, write code, expert, reasoning, logic, complex (+0.2 each, capped at 1.0) |
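One plausible reading of this table is a weighted sum of the two factors. The sketch below assumes that combination and assumes case-insensitive substring matching for keywords; neither detail is confirmed by the table, so treat the function body as illustrative rather than as the gateway's actual code.

```python
KEYWORDS = ("analyze deeply", "write code", "expert", "reasoning", "logic", "complex")


def evaluate_complexity(message: str) -> float:
    """Score a message from 0.0 to 1.0 using length and keyword factors."""
    # Length factor: saturates at 1000 characters.
    length_score = min(len(message) / 1000, 1.0)
    # Keyword factor: +0.2 per distinct keyword found, capped at 1.0.
    text = message.lower()
    hits = sum(1 for keyword in KEYWORDS if keyword in text)
    keyword_score = min(0.2 * hits, 1.0)
    # Assumed combination: each factor contributes 50% of the final score.
    return 0.5 * length_score + 0.5 * keyword_score
```

Under this reading, a short prompt containing a single keyword scores roughly 0.1 and stays on the small model, while a 2000-character prompt without keywords lands exactly on 0.5.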
When the small model is selected for a non-streaming request:
- Request is forwarded with `logprobs: true` enabled
- Gateway receives the full response and extracts token-level log probabilities
- `confidence = exp(mean(token_logprobs))`
- If confidence ≥ `CONFIDENCE_THRESHOLD` (default 0.7): small model response is returned immediately with an `x-confidence` header
- If confidence < 0.7: small model response is discarded and the original request is rerouted to the large model
This catches cases where the small model is uncertain — ambiguous queries, domain-specific questions, or edge cases the complexity heuristic misjudged.
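The confidence formula is the geometric mean of the token probabilities, which a few lines make concrete. The helper names are hypothetical, the response layout assumes the OpenAI-style `choices[0].logprobs.content[*].logprob` structure, and treating an empty log-probability list as zero confidence is an assumption of this sketch.

```python
import math

CONFIDENCE_THRESHOLD = 0.7  # documented default


def token_logprobs_from_response(response: dict) -> list[float]:
    """Pull per-token log probabilities out of an OpenAI-style chat response."""
    entries = response["choices"][0]["logprobs"]["content"]
    return [entry["logprob"] for entry in entries]


def confidence_from_logprobs(token_logprobs: list[float]) -> float:
    """confidence = exp(mean(token_logprobs)); empty input counts as uncertain."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def keep_small_response(token_logprobs: list[float]) -> bool:
    """True when the small model's answer should be returned as-is."""
    return confidence_from_logprobs(token_logprobs) >= CONFIDENCE_THRESHOLD
```

Because the mean is taken in log space, a few very unlikely tokens pull the score down sharply, which is the behaviour the fallback relies on.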
| Env Var | Default | Purpose |
|---|---|---|
| `SMALL_MLLM_URL` | `http://netai-auxiliary:8080/v1/chat/completions` | Small multimodal model endpoint |
| `LARGE_MLLM_URL` | `http://netai-inference:8080/v1/chat/completions` | Large multimodal model endpoint |
| `LARGE_TEXT_URL` | `http://netai-inference:8080/v1/chat/completions` | Large text-only model endpoint |
| `ROUTER_THRESHOLD` | `0.5` | Complexity cutoff (0.0–1.0) |
| `CONFIDENCE_THRESHOLD` | `0.7` | Minimum confidence to keep small model response (0.0–1.0) |
| `LARGE_MODEL_MULTIMODAL` | `true` | Whether the large model supports images |
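As an illustration of how these variables and defaults fit together, the following sketch loads them into a typed config object. The `GatewayConfig` class, the `load_config` function, and the accepted truthy spellings for `LARGE_MODEL_MULTIMODAL` are assumptions, not the gateway's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewayConfig:
    small_mllm_url: str
    large_mllm_url: str
    large_text_url: str
    router_threshold: float
    confidence_threshold: float
    large_model_multimodal: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> GatewayConfig:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    large = "http://netai-inference:8080/v1/chat/completions"
    multimodal = env.get("LARGE_MODEL_MULTIMODAL", "true").strip().lower()
    return GatewayConfig(
        small_mllm_url=env.get("SMALL_MLLM_URL", "http://netai-auxiliary:8080/v1/chat/completions"),
        large_mllm_url=env.get("LARGE_MLLM_URL", large),
        large_text_url=env.get("LARGE_TEXT_URL", large),
        router_threshold=float(env.get("ROUTER_THRESHOLD", "0.5")),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.7")),
        # Assumed parsing: "1", "true", and "yes" count as enabled.
        large_model_multimodal=multimodal in {"1", "true", "yes"},
    )
```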
- LibreChat sends user prompt to Security-Guard.
- Security-Guard classifies prompt via Mezzo-Prompt-Guard (running on iGPU).
- SAFE: Prompt forwarded to Cascade LLM.
- Cascade LLM evaluates complexity → routes to small or large model.
- If small model: confidence check via logprobs → may reroute to large.
- Response returned through Security-Guard → LibreChat → User.
- UNSAFE: Blocked with 403 + audit log.
- Hermes Agent sends directly to Cascade LLM (avoids duplicating safety checks).
- Gateway routes by complexity/confidence as above.
- Hermes uses LFM2.5-VL-1.6B on auxiliary for context compression.
- Web search goes through MCP server → PII-Guard → SearXNG.
LibreChat → PII-Guard (:8777) → SearXNG (:8080)
↓
Presidio Analyzer (spaCy EN+DE, custom regex)
↓
Anonymized query + compliance headers
Hermes Agent uses an MCP (Model Context Protocol) server for web search functionality:
Hermes Agent → MCP Server (stdio) → PII-Guard (:8777) → SearXNG (:8080)
- MCP Server: `config/hermes-agent/mcp-server-search.py` (Python stdlib, JSON-RPC 2.0)
- Tool: `web_search(query, limit)`, which routes through PII-Guard for GDPR compliance
- Registered in `config/hermes-agent/config.yaml` under `mcp_servers.netai-search`
- Chosen because Hermes' built-in web search only works with supported LLM API providers (OpenAI, Anthropic, etc.)
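For orientation, an MCP tool invocation over stdio is a JSON-RPC 2.0 message with the method `tools/call`. The sketch below builds such a message for `web_search`; the helper name `build_tool_call` and the default `limit` of 5 are assumptions, and the script in `config/hermes-agent/` may structure its handling differently.

```python
import json


def build_tool_call(request_id: int, query: str, limit: int = 5) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request for the web_search tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "web_search",
            "arguments": {"query": query, "limit": limit},
        },
    }
    # The stdio transport sends one JSON message per line.
    return json.dumps(message)
```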
- Ubuntu 24.04 LTS (kernel 6.8+)
- Intel Arc Pro B50 (Battlemage) GPU
- Model files in `models/Qwen3.6/`:
  - `Qwen3.6-35B-A3B-UD-IQ2_M.gguf`
  - `mmproj-F16.gguf`
- Model file in `models/LFM2.5/`:
  - `LFM2.5-VL-1.6B-UD-IQ4_XS.gguf`
- Model file in `models/Mezzo-Prompt_guard-v2-Base/`:
  - `Mezzo-Prompt-Guard-v2-Base.IQ4_XS.gguf`
- Model file in `models/Parakeet/` (auto-downloaded on startup)
cp .env.example .env
# Edit .env and set:
# DOMAIN, ADMIN_EMAIL, ADMIN_PASSWORD, TELEGRAM_TOKEN (optional)
# SSL_CERT_PATH and SSL_KEY_PATH (Let's Encrypt paths)
# SEARXNG_SECRET (generate with: openssl rand -hex 32)
# LIBRECHAT_JWT_SECRET and JWT_REFRESH_SECRET (generate with: openssl rand -hex 32)
# BESZEL_KEY and BESZEL_TOKEN (generate with: openssl rand -hex 32)
./setup.sh

This will:
- Update system packages
- Install Intel GPU drivers (`intel-opencl-icd`, `intel-level-zero-gpu`, `level-zero`)
- Install Docker & Docker Compose if missing
- Add your user to `render` and `video` groups
- Auto-detect Intel GPU DRI devices and write them to `.env`
- Validate that model files are present
- Validate SSL certificates
Log out and back in after setup completes so group membership takes effect.
docker compose build auth-validator caddy security-guard pii-guard speech-stt
docker compose up -d
bash scripts/setup-beszel.sh

The Beszel setup script auto-creates the admin account using ADMIN_EMAIL / ADMIN_PASSWORD from .env.
Open your browser to https://<your-domain> and register the first user. The first registered user becomes the admin (requires ALLOW_REGISTRATION=true and ALLOW_UNVERIFIED_EMAIL_LOGIN=true).
| Endpoint | Description |
|---|---|
| `https://<your-domain>/` | LibreChat |
| `https://<your-domain>/agent/` | Hermes Agent Dashboard |
| `https://<your-domain>/agent-api/` | Hermes Agent API (OpenAI-compatible) |
| `https://<your-domain>/hermes-webui/` | Hermes Web UI (full web interface) |
| `https://<your-domain>/search/` | SearXNG (Web Search) |
| `https://<your-domain>/tts/` | SuperTonic TTS API (OpenAI-compatible) |
| `https://<your-domain>/speech-stt/` | Parakeet STT API (OpenAI-compatible) |
| `https://<your-domain>/beszel/` | Beszel Monitoring Dashboard |
| `https://<your-domain>/inference/` | llama.cpp Inference API |
| `https://<your-domain>/pii-guard/` | PII-Guard API |
| `https://<your-domain>/security-guard/` | Security-Guard API |
| `https://<your-domain>/knowledge/` | LightRAG Knowledge Graph API (query, upload documents) |
| `https://<your-domain>/lanshare/` | LAN Share files served via LibreChat |
| `\\<HOST>\netai-lanshare` | SMB/CIFS LAN share (Windows file explorer) |
Note: HTTP (port 80) automatically redirects to HTTPS (port 443).
The stack includes an SMB file share for exchanging documents across your local network. It is integrated with LibreChat for AI-assisted document processing.
Access via Windows File Explorer:
\\<HOST-IP-ADDRESS>\netai-lanshare
Access via Linux/macOS:
smbclient //<HOST-IP-ADDRESS>/netai-lanshare -U netai
Access via Browser:
https://<DOMAIN>/lanshare/
Credentials: User netai, password from LANSHARE_PASSWORD in .env.
LibreChat Integration: Files placed in the LAN share are automatically available in LibreChat at /lanshare/. You can reference them in chat prompts and LibreChat will include their content as context. PDFs, images, and text documents placed here can be used for RAG-style queries.
Backup Hint: This directory is a host bind mount at ./data/netai-lanshare/. Include it in your regular backup routine.
# Fast smoke test (~30-40s) — containers, network, auth, data stores, backup
pytest tests/ -m "not slow" -v
# Full suite including chat completion (~2-3 min)
pytest tests/ -v
# Compliance-specific tests
pytest tests/ -m "pii_guard" -v
pytest tests/ -m "security_guard" -v
# Cascade LLM routing tests
pytest tests/test_llm_gateway.py -v

| Path | Upstream | Notes |
|---|---|---|
| `/` | `librechat:3080` | LibreChat |
| `/search/` | `searxng:8080` | With HTML URL rewriting via replace |
| `/agent/` | `agent-hermes:9119` | Dashboard, HTML/JS/CSS rewriting, auth via LibreChat |
| `/agent-api/` | `agent-hermes:8642` | Gateway API, no auth |
| `/hermes-webui/` | `hermes-webui:8787` | HTML rewriting, auth via LibreChat |
| `/beszel/` | `netai-beszel:8090` | Beszel dashboard (WebSocket for real-time), auth via LibreChat |
| `/tts/` | `supertonic-tts:8800` | SuperTonic TTS API |
| `/inference/` | `netai-inference:8080` | llama.cpp API |
| `/pii-guard/` | `pii-guard:8777` | PII-Guard API |
| `/security-guard/` | `security-guard:8778` | Security-Guard API |
| `/speech-stt/` | `speech-stt:5092` | Parakeet STT API |
| `/knowledge/` | `lightrag:8020` | LightRAG Knowledge Graph API (query, document upload) |
All internal services are reachable via Docker network (netai-stack-se). The Caddy reverse proxy makes selected endpoints available externally on port 443 (HTTPS). Unless noted, endpoints require no authentication when accessed from the internal Docker network; external routes via Caddy may require a valid LibreChat JWT session.
| Service | Internal URL | External Path | Protocol | Auth | Purpose |
|---|---|---|---|---|---|
| LibreChat | `http://librechat:3080` | `/` | OpenAI-compatible REST | JWT Bearer | Chat frontend, conversation management, preset config |
| Cascade LLM | `http://netai-cascade-llm:3000` | (internal only) | OpenAI-compatible REST | None | Complexity + confidence routing between small and large LLM |
| Security-Guard | `http://security-guard:8778` | `/security-guard/` | OpenAI-compatible REST | None (external via Caddy) | Prompt injection detection, transparent proxy to Cascade LLM |
| PII-Guard | `http://pii-guard:8777` | `/pii-guard/` | REST | None | PII redaction, SearXNG proxy with GDPR compliance |
| Inference (raw) | `http://inference-server:8080` | `/inference/` | OpenAI-compatible REST | None | Raw llama.cpp (Qwen3.6-35B): no guard, no routing |
| Auxiliary (raw) | `http://auxiliary-server:8080` | (internal only) | OpenAI-compatible REST | None | Raw llama.cpp (LFM2.5-VL-1.6B): small model backend |
| Hermes Agent | `http://agent-hermes:8642` | `/agent-api/` | OpenAI-compatible REST | LibreChat JWT | Hermes Agent gateway API (tool calling, code execution) |
| SearXNG | `http://searxng:8080` | `/search/` | REST + Web UI | None | Privacy-respecting meta search engine |
| TTS | `http://supertonic-tts:8800` | `/tts/` | OpenAI-compatible REST | None | Text-to-Speech (SuperTonic) |
| STT | `http://speech-stt:5092` | `/speech-stt/` | OpenAI-compatible REST | None | Speech-to-Text (Parakeet TDT) |
| Beszel | `http://beszel:8090` | `/beszel/` | REST + WebSocket | LibreChat JWT | System monitoring dashboard |
| Auth-Validator | `http://auth-validator:8081` | (internal only) | REST | JWT Bearer | Caddy forward_auth backend: validates LibreChat sessions |
| LightRAG | `http://lightrag:8020` | `/knowledge/` | REST (OpenAI-compatible) | API Key (optional) | Graph-enhanced RAG: entity extraction, hybrid search, document ingestion |
| MongoDB | `mongodb://mongo:27017` | (internal only) | MongoDB Wire | None | LibreChat database (internal Docker network only) |
| LAN Share | `smb://<HOST>:445/netai-lanshare` | `\\<HOST>\netai-lanshare` | SMB/CIFS | SMB user `netai` | File share, mounted in LibreChat at `/lanshare/` |
| Caddy | `caddy:443` | `https://<DOMAIN>/` | HTTP/HTTPS | LibreChat JWT (selected routes) | TLS termination, path-based reverse proxy |
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/api/auth/login` | None | Login with email + password, returns JWT |
| `POST` | `/api/auth/refresh` | JWT | Refresh expired JWT token |
| `POST` | `/api/auth/logout` | JWT | Invalidate current session |
| `GET` | `/api/auth/user` | JWT | Get current user profile |
| `GET` | `/api/config` | JWT | Get client configuration (includes modelSpecs) |
| `GET` | `/api/models` | JWT | List available models |
| `POST` | `/api/chat/completions` | JWT | Chat completions (OpenAI-compatible format) |
| `POST` | `/api/chat/stream` | JWT | Streaming chat completions (SSE) |
| `GET` | `/api/convos` | JWT | List conversations |
| `GET` | `/api/convos/:id` | JWT | Get single conversation |
| `DELETE` | `/api/convos/:id` | JWT | Delete conversation |
| `POST` | `/api/convos/clear` | JWT | Clear all conversations |
| `GET` | `/api/presets` | JWT | List model presets |
| `POST` | `/api/presets` | JWT | Create preset |
| `PUT` | `/api/presets/:id` | JWT | Update preset |
| `DELETE` | `/api/presets/:id` | JWT | Delete preset |
| `POST` | `/api/endpoints` | JWT | Register custom endpoints |
| `POST` | `/api/agents` | JWT | Create agent (if agents enabled) |
| `GET` | `/api/agents` | JWT | List agents |
| `GET` | `/api/agents/:id` | JWT | Get agent details |
| `POST` | `/api/agents/:id/completions` | JWT | Agent-based chat completion |
| `POST` | `/api/files/upload` | JWT | Upload file for RAG context |
| `GET` | `/api/files/:id` | JWT | Get uploaded file |
| `POST` | `/api/tags` | JWT | Create tag |
| `GET` | `/api/tags` | JWT | List tags |
Auth: Include Authorization: Bearer <JWT_TOKEN> header. Token obtained from POST /api/auth/login.
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/v1/chat/completions` | None | Transparent proxy: classifies prompt via Mezzo, forwards to Cascade LLM if safe (supports streaming SSE) |
| `GET` | `/v1/models` | None | Passthrough model discovery |
| `POST` | `/classify` | None | Standalone prompt classification, returns `{"safe": true/false, "category": "...", "score": N}` |
| `POST` | `/filter` | None | Block unsafe prompts (403), forward safe ones to inference |
| `POST` | `/inference` | None | Legacy guarded completion endpoint |
| `GET` | `/health` | None | Health check |
Response headers: X-MeZzo-Safe, X-MeZzo-Category, X-MeZzo-Score
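A client talking to the transparent proxy needs to distinguish a blocked prompt (HTTP 403) from other failures. The sketch below shows one way to do that with the standard library; the `guarded_chat` helper, the placeholder model name `"cascade"`, and the injectable `opener` parameter are assumptions made so the example stays self-contained.

```python
import json
import urllib.error
import urllib.request

GUARD_URL = "http://security-guard:8778/v1/chat/completions"


def guarded_chat(messages: list, opener=urllib.request.urlopen) -> dict:
    """POST a chat request through Security-Guard and report whether it was blocked."""
    # "cascade" is a placeholder model name, not one confirmed by the stack.
    body = json.dumps({"model": "cascade", "messages": messages}).encode("utf-8")
    request = urllib.request.Request(
        GUARD_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request) as response:
            return {"blocked": False, "body": json.loads(response.read())}
    except urllib.error.HTTPError as err:
        if err.code == 403:  # documented status for unsafe prompts
            return {"blocked": True, "body": None}
        raise
```

Passing a custom `opener` makes it possible to exercise the 403 branch in tests without a running stack.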
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/redact` | None | Standalone PII redaction: send JSON `{"text": "..."}`, receive redacted text |
| `POST` | `/search` | None | Proxy: redacts query, forwards to SearXNG, returns JSON results |
| `GET` | `/search/redacted` | None | Debug/compliance: shows what would be redacted without forwarding |
| `GET` | `/health` | None | Health check |
Response headers: X-PIGuard-Redacted (count), X-PIGuard-Compliance (GDPR article reference)
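A minimal sketch of a `/redact` call from inside the Docker network, using only the standard library. The request body follows the documented `{"text": "..."}` shape; the helper names are hypothetical, the response body layout is deliberately not parsed because it is not specified here, and reading `X-PIGuard-Redacted` as an integer is an assumption based on its description as a count.

```python
import json
import urllib.request
from typing import Mapping

PII_GUARD_URL = "http://pii-guard:8777"


def redact_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /redact request."""
    return urllib.request.Request(
        f"{PII_GUARD_URL}/redact",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def redaction_count(headers: Mapping[str, str]) -> int:
    """Read the redaction count header, defaulting to 0 when it is absent."""
    return int(headers.get("X-PIGuard-Redacted", "0"))
```

Sending the request is then a matter of `urllib.request.urlopen(redact_request("..."))` and passing the response headers to `redaction_count`.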
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/v1/chat/completions` | None | OpenAI-compatible: evaluates complexity, routes to small/large model. Supports `logprobs: true` for confidence-based fallback. Non-streaming only: reroutes to large model when confidence < threshold. |
| `GET` | `/health` | None | Health check |
Request body: Standard OpenAI chat completions format. Add "image_url" content parts for multimodal routing.
Response headers (when small model kept): x-confidence (float 0.0–1.0)
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/v1/chat/completions` | None | Raw llama.cpp completions (OpenAI-compatible). No prompt injection check, no confidence routing. |
| `GET` | `/v1/models` | None | List loaded models |
| `GET` | `/health` | None | Health check |
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/v1/audio/speech` | None | Text-to-Speech. Accepts `{"model": "tts-1", "input": "...", "voice": "nova", "response_format": "wav"}` |
| `GET` | `/health` | None | Health check |
Voices: alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, marin, cedar, sage, verse
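The request body above maps directly onto a small builder. This sketch validates the voice against the list given here before constructing the request; the helper name `speech_request` and the choice to reject unknown voices on the client side are assumptions.

```python
import json
import urllib.request

TTS_URL = "http://supertonic-tts:8800/v1/audio/speech"
VOICES = {
    "alloy", "echo", "fable", "onyx", "nova", "shimmer", "ash",
    "ballad", "coral", "marin", "cedar", "sage", "verse",
}


def speech_request(text: str, voice: str = "nova") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for WAV output."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    payload = {"model": "tts-1", "input": text, "voice": voice, "response_format": "wav"}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```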
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/v1/audio/transcriptions` | None | Speech-to-Text. Accepts multipart form with audio file. Returns `{"text": "..."}` |
| `GET` | `/health` | None | Health check |
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/v1/chat/completions` | LibreChat JWT | Chat with Hermes Agent: supports tool calling, MCP tools, code execution |
| `GET` | `/health` | None | Health check |
External path: https://<DOMAIN>/agent-api/v1/chat/completions
Note: External access requires a valid LibreChat session (auth-validator validates the JWT before proxying).
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/` | None | Web UI (HTML) |
| `POST` | `/search` | None | JSON search API, accepts `{"q": "...", "format": "json", "language": "de-DE"}` |
| `GET` | `/health` | None | Health check |
External path: https://<DOMAIN>/search/ (via PII-Guard for GDPR compliance from LibreChat)
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/` | JWT Bearer | Validates LibreChat JWT. Returns 200 if valid, 401 if invalid/expired. Used by Caddy's `forward_auth` directive. |
| Protocol | Path | Auth | Description |
|---|---|---|---|
| SMB/CIFS | `\\<HOST>\netai-lanshare` | SMB user `netai` | Read/write file share. Also mounted in LibreChat at `/lanshare/` for browser access. |
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/documents/upload` | API Key | Upload document: LightRAG extracts entities + relationships, builds knowledge graph |
| `POST` | `/query` | API Key | Query the knowledge graph. Modes: `naive` (vector-only), `local` (entity-level), `global` (community-level), `hybrid` (all) |
| `GET` | `/graph` | API Key | Retrieve the knowledge graph structure (entities + relationships) |
| `GET` | `/health` | None | Health check |
| `POST` | `/v1/chat/completions` | API Key | OpenAI-compatible endpoint: accepts standard chat format, returns RAG-augmented responses |
Auth: Optional API key via Authorization: Bearer <key> header. Default: sk-no-key-required.
Document ingestion: Use the provided script to bulk-import documents:
# Ingest all files from the LAN share
./scripts/ingest-docs.sh
# Ingest a single document
./scripts/ingest-docs.sh /path/to/document.pdf
# Watch the LAN share for new files (requires inotify-tools)
sudo apt install inotify-tools
./scripts/ingest-docs.sh --watch

- `POST /api/auth/login` with `{"email": "...", "password": "..."}` → returns `token` (JWT) and `refreshToken`
- Include `Authorization: Bearer <token>` in subsequent requests
- When the token expires, `POST /api/auth/refresh` with `{"token": "<refreshToken>"}` → returns new token pair
- For external routes requiring auth (marked "LibreChat JWT"), Caddy's `forward_auth` middleware automatically validates the JWT against `auth-validator:8081`
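The login, authorized-call, and refresh steps can be sketched as three request builders. The base URL `https://chat.example.internal` and the function names are hypothetical; the paths, JSON bodies, and the `Authorization: Bearer` header follow the steps listed above. The requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://chat.example.internal"  # hypothetical; use your own domain


def _json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request against the LibreChat API."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login_request(email: str, password: str) -> urllib.request.Request:
    return _json_post("/api/auth/login", {"email": email, "password": password})


def refresh_request(refresh_token: str) -> urllib.request.Request:
    return _json_post("/api/auth/refresh", {"token": refresh_token})


def authorized_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the JWT obtained from the login response."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )
```

A caller would send `login_request(...)` with `urllib.request.urlopen`, read `token` and `refreshToken` from the JSON response, and reuse the token in `authorized_request` until it expires.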
- This stack uses the SYCL backend via the official `ghcr.io/ggml-org/llama.cpp:server-intel-b9641` image.
- Ensure your kernel is 6.8 or newer for native Xe/i915 support on Battlemage.
- The environment variable `ONEAPI_DEVICE_SELECTOR=*:gpu` is passed to the inference container.
- Important: llama.cpp SYCL backend only supports discrete Intel Arc GPUs (Xe-HPG+, like Arc Pro B50). Integrated Xe-LP GPUs (UHD 770) are not enumerated by the SYCL backend and cannot be used for inference. The `inference-server` container is strictly bound to the discrete Arc GPU.
- The `auxiliary-server` container also uses SYCL and shares the same dGPU for the small multimodal model (LFM2.5-VL-1.6B).
- Security-Guard uses the iGPU (`/dev/dri/card0`) separately for Mezzo-Prompt-Guard inference.
- There are no NVIDIA/CUDA dependencies in this stack.
Caddy handles TLS termination. For production, configure Let's Encrypt certificates. For development, self-signed certificates can be used:
# Place certificates at paths specified in .env
# Caddy will use them for TLS

A pytest-based integration test suite validates containers, network paths, APIs, authentication, chat completions, data stores, and backups.
Prerequisites: pytest and requests must be installed (pip install pytest requests).
# Fast smoke test (containers, network, auth, data stores, backup — ~30-40s)
pytest tests/ -m "not slow" -v
# Full suite including chat completion (~2-3 min)
pytest tests/ -v
# Parallel execution (requires pytest-xdist)
pytest tests/ -m "not slow" -v -n auto
# Compliance-specific
pytest tests/ -m "pii_guard" -v
pytest tests/ -m "security_guard" -v
# Cascade LLM
pytest tests/test_llm_gateway.py -v

| Test File | Coverage |
|---|---|
| `test_containers.py` | Docker container status and health checks |
| `test_network_paths.py` | Caddy routing and inter-service DNS |
| `test_api_endpoints.py` | LLM inference, LibreChat, Hermes API, SearXNG |
| `test_authentication.py` | LibreChat login and token validation |
| `test_chat_completion.py` | End-to-end chat via LibreChat API |
| `test_data_stores.py` | SQLite integrity, ChromaDB, uploads |
| `test_backup.py` | Backup script execution and archive validation |
| `test_pii_guard.py` | GDPR compliance, PII redaction, search proxy |
| `test_security_guard.py` | Prompt injection detection, filtering, Article 52 |
| `test_beszel.py` | Beszel monitoring metrics |
| `test_llm_gateway.py` | Cascade LLM routing, complexity scoring, confidence fallback, streaming |
| `test_hermes_webui.py` | Hermes compose config, Caddy proxy, .env.example |
| `test_hermes_playwright.py` | Hermes API server and LibreChat browser tests |
LibreChat stores all data in MongoDB (persistent volume mongo-data).
Back up the database:
docker compose exec mongo mongodump --archive=/backups/librechat-$(date +%Y%m%d).archive
Restore from an archive on the host (the archive is read from stdin):
docker compose exec -T mongo mongorestore --archive < backup-file.archive
If the GPU is not detected, check that it is visible on the PCI bus:
lspci | grep -i vga | grep -i intel
If empty, verify the GPU is seated and the kernel module is loaded:
sudo dmesg | grep i915
sudo intel_gpu_top
Ensure your user is in the video and render groups, then log out and back in:
sudo usermod -aG video,render $USER
setup.sh will warn you if the expected GGUF files are absent. Download them via Hugging Face CLI or wget and place them in their respective directories under models/.
Verify the Mezzo-Prompt-Guard model file is present at the path specified by SECURITY_GUARD_MODEL_PATH in .env or the default path in docker-compose.yml.
This deployment configuration is provided as-is for B2B on-premise deployments. Model weights and upstream container images are subject to their respective licenses.