Skip to content

netai369/NetAI-Stack-SE

Repository files navigation

NetAI Stack SE

On-Premise AI Infrastructure for Intel Arc GPUs

Data Sovereignty — Your Data Stays Yours

NetAI Stack SE is built for organizations that cannot afford cloud data leakage. Law firms, SMEs, and compliance-driven businesses run this stack entirely on-premise:

  • No cloud inference: All LLM queries execute locally on your Intel Arc Pro B50.
  • No telemetry: No data leaves your network unless you explicitly configure external integrations.
  • GDPR-ready: Patient/client data, legal documents, and internal knowledge bases remain under your physical control. Enhanced with Microsoft Presidio for automatic PII redaction (German + English).
  • EU AI Act compliant: Full transparency documentation and prompt injection protection via Mezzo-Prompt-Guard-v2-Base on isolated iGPU.
  • Dual-GPU Partitioning: Compliance services (PII-Guard, Security-Guard) offloaded to Alder Lake iGPU to preserve Battlemage dGPU VRAM for the main LLM.

Compliance Features

GDPR Compliance (DSGVO)

The stack includes automated PII detection and redaction:

  • PII-Guard Service: Intercepts all web search queries before they reach SearXNG.
  • Microsoft Presidio: Detects names, locations, IBANs, phone numbers, emails, and more. Optimized for German and English.
  • Endpoints: /redact (standalone redaction), /search (redact + forward to SearXNG), /search/redacted (debug/compliance view).
  • Automatic Redaction: Replaces PII with generic placeholders (e.g., [REDACTED_NAME], [REDACTED_LOCATION]).
  • Data Minimization: GDPR Article 5 compliance headers on proxied responses (X-PIGuard-Redacted, X-PIGuard-Compliance).

EU AI Act Compliance (Article 52)

The stack includes AI safety measures for professional use:

  • Security-Guard Service: Filters all inference requests for prompt injection before they reach the Cascade LLM.
  • Mezzo-Prompt-Guard-v2-Base (IQ4_XS): Highly specialized safety model (~450MB VRAM) on the integrated iGPU.
  • Fallback: Heuristic regex-based classifier when llama-server is unavailable.
  • Prompt Injection Detection: Blocks jailbreak attempts, system prompt extraction, malicious code.
  • Human-in-the-Loop: All blocked requests are logged for review with transparency metadata.

Architecture

User → Caddy (:443) → LibreChat (:3080/chat) → Security-Guard (:8778) → Cascade LLM (:3000) ─┬─ small (low complexity) → Auxiliary LFM2.5-VL-1.6B (:8082)
                                                                                         ├─ large (high complexity) → Inference Qwen3.6-35B (:8080)
                                                                                         └─ confidence < 0.7        → reroute to Inference (fallback)
                   → SearXNG (:8080) via PII-Guard (:8777)
                   → Hermes Agent (:9119 dashboard, :8642 API) ───→ Cascade LLM (:3000) [direct, bypasses Security-Guard]
                   → Hermes WebUI (:8787)
                   → Beszel (:8090 monitoring)
                   → SuperTonic TTS (:8800)
                   → Parakeet STT (:5092)
                   → Auth-Validator (:8081) → LibreChat (:3080) [forward_auth backend]

Core Services

Service Container Port(s) Purpose
Inference netai-inference 8080 llama.cpp with Intel SYCL, hosts Qwen3.6-35B (large model)
Auxiliary netai-auxiliary 8082 llama.cpp with Intel SYCL, hosts LFM2.5-VL-1.6B (small multimodal model)
Cascade LLM netai-cascade-llm 3000 Routes requests by complexity + confidence between small/large models
Frontend netai-librechat 3080 LibreChat (chat interface, RAG, multi-model, MCP)
Agent netai-hermes 8642, 9119 Hermes Agent (Telegram bot + dashboard)
Hermes WebUI netai-hermes-webui 8787 Full web interface for Hermes
Reverse Proxy netai-caddy 80, 443 TLS termination, path-based routing, URL rewriting (replace-response module)
Web Search netai-searxng 8080 Privacy-respecting meta search engine
PII-Guard netai-pii-guard 8777 GDPR PII redaction proxy between LibreChat and SearXNG
Security-Guard netai-security-guard 8778 Mezzo Prompt Guard (+ internal llama-server :8779) on iGPU
TTS netai-tts 8800 SuperTonic TTS (OpenAI-compatible)
STT netai-stt 5092 Parakeet TDT (OpenAI-compatible)
Knowledge Graph netai-lightrag 8020 LightRAG — graph-enhanced RAG with entity extraction, hybrid search (local/global/naive)
Monitoring netai-beszel 8090 Lightweight system monitoring (CPU, RAM, disk, Docker, GPU)
Auth-Validator netai-auth-validator 8081 Validates LibreChat sessions for Caddy forward_auth

Cascade LLM — Model Router

The Cascade LLM (netai-cascade-llm, port 3000) is the central routing layer between all clients (LibreChat via Security-Guard, Hermes Agent directly) and the two inference backends.

Routing Architecture

Request → Cascade LLM
            │
            ├─ has_image?
            │   ├─ Yes + complexity > 0.5  → Large Multimodal (Qwen3.6-35B)
            │   └─ Yes + complexity ≤ 0.5  → Small Multimodal (LFM2.5-VL-1.6B)
            │
            └─ text-only?
                ├─ complexity > 0.5  → Large Text (Qwen3.6-35B)
                └─ complexity ≤ 0.5  → Small Model + confidence check
                                        │
                                        ├─ confidence ≥ 0.7 → keep small response
                                        └─ confidence < 0.7 → reroute to Large Text

Step 1: Complexity Scoring

evaluate_complexity() computes a score from 0.0–1.0:

Factor Weight Details
Message length 50% min(chars / 1000, 1.0)
Keywords 50% analyze deeply, write code, expert, reasoning, logic, complex (+0.2 each, capped at 1.0)

Step 2: Confidence-Based Fallback (non-streaming only)

When the small model is selected for a non-streaming request:

  1. Request is forwarded with logprobs: true enabled
  2. Gateway receives the full response and extracts token-level log probabilities
  3. confidence = exp(mean(token_logprobs))
  4. If confidence ≥ CONFIDENCE_THRESHOLD (default 0.7): small model response is returned immediately with an x-confidence header
  5. If confidence < 0.7: small model response is discarded and the original request is rerouted to the large model

This catches cases where the small model is uncertain — ambiguous queries, domain-specific questions, or edge cases the complexity heuristic misjudged.

Configurable Environment Variables

Env Var Default Purpose
SMALL_MLLM_URL http://netai-auxiliary:8080/v1/chat/completions Small multimodal model endpoint
LARGE_MLLM_URL http://netai-inference:8080/v1/chat/completions Large multimodal model endpoint
LARGE_TEXT_URL http://netai-inference:8080/v1/chat/completions Large text-only model endpoint
ROUTER_THRESHOLD 0.5 Complexity cutoff (0.0–1.0)
CONFIDENCE_THRESHOLD 0.7 Minimum confidence to keep small model response (0.0–1.0)
LARGE_MODEL_MULTIMODAL true Whether the large model supports images

Data Flows

LLM Inference (AI Act Protected)

  1. LibreChat sends user prompt to Security-Guard.
  2. Security-Guard classifies prompt via Mezzo-Prompt-Guard (running on iGPU).
  3. SAFE: Prompt forwarded to Cascade LLM.
  4. Cascade LLM evaluates complexity → routes to small or large model.
  5. If small model: confidence check via logprobs → may reroute to large.
  6. Response returned through Security-Guard → LibreChat → User.
  7. UNSAFE: Blocked with 403 + audit log.

Hermes Agent (Bypasses Security-Guard)

  1. Hermes Agent sends directly to Cascade LLM (avoids duplicating safety checks).
  2. Gateway routes by complexity/confidence as above.
  3. Hermes uses LFM2.5-VL-1.6B on auxiliary for context compression.
  4. Web search goes through MCP server → PII-Guard → SearXNG.

Web Search with PII Redaction

LibreChat → PII-Guard (:8777) → SearXNG (:8080)
                ↓
         Presidio Analyzer (spaCy EN+DE, custom regex)
                ↓
         Anonymized query + compliance headers

Hermes Agent Web Search via MCP Server

Hermes Agent uses an MCP (Model Context Protocol) server for web search functionality:

Hermes Agent → MCP Server (stdio) → PII-Guard (:8777) → SearXNG (:8080)
  • MCP Server: config/hermes-agent/mcp-server-search.py (Python stdlib, JSON-RPC 2.0)
  • Tool: web_search(query, limit) — routes through PII-Guard for GDPR compliance
  • Registered in config/hermes-agent/config.yaml under mcp_servers.netai-search
  • Chosen because Hermes' built-in web search only works with supported LLM API providers (OpenAI, Anthropic, etc.)

Quick Start

1. Prerequisites

  • Ubuntu 24.04 LTS (kernel 6.8+)
  • Intel Arc Pro B50 (Battlemage) GPU
  • Model files in models/Qwen3.6/:
    • Qwen3.6-35B-A3B-UD-IQ2_M.gguf
    • mmproj-F16.gguf
  • Model file in models/LFM2.5/:
    • LFM2.5-VL-1.6B-UD-IQ4_XS.gguf
  • Model file in models/Mezzo-Prompt_guard-v2-Base/:
    • Mezzo-Prompt-Guard-v2-Base.IQ4_XS.gguf
  • Model file in models/Parakeet/ (auto-downloaded on startup)

2. Configure Environment

cp .env.example .env
# Edit .env and set:
#   DOMAIN, ADMIN_EMAIL, ADMIN_PASSWORD, TELEGRAM_TOKEN (optional)
#   SSL_CERT_PATH and SSL_KEY_PATH (Let's Encrypt paths)
#   SEARXNG_SECRET (generate with: openssl rand -hex 32)
#   LIBRECHAT_JWT_SECRET and JWT_REFRESH_SECRET (generate with: openssl rand -hex 32)
#   BESZEL_KEY and BESZEL_TOKEN (generate with: openssl rand -hex 32)

3. Run Setup

./setup.sh

This will:

  • Update system packages
  • Install Intel GPU drivers (intel-opencl-icd, intel-level-zero-gpu, level-zero)
  • Install Docker & Docker Compose if missing
  • Add your user to render and video groups
  • Auto-detect Intel GPU DRI devices and write them to .env
  • Validate that model files are present
  • Validate SSL certificates

Log out and back in after setup completes so group membership takes effect.

4. Build Custom Images

docker compose build auth-validator caddy security-guard pii-guard speech-stt

5. Start Services

docker compose up -d

6. Set Up Beszel Monitoring

bash scripts/setup-beszel.sh

This auto-creates the admin account using ADMIN_EMAIL / ADMIN_PASSWORD from .env.

7. Register Admin User

Open your browser to https://<your-domain> and register the first user. The first registered user becomes the admin (requires ALLOW_REGISTRATION=true and ALLOW_UNVERIFIED_EMAIL_LOGIN=true).

8. Access Services

Endpoint Description
https://<your-domain>/ LibreChat
https://<your-domain>/agent/ Hermes Agent Dashboard
https://<your-domain>/agent-api/ Hermes Agent API (OpenAI-compatible)
https://<your-domain>/hermes-webui/ Hermes Web UI (full web interface)
https://<your-domain>/search/ SearXNG (Web Search)
https://<your-domain>/tts/ SuperTonic TTS API (OpenAI-compatible)
https://<your-domain>/speech-stt/ Parakeet STT API (OpenAI-compatible)
https://<your-domain>/beszel/ Beszel Monitoring Dashboard
https://<your-domain>/inference/ llama.cpp Inference API
https://<your-domain>/pii-guard/ PII-Guard API
https://<your-domain>/security-guard/ Security-Guard API
https://<your-domain>/knowledge/ LightRAG Knowledge Graph API (query, upload documents)
https://<your-domain>/lanshare/ LAN Share files served via LibreChat
\\<HOST>\netai-lanshare SMB/CIFS LAN share (Windows file explorer)

Note: HTTP (port 80) automatically redirects to HTTPS (port 443).

LAN Share — SMB/CIFS File Sharing

The stack includes an SMB file share for exchanging documents across your local network. It is integrated with LibreChat for AI-assisted document processing.

Access via Windows File Explorer:

\\<HOST-IP-ADDRESS>\netai-lanshare

Access via Linux/macOS:

smbclient //<HOST-IP-ADDRESS>/netai-lanshare -U netai

Access via Browser:

https://<DOMAIN>/lanshare/

Credentials: User netai, password from LANSHARE_PASSWORD in .env.

LibreChat Integration: Files placed in the LAN share are automatically available in LibreChat at /lanshare/. You can reference them in chat prompts and LibreChat will include their content as context. PDFs, images, and text documents placed here can be used for RAG-style queries.

Backup Hint: This directory is a host bind mount at ./data/netai-lanshare/. Include it in your regular backup routine.

9. Run the Test Suite (Optional)

# Fast smoke test (~30-40s) — containers, network, auth, data stores, backup
pytest tests/ -m "not slow" -v

# Full suite including chat completion (~2-3 min)
pytest tests/ -v

# Compliance-specific tests
pytest tests/ -m "pii_guard" -v
pytest tests/ -m "security_guard" -v

# Cascade LLM routing tests
pytest tests/test_llm_gateway.py -v

Caddy Reverse Proxy Routes

Path Upstream Notes
/ librechat:3080 LibreChat
/search/ searxng:8080 With HTML URL rewriting via replace
/agent/ agent-hermes:9119 Dashboard, HTML/JS/CSS rewriting, auth via LibreChat
/agent-api/ agent-hermes:8642 Gateway API, no auth
/hermes-webui/ hermes-webui:8787 HTML rewriting, auth via LibreChat
/beszel/ netai-beszel:8090 Beszel dashboard (WebSocket for real-time), auth via LibreChat
/tts/ supertonic-tts:8800 SuperTonic TTS API
/inference/ netai-inference:8080 llama.cpp API
/pii-guard/ pii-guard:8777 PII-Guard API
/security-guard/ security-guard:8778 Security-Guard API
/speech-stt/ speech-stt:5092 Parakeet STT API
/knowledge/ lightrag:8020 LightRAG Knowledge Graph API (query, document upload)

API Endpoints Reference

All internal services are reachable via Docker network (netai-stack-se). The Caddy reverse proxy makes selected endpoints available externally on port 443 (HTTPS). Unless noted, endpoints require no authentication when accessed from the internal Docker network; external routes via Caddy may require a valid LibreChat JWT session.

Service Internal URL External Path Protocol Auth Purpose
LibreChat http://librechat:3080 / OpenAI-compatible REST JWT Bearer Chat frontend, conversation management, preset config
Cascade LLM http://netai-cascade-llm:3000 — (internal only) OpenAI-compatible REST None Complexity + confidence routing between small and large LLM
Security-Guard http://security-guard:8778 /security-guard/ OpenAI-compatible REST None (external via Caddy) Prompt injection detection, transparent proxy to Cascade LLM
PII-Guard http://pii-guard:8777 /pii-guard/ REST None PII redaction, SearXNG proxy with GDPR compliance
Inference (raw) http://inference-server:8080 /inference/ OpenAI-compatible REST None Raw llama.cpp (Qwen3.6-35B) — no guard, no routing
Auxiliary (raw) http://auxiliary-server:8080 — (internal only) OpenAI-compatible REST None Raw llama.cpp (LFM2.5-VL-1.6B) — small model backend
Hermes Agent http://agent-hermes:8642 /agent-api/ OpenAI-compatible REST LibreChat JWT Hermes Agent gateway API (tool calling, code execution)
SearXNG http://searxng:8080 /search/ REST + Web UI None Privacy-respecting meta search engine
TTS http://supertonic-tts:8800 /tts/ OpenAI-compatible REST None Text-to-Speech (SuperTonic)
STT http://speech-stt:5092 /speech-stt/ OpenAI-compatible REST None Speech-to-Text (Parakeet TDT)
Beszel http://beszel:8090 /beszel/ REST + WebSocket LibreChat JWT System monitoring dashboard
Auth-Validator http://auth-validator:8081 — (internal only) REST JWT Bearer Caddy forward_auth backend — validates LibreChat sessions
LightRAG http://lightrag:8020 /knowledge/ REST (OpenAI-compatible) API Key (optional) Graph-enhanced RAG — entity extraction, hybrid search, document ingestion
MongoDB mongodb://mongo:27017 — (internal only) MongoDB Wire None LibreChat database (internal Docker network only)
LAN Share smb://<HOST>:445/netai-lanshare \\<HOST>\netai-lanshare SMB/CIFS SMB user netai File share, mounted in LibreChat at /lanshare/
Caddy caddy:443 https://<DOMAIN>/ HTTP/HTTPS LibreChat JWT (selected routes) TLS termination, path-based reverse proxy

Endpoint Details

LibreChat API (OpenAI-Compatible Chat + Proprietary)

Method Path Auth Description
POST /api/auth/login None Login with email + password, returns JWT
POST /api/auth/refresh JWT Refresh expired JWT token
POST /api/auth/logout JWT Invalidate current session
GET /api/auth/user JWT Get current user profile
GET /api/config JWT Get client configuration (includes modelSpecs)
GET /api/models JWT List available models
POST /api/chat/completions JWT Chat completions (OpenAI-compatible format)
POST /api/chat/stream JWT Streaming chat completions (SSE)
GET /api/convos JWT List conversations
GET /api/convos/:id JWT Get single conversation
DELETE /api/convos/:id JWT Delete conversation
POST /api/convos/clear JWT Clear all conversations
GET /api/presets JWT List model presets
POST /api/presets JWT Create preset
PUT /api/presets/:id JWT Update preset
DELETE /api/presets/:id JWT Delete preset
POST /api/endpoints JWT Register custom endpoints
POST /api/agents JWT Create agent (if agents enabled)
GET /api/agents JWT List agents
GET /api/agents/:id JWT Get agent details
POST /api/agents/:id/completions JWT Agent-based chat completion
POST /api/files/upload JWT Upload file for RAG context
GET /api/files/:id JWT Get uploaded file
POST /api/tags JWT Create tag
GET /api/tags JWT List tags

Auth: Include Authorization: Bearer <JWT_TOKEN> header. Token obtained from POST /api/auth/login.

Security-Guard (Prompt Injection Protection)

Method Path Auth Description
POST /v1/chat/completions None Transparent proxy — classifies prompt via Mezzo, forwards to Cascade LLM if safe (supports streaming SSE)
GET /v1/models None Passthrough model discovery
POST /classify None Standalone prompt classification — returns {"safe": true/false, "category": "...", "score": N}
POST /filter None Block unsafe prompts (403), forward safe ones to inference
POST /inference None Legacy guarded completion endpoint
GET /health None Health check

Response headers: X-MeZzo-Safe, X-MeZzo-Category, X-MeZzo-Score

PII-Guard (GDPR Redaction)

Method Path Auth Description
POST /redact None Standalone PII redaction — send JSON {"text": "..."}, receive redacted text
POST /search None Proxy — redacts query, forwards to SearXNG, returns JSON results
GET /search/redacted None Debug/compliance — shows what would be redacted without forwarding
GET /health None Health check

Response headers: X-PIGuard-Redacted (count), X-PIGuard-Compliance (GDPR article reference)

Cascade LLM (Internal Router — No External Route)

Method Path Auth Description
POST /v1/chat/completions None OpenAI-compatible — evaluates complexity, routes to small/large model. Supports logprobs: true for confidence-based fallback. Non-streaming only: reroutes to large model when confidence < threshold.
GET /health None Health check

Request body: Standard OpenAI chat completions format. Add "image_url" content parts for multimodal routing. Response headers (when small model kept): x-confidence (float 0.0–1.0)

Inference & Auxiliary (Raw llama.cpp — No Guard)

Method Path Auth Description
POST /v1/chat/completions None Raw llama.cpp completions (OpenAI-compatible). No prompt injection check, no confidence routing.
GET /v1/models None List loaded models
GET /health None Health check

TTS (SuperTonic — OpenAI-Compatible)

Method Path Auth Description
POST /v1/audio/speech None Text-to-Speech. Accepts {"model": "tts-1", "input": "...", "voice": "nova", "response_format": "wav"}
GET /health None Health check

Voices: alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, marin, cedar, sage, verse

STT (Parakeet TDT — OpenAI-Compatible)

Method Path Auth Description
POST /v1/audio/transcriptions None Speech-to-Text. Accepts multipart form with audio file. Returns {"text": "..."}
GET /health None Health check

Hermes Agent API (OpenAI-Compatible)

Method Path Auth Description
POST /v1/chat/completions LibreChat JWT Chat with Hermes Agent — supports tool calling, MCP tools, code execution
GET /health None Health check

External path: https://<DOMAIN>/agent-api/v1/chat/completions Note: External access requires a valid LibreChat session (auth-validator validates the JWT before proxying).

SearXNG (Web Search)

Method Path Auth Description
GET / None Web UI (HTML)
POST /search None JSON search API — accepts {"q": "...", "format": "json", "language": "de-DE"}
GET /health None Health check

External path: https://<DOMAIN>/search/ (via PII-Guard for GDPR compliance from LibreChat)

Auth-Validator (Internal — No External Route)

Method Path Auth Description
GET / JWT Bearer Validates LibreChat JWT. Returns 200 if valid, 401 if invalid/expired. Used by Caddy's forward_auth directive.

LAN Share (SMB/CIFS)

Protocol Path Auth Description
SMB/CIFS \\<HOST>\netai-lanshare SMB user netai Read/write file share. Also mounted in LibreChat at /lanshare/ for browser access.

LightRAG (Graph-Enhanced RAG)

Method Path Auth Description
POST /documents/upload API Key Upload document — LightRAG extracts entities + relationships, builds knowledge graph
POST /query API Key Query the knowledge graph. Modes: naive (vector-only), local (entity-level), global (community-level), hybrid (all)
GET /graph API Key Retrieve the knowledge graph structure (entities + relationships)
GET /health None Health check
POST /v1/chat/completions API Key OpenAI-compatible endpoint — accepts standard chat format, returns RAG-augmented responses

Auth: Optional API key via Authorization: Bearer <key> header. Default: sk-no-key-required.

Document ingestion: Use the provided script to bulk-import documents:

# Ingest all files from the LAN share
./scripts/ingest-docs.sh

# Ingest a single document
./scripts/ingest-docs.sh /path/to/document.pdf

# Watch the LAN share for new files (requires inotify-tools)
sudo apt install inotify-tools
./scripts/ingest-docs.sh --watch

JWT Authentication Flow

  1. POST /api/auth/login with {"email": "...", "password": "..."} → returns token (JWT) and refreshToken
  2. Include Authorization: Bearer <token> in subsequent requests
  3. When the token expires, POST /api/auth/refresh with {"token": "<refreshToken>"} → returns new token pair
  4. For external routes requiring auth (marked "LibreChat JWT") Caddy's forward_auth middleware automatically validates the JWT against auth-validator:8081

Intel SYCL Notes

  • This stack uses the SYCL backend via the official ghcr.io/ggml-org/llama.cpp:server-intel-b9641 image.
  • Ensure your kernel is 6.8 or newer for native Xe/i915 support on Battlemage.
  • The environment variable ONEAPI_DEVICE_SELECTOR=*:gpu is passed to the inference container.
  • Important: llama.cpp SYCL backend only supports discrete Intel Arc GPUs (Xe-HPG+, like Arc Pro B50). Integrated Xe-LP GPUs (UHD 770) are not enumerated by the SYCL backend and cannot be used for inference. The inference-server container is strictly bound to the discrete Arc GPU.
  • The auxiliary-server container also uses SYCL and shares the same dGPU for the small multimodal model (LFM2.5-VL-1.6B).
  • Security-Guard uses the iGPU (/dev/dri/card0) separately for Mezzo-Prompt-Guard inference.
  • There are no NVIDIA/CUDA dependencies in this stack.

SSL / TLS

Caddy handles TLS termination. For production, configure Let's Encrypt certificates. For development, self-signed certificates can be used:

# Place certificates at paths specified in .env
# Caddy will use them for TLS

Testing

A pytest-based integration test suite validates containers, network paths, APIs, authentication, chat completions, data stores, and backups.

Prerequisites: pytest and requests must be installed (pip install pytest requests).

# Fast smoke test (containers, network, auth, data stores, backup — ~30-40s)
pytest tests/ -m "not slow" -v

# Full suite including chat completion (~2-3 min)
pytest tests/ -v

# Parallel execution (requires pytest-xdist)
pytest tests/ -m "not slow" -v -n auto

# Compliance-specific
pytest tests/ -m "pii_guard" -v
pytest tests/ -m "security_guard" -v

# Cascade LLM
pytest tests/test_llm_gateway.py -v
Test File Coverage
test_containers.py Docker container status and health checks
test_network_paths.py Caddy routing and inter-service DNS
test_api_endpoints.py LLM inference, LibreChat, Hermes API, SearXNG
test_authentication.py LibreChat login and token validation
test_chat_completion.py End-to-end chat via LibreChat API
test_data_stores.py SQLite integrity, ChromaDB, uploads
test_backup.py Backup script execution and archive validation
test_pii_guard.py GDPR compliance, PII redaction, search proxy
test_security_guard.py Prompt injection detection, filtering, Article 52
test_beszel.py Beszel monitoring metrics
test_llm_gateway.py Cascade LLM routing, complexity scoring, confidence fallback, streaming
test_hermes_webui.py Hermes compose config, Caddy proxy, .env.example
test_hermes_playwright.py Hermes API server and LibreChat browser tests

Backup & Restore

LibreChat stores all data in MongoDB (persistent volume mongo-data).

Create a Backup

docker compose exec mongo mongodump --archive=/backups/librechat-$(date +%Y%m%d).archive

Restore from Backup

docker compose exec -T mongo mongorestore --archive=< backup-file.archive

Troubleshooting

GPU Not Detected

lspci | grep -i vga | grep -i intel

If empty, verify the GPU is seated and the kernel module is loaded:

sudo dmesg | grep i915
sudo intel_gpu_top

Permission Denied on /dev/dri

Ensure your user is in the video and render groups, then log out and back in:

sudo usermod -aG video,render $USER

Model File Missing

setup.sh will warn you if the expected GGUF files are absent. Download them via Hugging Face CLI or wget and place them in their respective directories under models/.

Security-Guard Not Starting

Verify the Mezzo-Prompt-Guard model file is present at the path specified by SECURITY_GUARD_MODEL_PATH in .env or the default path in docker-compose.yml.

License

This deployment configuration is provided as-is for B2B on-premise deployments. Model weights and upstream container images are subject to their respective licenses.

About

ai Infrastructure for Intel Arc GPUs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors