LucidLLM is a browser-based local Large Language Model (LLM) chat application that runs AI models entirely within your browser using Transformers.js and WebGPU technology. With zero-build architecture and complete privacy, it delivers powerful AI capabilities without sending any data to external servers.
Key Highlights: 17,000+ lines of source code • 511 i18n keys across 4 languages • Recommended Models • WebGPU/WASM dual inference • 60 FPS Token Streaming • OPFS File Explorer • AES-256 encrypted Google Drive backup • Zero npm dependencies
| Feature | Description |
|---|---|
| Fully Local Inference | All AI inference runs in-browser using Transformers.js; no data leaves your device |
| Inference Device Toggle | Switch between WebGPU and WASM at runtime for compatibility and performance |
| OPFS Model Caching | Origin Private File System stores models persistently without re-downloading |
| OPFS Fetch Interceptor | Worker-level fetch interception with Range Request support for efficient model loading |
| External Data Support | Automatic detection and loading of ONNX external data files (.onnx_data, .onnx.data) with multi-shard support |
| HF Token Support | Access private/gated models with your Hugging Face token |
| Model Download Manager | Supports pause/resume, retry with exponential backoff, quantization selection, and auto-quota reclamation |
| Recommended Models | Pre-configured verified models with one-click download from the Model tab |
| Model Audit & Update | Verify model integrity and check for latest versions on Hugging Face |
| Bootstrap Config | Automatically applies generation_config.json (temperature, top_p, max_length, repetition_penalty) upon first model load |
| Pipeline Cache | Memory cache for up to 4 active pipelines for instant switching with graceful GPU cleanup |
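The OPFS Fetch Interceptor row mentions Range Request support. As a rough illustration, suppose a cached model file is available as a `Blob`/`File`, for example from an OPFS `FileSystemFileHandle.getFile()`. A single-range request could then be answered with a `206 Partial Content` response as shown below. `resolveByteRange` and `rangeResponse` are hypothetical helper names, not LucidLLM's actual functions. Multi-range requests are intentionally left unhandled.

```javascript
// Hypothetical helper: resolve a single HTTP Range header ("bytes=start-end",
// "bytes=start-", or suffix "bytes=-N") against a file of `size` bytes.
// Returns { start, end } with an inclusive end, or null if unsatisfiable.
function resolveByteRange(header, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec((header || "").trim());
  if (!match || size <= 0) return null;
  const [, startStr, endStr] = match;
  if (startStr === "" && endStr === "") return null;

  let start;
  let end;
  if (startStr === "") {
    // Suffix range: the last N bytes of the file.
    const suffix = Number(endStr);
    if (suffix === 0) return null;
    start = Math.max(0, size - suffix);
    end = size - 1;
  } else {
    start = Number(startStr);
    end = endStr === "" ? size - 1 : Math.min(Number(endStr), size - 1);
  }
  if (start >= size || start > end) return null;
  return { start, end };
}

// Build a 206 response from a cached Blob/File, or 416 if the range is invalid.
function rangeResponse(blob, header) {
  const range = resolveByteRange(header, blob.size);
  if (!range) {
    return new Response(null, {
      status: 416,
      headers: { "Content-Range": `bytes */${blob.size}` },
    });
  }
  const body = blob.slice(range.start, range.end + 1); // slice() end is exclusive
  return new Response(body, {
    status: 206,
    headers: {
      "Content-Range": `bytes ${range.start}-${range.end}/${blob.size}`,
      "Content-Length": String(body.size),
      "Accept-Ranges": "bytes",
    },
  });
}
```

Inside a patched `fetch`, the request's `Range` header (`request.headers.get("Range")`) would be passed to `rangeResponse`. Requests without that header can simply receive the whole file with status 200.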
| Feature | Description |
|---|---|
| Multi-Session Chat | Up to 10 independent chat sessions with separate histories |
| 60 FPS Token Streaming | Smooth, frame-limited streaming display with animated blinking cursor (▍) |
| Smart Token Counter | Heuristic token estimation that weights CJK and ASCII characters differently |
| LLM Parameters | Real-time control over Top-P, Temperature, and Presence Penalty sliders |
| Token Speed Statistics | Shows Avg/Max/Min tokens per second during inference |
| Memory Usage Display | Real-time memory consumption monitoring |
| Abort Generation | Graceful abort with GPU buffer cleanup — stop AI response at any time |
| Message Edit & Regenerate | Edit any user message and regenerate the AI response |
| Conversation Branching | Fork conversations from any point to explore alternative responses |
| Conversation Export | Export chat history as JSON for archival |
| System Prompt Editor | Customizable assistant behavior (max 20 lines) |
| Context Window Control | Selectable context sizes: 4k, 8k, 16k, 32k, 128k |
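The Smart Token Counter and Context Window Control rows imply estimating token counts without running the tokenizer. The sketch below illustrates that idea. It assumes roughly one token per CJK character and about four ASCII characters per token. Those ratios, the Unicode ranges, and the `estimateTokens`/`trimHistory` names are illustrative assumptions, not LucidLLM's actual constants. `trimHistory` keeps system messages plus the newest turns that fit the selected window.

```javascript
// Assumed ranges: Hiragana/Katakana, CJK Ext. A, CJK Unified, Hangul syllables, CJK compat.
const CJK_RE = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af\uf900-\ufaff]/;

// Assumed ratios: ~1 token per CJK char, ~4 ASCII chars per token,
// ~2 chars per token for any other script.
function estimateTokens(text) {
  let cjk = 0;
  let ascii = 0;
  let other = 0;
  for (const ch of text) {
    if (CJK_RE.test(ch)) cjk += 1;
    else if (ch.charCodeAt(0) < 128) ascii += 1;
    else other += 1;
  }
  return cjk + Math.ceil(ascii / 4) + Math.ceil(other / 2);
}

// Keep all system messages, then add the newest messages until the budget runs out.
function trimHistory(messages, contextTokens, reserveForReply = 0) {
  let budget = contextTokens - reserveForReply;
  const system = messages.filter((m) => m.role === "system");
  for (const m of system) budget -= estimateTokens(m.content);

  const rest = messages.filter((m) => m.role !== "system");
  const kept = [];
  for (let i = rest.length - 1; i >= 0; i -= 1) {
    const cost = estimateTokens(rest[i].content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}
```

A real tokenizer count will differ from this estimate, so reserving headroom for the response through `reserveForReply` is a reasonable safeguard.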
Manage your local model storage with a full-featured file explorer:
- Directory Tree: Visual navigation of the Origin Private File System
- File Operations: Create, rename, move, and delete files/folders
- Drag & Drop: Direct upload support for model files
- Context Menu: Right-click actions for quick management
- Storage Stats: Real-time display of used/available space
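The Directory Tree and Storage Stats features map onto standard browser APIs. `navigator.storage.getDirectory()` returns the OPFS root as a `FileSystemDirectoryHandle`, and `navigator.storage.estimate()` reports `usage` and `quota`. The sketch below walks a directory handle recursively. `walkDirectory` and `formatBytes` are hypothetical helpers, not functions from `opfs-utils.js`.

```javascript
// Recursively list every entry under an OPFS directory handle.
// Each directory is listed before its children; file sizes come from getFile().
async function walkDirectory(dirHandle, prefix = "") {
  const result = [];
  for await (const [name, handle] of dirHandle.entries()) {
    const path = prefix ? `${prefix}/${name}` : name;
    if (handle.kind === "directory") {
      result.push({ path, kind: "directory" });
      result.push(...(await walkDirectory(handle, path)));
    } else {
      const file = await handle.getFile();
      result.push({ path, kind: "file", size: file.size });
    }
  }
  return result;
}

// Human-readable byte counts for the used/available display.
function formatBytes(bytes) {
  const units = ["B", "KB", "MB", "GB", "TB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  return `${unit === 0 ? value : value.toFixed(1)} ${units[unit]}`;
}

// Browser usage (secure context required):
// const root = await navigator.storage.getDirectory();
// const entries = await walkDirectory(root);
// const { usage, quota } = await navigator.storage.estimate();
// console.log(`${formatBytes(usage)} used of ${formatBytes(quota)}`);
```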
Personalize your experience with a custom identity:
- Custom Nicknames: Set your own local user identity
- Avatar Support: Upload and store your own profile images
- Backup Key Derivation: Your nickname becomes part of the encryption key for Google Drive backups
Stay up to date with the latest features:
- GitHub Release Sync: Automatically checks for updates via GitHub API every 6 hours
- Update Notifications: Visual badges and changelog modals for new releases
- Full PWA Support: Installable app with Service Worker caching (`cache-first` for app assets, `network-then-cache` for WASM binaries)
- Skip Waiting: Seamless update flow with the `SKIP_WAITING` message protocol
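The 6-hour GitHub Release Sync can be expressed as a throttle plus a version comparison. This sketch queries the public GitHub REST endpoint `GET /repos/{owner}/{repo}/releases/latest`, whose response includes `tag_name` and `body`. The helper names, the `v`-prefix handling, and ignoring pre-release suffixes are assumptions for illustration.

```javascript
const UPDATE_INTERVAL_MS = 6 * 60 * 60 * 1000; // 6 hours

// True if no check has been recorded yet, or the last one is at least 6 hours old.
function isUpdateCheckDue(lastCheckedMs, nowMs = Date.now()) {
  return !Number.isFinite(lastCheckedMs) || nowMs - lastCheckedMs >= UPDATE_INTERVAL_MS;
}

// Compare dotted versions such as "v1.10.0" and "1.9.3"; returns 1, 0, or -1.
// Pre-release suffixes ("-beta.1") are ignored in this simplified sketch.
function compareVersions(a, b) {
  const parse = (v) =>
    String(v).replace(/^v/i, "").split("-")[0].split(".").map((n) => Number.parseInt(n, 10) || 0);
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i += 1) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Fetch the latest release and report whether it is newer than the running version.
async function fetchLatestRelease(currentVersion) {
  const res = await fetch("https://api.github.com/repos/ergo9ine/LucidLLM/releases/latest", {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!res.ok) throw new Error(`GitHub API error: ${res.status}`);
  const release = await res.json();
  return {
    tag: release.tag_name,
    notes: release.body,
    isNewer: compareVersions(release.tag_name, currentVersion) > 0,
  };
}
```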
| Feature | Description |
|---|---|
| Google Drive Backup | Encrypted backup of settings and chat history to Google Drive |
| AES-GCM-256 Encryption | Client-side encryption with PBKDF2 key derivation (250,000 iterations) |
| Gzip Compression | Uses CompressionStream API for efficient backup payloads |
| Auto Backup | Automatic backup on changes with debouncing (25s) |
| Backup Restore & Undo | Restore from snapshots with a 5-second undo window for resets |
| No Server Communication | All data stays local unless explicitly backed up |
| Feature | Description |
|---|---|
| Toast Notification | Global notification system for success, info, warning, and error alerts |
| State Lamp | Color-coded status indicator for model loading and session states |
| 6-Tab Settings | Categorized: Model, LLM, Profile, Appearance, Language, Backup |
| 4 Theme Options | Dark (default), Light, OLED Black, and High-Contrast |
| Font Size Control | Adjustable font scale with CSS custom properties |
| Accessibility | Built-in Focus Trap for modal management and full keyboard support |
| Keyboard Shortcuts | Ctrl+N (new), Ctrl+Enter (send), Ctrl+L (focus), Ctrl+, (settings), Ctrl+Shift+Backspace (abort) |
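The shortcut table can be driven by a small lookup from `KeyboardEvent` properties (`key`, `ctrlKey`, `shiftKey`) to action names. The action names and the `matchShortcut` helper below are hypothetical. Only the key combinations come from the table.

```javascript
// Hypothetical action names; the key combinations mirror the shortcut table.
const SHORTCUTS = [
  { action: "abort", ctrl: true, shift: true, key: "Backspace" },
  { action: "newSession", ctrl: true, shift: false, key: "n" },
  { action: "send", ctrl: true, shift: false, key: "Enter" },
  { action: "focusInput", ctrl: true, shift: false, key: "l" },
  { action: "openSettings", ctrl: true, shift: false, key: "," },
];

// Map a keydown event (or any object with the same fields) to an action name, or null.
function matchShortcut(event) {
  const ctrl = Boolean(event.ctrlKey);
  const shift = Boolean(event.shiftKey);
  // Single characters are compared case-insensitively; named keys ("Enter") as-is.
  const key = event.key.length === 1 ? event.key.toLowerCase() : event.key;
  const hit = SHORTCUTS.find((s) => s.ctrl === ctrl && s.shift === shift && s.key === key);
  return hit ? hit.action : null;
}

// document.addEventListener("keydown", (event) => {
//   const action = matchShortcut(event);
//   if (!action) return;
//   event.preventDefault();
//   // dispatch `action` to the app here
// });
```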
These models have been tested and verified to work correctly in LucidLLM:
| Model Name | Quantization | Status | Sanity QA |
|---|---|---|---|
| HuggingFaceTB/SmolLM2-135M-Instruct | FP32, BNB4, Q4 | Verified | Pass |
| vicgalle/gpt2-alpaca-gpt4 | Unknown | Verified | Pass |
| onnx-community/Qwen2.5-0.5B-Instruct | Q4, INT8, UINT8, BNB4 | Verified | Pass |
| willopcbeta/GPT-5-Distill-Qwen3-4B-Instruct-Heretic-ONNX | Q4 | Verified | Pass |
| onnx-community/Phi-4-mini-instruct-ONNX | Q4 | Verified | Pass |
| onnx-community/Apertus-8B-Instruct-2509-ONNX | Q4 | Verified | Pass |
| onnx-community/Qwen3-4B-Thinking-2507-ONNX | Q4 | Verified | Pass |
Sanity QA: Each model must correctly answer "What is gravity?" and "What is the capital of France?" — see compatibility.md for details.
index.html → bootstrap.js (entry)
├─ constants.js Shared constants & enums (no imports, ~121 lines)
├─ i18n-keys.js i18n key definitions (~589 lines)
├─ i18n.js Translations (4 languages, ~217 lines) + locales/
├─ shared-utils.js Pure utilities & constants (50+ exports, ~583 lines)
├─ shared-utils-i18n.js i18n-aware shared utilities (~41 lines)
└─ main.js Core: UI, state, OPFS, inference orchestration (~12,829 lines)
├─ opfs-utils.js OPFS file system utilities (~397 lines)
├─ drive-backup.js AES-GCM encrypt/decrypt, gzip, Drive payload format (~267 lines)
└─ worker.js Web Worker — Transformers.js pipeline, OPFS fetch interception (~626 lines)
- No build step. All source files are native ES Modules served directly
- A single `state` object (~170 fields) manages the entire application state.
- An `els` object caches 200+ DOM elements for zero-query rendering.
- `window.LucidApp` exposes a public debug/API surface for console access.
- A Web Worker handles model loading and inference in a separate thread and communicates via a `postMessage` protocol with typed message enums (`WORKER_MSG`).
- A `window.fetch` monkey-patch in the Worker intercepts Hugging Face URLs to serve them from the OPFS cache first.
- `constants.js` is a zero-import module that isolates all shared constants to prevent circular dependencies.
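The Worker bullet describes typed messages exchanged over `postMessage`. The following sketch shows one way such a protocol could be routed. The `WORKER_MSG` values, the `{ type, id, payload }` envelope, and `createWorkerRouter` are assumptions for illustration. The real enum lives in `constants.js` and is not reproduced in this README.

```javascript
// Hypothetical message types; the real WORKER_MSG enum may differ.
const WORKER_MSG = Object.freeze({
  LOAD: "load",
  GENERATE: "generate",
  TOKEN: "token",
  DONE: "done",
  ERROR: "error",
});

// Build an onmessage handler that dispatches on `type` and echoes the request `id`,
// so the main thread can match streamed tokens and results to the originating request.
function createWorkerRouter(handlers, post) {
  return async function onMessage(event) {
    const { type, id, payload } = event.data ?? {};
    const handler = handlers[type];
    if (typeof handler !== "function") {
      post({ type: WORKER_MSG.ERROR, id, payload: `Unknown message type: ${type}` });
      return;
    }
    const emitToken = (token) => post({ type: WORKER_MSG.TOKEN, id, payload: token });
    try {
      const result = await handler(payload, emitToken);
      post({ type: WORKER_MSG.DONE, id, payload: result });
    } catch (err) {
      post({ type: WORKER_MSG.ERROR, id, payload: String(err?.message ?? err) });
    }
  };
}

// In a worker script:
// self.onmessage = createWorkerRouter(
//   { [WORKER_MSG.GENERATE]: async (prompt, emitToken) => { /* run the pipeline */ } },
//   (msg) => self.postMessage(msg),
// );
```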
| Requirement | Details |
|---|---|
| Recommended | Chrome 113+ / Edge 113+ (WebGPU support) |
| Fallback | Any modern browser with WASM support |
| Security | HTTPS or localhost required for OPFS |
| JavaScript | ES2020+ with Module support |
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB+ |
| Storage | Varies by model (100 MB – 5 GB per model) | SSD recommended |
| GPU | Integrated graphics | Dedicated GPU with WebGPU support |
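The browser requirements above translate into a few feature checks. This sketch uses standard globals: `navigator.gpu`, `WebAssembly`, `isSecureContext`, `crossOriginIsolated`, and `navigator.storage.getDirectory`. The `detectCapabilities`/`pickDevice` names are assumptions, as is the WebGPU-then-WASM preference order, which is modeled on the automatic fallback described in this README.

```javascript
// Inspect a global object (window or self by default) for the features LucidLLM relies on.
function detectCapabilities(g = globalThis) {
  const nav = g.navigator ?? {};
  return {
    webgpu: "gpu" in nav && nav.gpu != null,
    wasm: typeof g.WebAssembly === "object" && g.WebAssembly !== null,
    secureContext: g.isSecureContext === true, // HTTPS or localhost
    opfs: typeof nav.storage?.getDirectory === "function",
    threadedWasm: g.crossOriginIsolated === true && typeof g.SharedArrayBuffer === "function",
  };
}

// Prefer WebGPU, fall back to WASM, or report that neither is available.
function pickDevice(caps) {
  if (caps.webgpu) return "webgpu";
  if (caps.wasm) return "wasm";
  return null;
}
```

The presence of `navigator.gpu` only means the API exists. `await navigator.gpu.requestAdapter()` can still resolve to `null`, for example on blocklisted GPUs, so a robust fallback should also check the adapter.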
Try the GitHub Pages demo (no install required):
👉 https://ergo9ine.github.io/LucidLLM/
Tip: Deploy to Cloudflare Pages for multi-threaded WASM inference via COOP/COEP headers.
git clone https://github.com/ergo9ine/LucidLLM.git
cd LucidLLM
npx serve -s . -l 3000 # serves at http://localhost:3000
Open the app in Chrome/Edge and go to Settings → Model Management to fetch and activate a model.
- Open Settings (Ctrl+,) → Model Management.
- Enter a Hugging Face model ID (e.g., `HuggingFaceTB/SmolLM2-135M-Instruct`) or select one from the Recommended Models list.
- Click Fetch to retrieve model metadata, then select a quantization and click Download.
- Downloads support pause/resume and auto-retry with exponential backoff.
- Once a model is downloaded, click Activate in the session table.
- Wait for the state lamp to turn green (Loaded).
- Type your message and press Ctrl+Enter or click Send.
- Use the + button in the tab bar to create new chat sessions (up to 10).
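The download manager's retry behavior can be sketched as a generic exponential backoff loop. The base delay (1 s), cap (30 s), and retry count (5) are illustrative assumptions, not LucidLLM's configured values. Jitter is omitted for clarity, and `retryWithBackoff` is a hypothetical helper.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `task` until it succeeds or `retries` extra attempts have failed.
// `wait` is injectable so tests (or a paused download) can control the delay.
async function retryWithBackoff(task, { retries = 5, baseMs = 1000, maxMs = 30000, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the last error
      await wait(backoffDelay(attempt, baseMs, maxMs));
    }
  }
}
```

For resuming, a retried download would typically request only the missing bytes with a `Range: bytes=<downloaded>-` header, assuming the file server supports range requests.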
Adjust generation parameters in Settings → LLM:
| Setting | Default | Description |
|---|---|---|
| System Prompt | "You are a helpful assistant." | Defines the AI's role and personality |
| Max Tokens | 512 | Maximum length of a single response |
| Context Window | 8k | How much conversation the model remembers |
| Temperature | 0.9 | Controls response creativity and randomness |
| Top-P | 0.95 | Nucleus sampling threshold |
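The LLM settings above correspond to standard generation parameters. The sketch below maps them onto Transformers.js-style options (`max_new_tokens`, `do_sample`, `temperature`, `top_p`). The clamping ranges are assumptions, as is the choice to switch to greedy decoding at temperature 0. `buildGenerationOptions` is a hypothetical name.

```javascript
// Defaults taken from the settings table above.
const GENERATION_DEFAULTS = { maxTokens: 512, temperature: 0.9, topP: 0.95 };

const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));

// Convert UI settings into generation options; ranges are illustrative assumptions.
function buildGenerationOptions(settings = {}) {
  const s = { ...GENERATION_DEFAULTS, ...settings };
  const temperature = clamp(Number(s.temperature), 0, 2);
  const options = { max_new_tokens: Math.max(1, Math.floor(s.maxTokens)) };
  if (temperature > 0) {
    // Sampling: temperature and nucleus (top-p) filtering apply.
    options.do_sample = true;
    options.temperature = temperature;
    options.top_p = clamp(Number(s.topP), 0, 1);
  } else {
    options.do_sample = false; // greedy decoding; sampling knobs are ignored
  }
  return options;
}
```

With a Transformers.js `text-generation` pipeline, the result would be passed as the second argument, e.g. `await generator(messages, buildGenerationOptions(settings))`. Here `messages` is an array of `{ role, content }` chat messages starting with the system prompt.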
- Go to Settings → Backup & Restore.
- Click Connect to Google Drive and sign in.
- Enable Auto Backup for automatic saves (25s debounce).
- Data is encrypted client-side using your nickname as part of the key derivation.
- Restore from any snapshot with a 5-second undo window.
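With Auto Backup's 25-second debounce, a burst of edits triggers a single upload once changes stop for 25 s. The minimal trailing-edge debounce below illustrates this. `debounce` and `runBackup` are hypothetical names, not LucidLLM's actual functions.

```javascript
const AUTO_BACKUP_DEBOUNCE_MS = 25000;

// Trailing-edge debounce: each call restarts the timer, and `fn` runs once,
// with the most recent arguments, after `waitMs` of inactivity.
function debounce(fn, waitMs) {
  let timer = null;
  const debounced = (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      fn(...args);
    }, waitMs);
  };
  // Drop a pending call, e.g. when the user disconnects Google Drive.
  debounced.cancel = () => {
    clearTimeout(timer);
    timer = null;
  };
  return debounced;
}

// const scheduleBackup = debounce(() => runBackup(), AUTO_BACKUP_DEBOUNCE_MS);
// Call scheduleBackup() after every settings or chat-history change.
```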
GitHub Pages works out of the box. Note: GitHub Pages does not support custom response headers, so SharedArrayBuffer is unavailable and WASM inference is limited to single-threaded mode.
For full multi-threaded WASM inference, deploy to Cloudflare Pages. The _headers file in the repo root configures the required headers:
/*
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: credentialless
These headers enable SharedArrayBuffer, which is required for multi-threaded ONNX Runtime WASM execution.
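Once COOP/COEP are in place, the page reports `crossOriginIsolated === true` and `SharedArrayBuffer` becomes available. The sketch below chooses a WASM thread count from that signal. The cap of 4 threads and the `pickWasmThreadCount` name are assumptions, not values taken from LucidLLM.

```javascript
// Choose a thread count for ONNX Runtime Web's WASM backend.
// Without cross-origin isolation there is no SharedArrayBuffer, so only one thread is possible.
function pickWasmThreadCount({ crossOriginIsolated, hardwareConcurrency }, maxThreads = 4) {
  if (crossOriginIsolated !== true) return 1;
  const cores =
    Number.isInteger(hardwareConcurrency) && hardwareConcurrency > 0 ? hardwareConcurrency : 1;
  return Math.min(cores, maxThreads);
}

// In the browser:
// const threads = pickWasmThreadCount({
//   crossOriginIsolated: self.crossOriginIsolated,
//   hardwareConcurrency: navigator.hardwareConcurrency,
// });
```

In Transformers.js, which exposes ONNX Runtime's WASM settings, the chosen value can be assigned to `env.backends.onnx.wasm.numThreads` before the first pipeline is created.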
Any static file server with HTTPS support will work:
npx serve -s . -l 3000 # Node.js
python -m http.server 8000 # Python
For multi-threaded WASM, configure your server to send COOP/COEP headers on all responses.
git clone https://github.com/ergo9ine/LucidLLM.git
cd LucidLLM
npm install # Only needed for tests/dev tools
npx serve -s . -l 3000 # Zero-build: edit files and reload
cd test && npm test # Vitest unit tests
npx playwright test # E2E tests (Chromium, downloads real models)
- E2E tests force the WASM device (`lucid_inference_device = 'wasm'`) since headless Chromium lacks WebGPU.
- The first E2E run may take 10+ minutes to download test models.
Access the app state and API via window.LucidApp in the browser console for debugging and inspection.
| File | Lines | Purpose |
|---|---|---|
| `script/bootstrap.js` | ~116 | Startup, early i18n, Service Worker registration |
| `script/constants.js` | ~121 | Shared constants & enums (zero-import module) |
| `script/i18n-keys.js` | ~589 | i18n key definitions (single source of truth) |
| `script/i18n.js` | ~217 | Translation loader (ko/en/ja/zh-CN) + locales/ |
| `script/shared-utils.js` | ~583 | 50+ pure utility functions and constants |
| `script/shared-utils-i18n.js` | ~41 | i18n-aware shared utility wrappers |
| `script/main.js` | ~12,829 | Core logic, state machine, UI rendering, OPFS management |
| `script/opfs-utils.js` | ~397 | OPFS file system helpers and utilities |
| `script/worker.js` | ~626 | Web Worker for Transformers.js inference pipeline |
| `script/drive-backup.js` | ~267 | AES-GCM encryption, gzip compression, Drive API helpers |
| Category | Technology |
|---|---|
| Language | JavaScript (ES2020+ Modules) |
| Architecture | Zero-build, Vanilla JS, No Framework, No npm dependencies |
| ML Framework | Transformers.js v4.0.0-next.6 |
| Model Format | ONNX (with external data support) |
| Inference Backend | WebGPU / WASM (automatic fallback) |
| Storage | Origin Private File System (OPFS), localStorage |
| Encryption | Web Crypto API (PBKDF2 + AES-GCM-256) |
| Compression | CompressionStream API (Gzip) |
| Styling | Tailwind CSS v3 (CDN) + Custom CSS Variables |
| Icons | Lucide Icons (self-hosted) |
| Fonts | Space Grotesk (Google Fonts) |
| Auth | Google Identity Services (OAuth 2.0) |
| Testing | Vitest (unit), Playwright (E2E) |
- All inference and chat data remain local by default — nothing is sent to any server.
- Google Drive backups are optional and encrypted client-side with AES-GCM-256 before upload.
- Model weights are stored in the secure Origin Private File System (OPFS), sandboxed from other origins.
- No analytics, no telemetry, no tracking.
LucidLLM/
├── index.html # Main HTML entry point (1,200+ lines)
├── sw.js # Service Worker (PWA cache)
├── _headers # Cloudflare Pages COOP/COEP headers
├── script/
│ ├── bootstrap.js # App initialization & early i18n
│ ├── constants.js # Shared constants & enums (zero-import)
│ ├── main.js # Core logic, state, UI rendering
│ ├── i18n-keys.js # i18n key definitions
│ ├── i18n.js # i18n module (ko/en/ja/zh-CN)
│ ├── shared-utils.js # Shared utilities & global API
│ ├── shared-utils-i18n.js # i18n-aware utility wrappers
│ ├── opfs-utils.js # OPFS file system utilities
│ ├── worker.js # Web Worker for inference
│ ├── drive-backup.js # Encrypted Google Drive backup
│ ├── lucide.min.js # Self-hosted Lucide icons
│ └── locales/ # Language locale files
├── vendor/
│ └── transformers/ # Self-hosted Transformers.js + ONNX Runtime WASM
├── test/ # Vitest unit tests
├── docs/ # Documentation & localized READMEs
│ ├── README.ko.md
│ ├── README.ja.md
│ ├── README.zh-CN.md
│ ├── compatibility.md # Verified model list & QA criteria
│ └── roadmap.md # Feature roadmap
├── favicon.svg # App icon
└── package.json # npm config (zero app dependencies)
- For major changes, please open an Issue first to discuss.
- PR flow: fork → branch → PR (with description, screenshots, and tests).
- See roadmap.md for planned features accepting contributions.
This project is licensed under the MIT License — see the LICENSE file for details.
Copyright (c) 2025 Oraios AI
- Hugging Face — Transformers.js and model hosting
- Transformers.js — In-browser ML inference
- ONNX Runtime Web — WebGPU/WASM model execution
- Tailwind CSS — Utility-first CSS framework
- Lucide Icons — Beautiful open-source icons
- Space Grotesk — Font family
Made with ❤️ for privacy-focused AI