Skip to content

Commit 9106fd1

Browse files
committed
feat: @model tag field — persist selected model per AI DocGen card
- Add @model: metadata field to all tag types (AI, Agent, Image, OCR, TTS, STT, Translate) - Intelligent defaults: OCR→granite-docling, TTS→kokoro-tts, STT→voxtral-stt, Image→imagen-ultra - Dropdown shows all registered models; changing it syncs @model: back to editor text - Validated against AI_MODELS registry — invalid model IDs silently ignored - Per-block buildModelOptionsHtml() replaces static model option HTML for correct pre-selection - @model: stripped from display text in all card renderers - Follows existing @lang:/@engine: sync pattern - 13 new Playwright tests (insertion defaults, parsing, pre-selection, display, sync, backward compat) - README updated with @model: in feature description and release notes - Fully backward-compatible with existing tags (no @model: = uses default)
1 parent e46a70d commit 9106fd1

5 files changed

Lines changed: 370 additions & 30 deletions

File tree

CHANGELOG-model-ux.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
# CHANGELOG — Model Loading UX
2+
3+
**Date:** 2026-03-12
4+
5+
## Summary
6+
7+
Enhanced model loading UX with cache/download status detection, source location display, and model deletion capability.
8+
9+
## Changes
10+
11+
### Workers (7 files)
12+
- Updated `progressCb` / `progress_callback` in all worker files to forward Transformers.js `status` field (`initiate`, `progress`, `done`)
13+
- Added `source` (MODEL_ID) and `loadingPhase` fields to worker messages
14+
- Files: `public/ai-worker.js`, `public/ai-worker-docling.js`, `public/ai-worker-florence.js`, `public/ai-worker-lfm.js`, `js/tts-worker.js`, `js/speech-worker.js`, `js/voxtral-worker.js`
15+
16+
### UI — ai-assistant.js
17+
- **Cache detection**: Tracks `initiate→done` pairs without `progress` bytes to detect cache loads
18+
- **Source display**: Shows `huggingface.co/{model-id}` during loading via `#ai-progress-source`
19+
- **Model deletion**: New `deleteModelCache()` function clears Cache API + OPFS + consent flags
20+
- **Delete button handler**: Wired to consent dialog with toast feedback
21+
- Exposed `deleteModelCache` on `M._ai` for cross-module access
22+
23+
### Modal — modal-templates.js
24+
- Added `#ai-progress-source` element in download progress section
25+
- Added `#ai-consent-delete` danger button in consent dialog footer
26+
27+
### CSS — ai-panel.css + styles.css
28+
- `.ai-progress-source` — subtle monospace info bar with icon
29+
- `.ai-consent-btn-danger` — red delete button with dark mode variants

README.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@
2626
| **Writing Modes** | Zen mode (distraction-free fullscreen), Focus mode (dimmed paragraphs), Dark mode, multiple preview themes (GitHub, GitLab, Notion, Dracula, Solarized, Evergreen) |
2727
| **Rendering** | GitHub-style Markdown, syntax highlighting (180+ languages), LaTeX math (MathJax), Mermaid diagrams (zoom/pan/export), PlantUML diagrams, callout blocks, footnotes, emoji, anchor links |
2828
| **🎬 Media Embedding** | Video playback via `![alt](video.mp4)` image syntax (`.mp4`, `.webm`, `.ogg`, `.mov`, `.m4v`); YouTube/Vimeo embeds auto-detected; `embed` code block for responsive media grids (`cols=1-4`, `height=N`); Video.js v10 lazy-loaded with native `<video>` fallback; website URLs render as rich link preview cards with favicon + "Open ↗" button |
29-
| **🤖 AI Assistant** | 3 local Qwen 3.5 sizes (0.8B / 2B / 4B via WebGPU/WASM), Gemini 3.1 Flash Lite, Groq Llama 3.3 70B, OpenRouter — summarize, expand, rephrase, grammar-fix, explain, simplify, auto-complete; AI writing tags (Polish, Formalize, Elaborate, Shorten, Image); enhanced context menu; per-card model selection; concurrent block generation; inline review with accept/reject/regenerate; AI-powered image generation |
29+
| **🤖 AI Assistant** | 3 local Qwen 3.5 sizes (0.8B / 2B / 4B via WebGPU/WASM), Gemini 3.1 Flash Lite, Groq Llama 3.3 70B, OpenRouter — summarize, expand, rephrase, grammar-fix, explain, simplify, auto-complete; AI writing tags (Polish, Formalize, Elaborate, Shorten, Image); enhanced context menu; per-card model selection; concurrent block generation; inline review with accept/reject/regenerate; AI-powered image generation; **smart model loading UX** — cache vs download detection (📦/⬇️), HuggingFace source location display, delete cached models from browser storage |
3030
| **🎤 Voice Dictation** | Dual-engine speech-to-text: **Voxtral Mini 3B** (WebGPU, primary, 13 languages, ~2.7 GB) or **Whisper Large V3 Turbo** (WASM fallback, ~800 MB) with consensus scoring; download consent popup with model info before first use; 50+ Markdown-aware voice commands — natural phrases ("heading one", "bold…end bold", "add table", "undo"); auto-punctuation via AI refinement or built-in fallback; streaming partial results |
3131
| **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — English/Chinese via [Kokoro 82M v1.1-zh ONNX](https://huggingface.co/onnx-community/Kokoro-82M-v1.1-zh-ONNX) (~80 MB, off-thread WebWorker), Japanese & 10+ languages via Web Speech API fallback; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language; ⬇ Save button to download generated audio as WAV file |
3232
| **Import** | MD, DOCX, XLSX/XLS, CSV, HTML, JSON, XML, PDF — drag & drop or click to import |
@@ -36,7 +36,7 @@
3636
| **Desktop** | Native app via Neutralino.js with system tray and offline support |
3737
| **Code Execution** | 6 languages in-browser: Bash ([just-bash](https://justbash.dev/)), Math (Nerdamer), Python ([Pyodide](https://pyodide.org/)), HTML (sandboxed iframe, `html-autorun` for widgets/quizzes), JavaScript (sandboxed iframe), SQL ([sql.js](https://sql.js.org/) SQLite) · 25+ compiled languages via [Judge0 CE](https://ce.judge0.com): C, C++, Rust, Go, Java, TypeScript, Kotlin, Scala, Ruby, Swift, Haskell, Dart, C#, and more · **▶ Run All** notebook engine — one-click sequential execution of all blocks with progress bar, abort, per-block status badges, and SQLite shared context store |
3838
| **Security** | Content Security Policy (CSP), SRI integrity hashes, XSS sanitization (DOMPurify), ReDoS protection, Firestore write-token ownership, API keys via HTTP headers, postMessage origin validation, 8-char passphrase minimum, sandboxed code execution |
39-
| **AI Document Tags** | `{{@AI:}}` text generation, `{{@Think:}}` deep reasoning, `{{@Image:}}` image generation (Gemini Imagen), `{{@OCR:}}` image-to-text extraction (Text/Math/Table modes, PDF page rendering via pdf.js), `{{@STT:}}` speech-to-text dictation (engine selector: Whisper/Voxtral/Web Speech API, 11 languages, Record/Stop/Insert/Clear) — `@` prefix syntax on all tag types + metadata fields (`@name`, `@use`, `@think`, `@search`, `@prompt`, `@step`, `@upload`); editable `@prompt:` textarea and `@step:` inputs in preview cards; description/prompt separation (bare text = label, `@prompt:` = AI instruction); 📎 image/PDF upload for multimodal vision analysis; per-card model selector, concurrent block operations |
39+
| **AI Document Tags** | `{{@AI:}}` text generation, `{{@Think:}}` deep reasoning, `{{@Image:}}` image generation (Gemini Imagen), `{{@OCR:}}` image-to-text extraction (Text/Math/Table modes, PDF page rendering via pdf.js), `{{@STT:}}` speech-to-text dictation (engine selector: Whisper/Voxtral/Web Speech API, 11 languages, Record/Stop/Insert/Clear) — `@` prefix syntax on all tag types + metadata fields (`@name`, `@use`, `@think`, `@search`, `@prompt`, `@step`, `@upload`, `@model`); `@model:` field persists selected model per card with intelligent defaults (OCR→`granite-docling`, TTS→`kokoro-tts`, STT→`voxtral-stt`, Image→`imagen-ultra`); editable `@prompt:` textarea and `@step:` inputs in preview cards; description/prompt separation (bare text = label, `@prompt:` = AI instruction); 📎 image/PDF upload for multimodal vision analysis; per-card model selector with document-portable model persistence, concurrent block operations |
4040
| **🔌 API Calls** | `{{API:}}` REST API integration — GET/POST/PUT/DELETE methods, custom headers, JSON body, response stored in `$(api_varName)` variables; inline review panel; toolbar GET/POST buttons |
4141
| **🔗 Agent Flow** | `{{Agent:}}` multi-step pipeline — define Step 1/2/3, chain outputs, per-card model + search provider selector, live step status indicators (⏳/✅/❌), review combined output |
4242
| **🔍 Web Search** | Toggle web search for AI — DuckDuckGo (free), Brave Search, Serper.dev; search results injected into LLM context; source citations in responses; per-agent-card search provider selector |
@@ -458,6 +458,8 @@ TextAgent has undergone significant evolution since its inception. What started
458458

459459
| Date | Commits | Feature / Update |
460460
|------|---------|-----------------|
461+
| **2026-03-12** || 🏷️ **@model Tag Field** — new `@model:` metadata field on all AI DocGen tag types (`{{@AI:}}`, `{{@Agent:}}`, `{{@Image:}}`, `{{@OCR:}}`, `{{@TTS:}}`, `{{@STT:}}`, `{{@Translate:}}`); persists selected model in document text for portability; intelligent defaults per tag type (OCR→`granite-docling`, TTS→`kokoro-tts`, STT→`voxtral-stt`, Image→`imagen-ultra`, AI/Agent→current model); dropdown shows all registered models, changing it syncs `@model:` back to editor; validated against `AI_MODELS` registry (invalid IDs silently ignored); fully backward-compatible with existing tags |
462+
| **2026-03-12** || 🔧 **Model Loading UX** — smart cache vs download detection in all 7 AI workers (📦 Loading from cache / ⬇️ Downloading from huggingface.co/textagent/...); source location display showing HuggingFace model path during loading; 🗑️ Delete Model button in consent dialog to clear Cache API + OPFS cached files and reset consent; `deleteModelCache()` function exposed on `M._ai`; workers forward Transformers.js `status` field (`initiate`/`progress`/`done`) with `source` and `loadingPhase`; new `.ai-progress-source` info bar and `.ai-consent-btn-danger` styling with dark mode |
461463
| **2026-03-12** || 🎤 **STT Tag Block & Florence-2 & TTS Download** — new `{{@STT:...}}` tag block for in-preview speech-to-text dictation with engine selector (Whisper V3 Turbo / Voxtral Mini 3B / Web Speech API), 11-language picker, Record/Stop/Insert/Clear buttons, amber-accented CSS with recording pulse animation; Florence-2 (230M) vision OCR+captioning model added (`textagent/Florence-2-base-ft`); TTS ⬇ Save button with float32→WAV encoder for audio download; PDF-to-image OCR renderer via pdf.js (2x scale, max 3 pages); Granite Docling migrated to `textagent/` with `onnx-community/` fallback, fp16 embed_tokens, degeneration loop guard, raw base64→data URL fix; Qwen3 AutoTokenizer fix for text-only models; OCR mode forwarding to doc-model workers |
462464
| **2026-03-12** | — | 🧪 **Comprehensive Test Suite** — 12 new Playwright spec files (108 tests) across 5 categories targeting past 3 days of code changes: **Functional** — unit tests for video player (URL detection, HTML builders, embed grid), TTS engine (API surface, state), speech commands (DOM elements, language selector), file converters (MD/CSV/JSON/XML/HTML import), stock widget (rendering, sandbox, double-render prevention); integration tests for embed grid pipeline and AI_MODELS registry. **Regression** — 12 tests pinning recent bug fixes (file upload crash, template confirmation, stock variable, embed rendering, mermaid stability, dark mode, XSS). **Performance** — module init timing (TTS/STT/video/stock/converter < 5–8s), complex render < 5s, embed grid < 3s. **Static Analysis** — ESLint, file size < 100KB, debugger/eval detection, CSS !important audit, IIFE patterns, worker files, HTTPS enforcement. **Security** — embed grid XSS (javascript:/data: URI), video player HTML escaping, YouTube privacy mode, TradingView sandbox, Vimeo DNT, link security, CSP validation. Total test count: 299 |
463465
| **2026-03-12** || 🎤 **Voxtral STT**[Voxtral Mini 3B](https://huggingface.co/textagent/Voxtral-Mini-3B-2507-ONNX) as primary speech-to-text engine on WebGPU (~2.7 GB, q4, 13 languages, streaming partial output via `TextStreamer`); Whisper Large V3 Turbo as WASM fallback (~800 MB, q8); `voxtral-worker.js` new WebWorker with `VoxtralForConditionalGeneration` + `VoxtralProcessor`; `speechToText.js` WebGPU detection + dual-worker routing; download consent popup (`showSttConsentPopup`) with model name/size/privacy info before first download; `STT_CONSENTED` localStorage key; model duplicated to `textagent/` HuggingFace org with `onnx-community/` fallback |

changelogs/CHANGELOG-model-tag.md

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
# CHANGELOG — @model Tag Field
2+
3+
## Summary
4+
Added `@model:` metadata field to all AI DocGen tag types, persisting the selected model in the document text. Dropdown shows all registered models; changing it syncs back to `@model:` in the editor.
5+
6+
## Changes
7+
8+
### @model: Tag Field
9+
- New `@model:` field parsed from tag text in `parseDocgenBlocks()` — validated against `AI_MODELS` registry
10+
- Model dropdown pre-selects the `@model:` value on render; shows ALL models for every card type
11+
- Changing the dropdown writes `@model: <id>` back to the editor text (follows existing `@lang:` sync pattern)
12+
- Intelligent defaults per tag type: OCR→`granite-docling`, TTS→`kokoro-tts`, STT→`voxtral-stt`, Image→`imagen-ultra`, AI/Agent/Translate→current model or fallback
13+
14+
### Tag Insertion Defaults
15+
- `insertDocgenTag()` now includes `@model:` with appropriate default for each tag type
16+
- AI and Agent tags use the currently selected model as default; Image defaults to `imagen-ultra`
17+
18+
### Model Options Refactor
19+
- Replaced static `textModelOptionsHtml`/`imageModelOptionsHtml` with per-block `buildModelOptionsHtml()` function
20+
- Each card's dropdown is built with the correct pre-selected model based on `@model:` value
21+
- `isSttModel` models now excluded from text model dropdowns alongside `isTtsModel`
22+
23+
### Display Text Stripping
24+
- `@model:` stripped from visible card text in all renderers: OCR, TTS, STT, Translate, AI/generic
25+
26+
## Files Modified
27+
- `js/ai-docgen.js` — Parser, tag insertion, card rendering, dropdown sync, display stripping

0 commit comments

Comments
 (0)