|
26 | 26 | | **Writing Modes** | Zen mode (distraction-free fullscreen), Focus mode (dimmed paragraphs), Dark mode, multiple preview themes (GitHub, GitLab, Notion, Dracula, Solarized, Evergreen) | |
27 | 27 | | **Rendering** | GitHub-style Markdown, syntax highlighting (180+ languages), LaTeX math (MathJax), Mermaid diagrams (zoom/pan/export), PlantUML diagrams, callout blocks, footnotes, emoji, anchor links | |
28 | 28 | | **🎬 Media Embedding** | Video playback via `` image syntax (`.mp4`, `.webm`, `.ogg`, `.mov`, `.m4v`); YouTube/Vimeo embeds auto-detected; `embed` code block for responsive media grids (`cols=1-4`, `height=N`); Video.js v10 lazy-loaded with native `<video>` fallback; website URLs render as rich link preview cards with favicon + "Open ↗" button | |
29 | | -| **🤖 AI Assistant** | 3 local Qwen 3.5 sizes (0.8B / 2B / 4B via WebGPU/WASM), Gemini 3.1 Flash Lite, Groq Llama 3.3 70B, OpenRouter — summarize, expand, rephrase, grammar-fix, explain, simplify, auto-complete; AI writing tags (Polish, Formalize, Elaborate, Shorten, Image); enhanced context menu; per-card model selection; concurrent block generation; inline review with accept/reject/regenerate; AI-powered image generation | |
| 29 | +| **🤖 AI Assistant** | 3 local Qwen 3.5 sizes (0.8B / 2B / 4B via WebGPU/WASM), Gemini 3.1 Flash Lite, Groq Llama 3.3 70B, OpenRouter — summarize, expand, rephrase, grammar-fix, explain, simplify, auto-complete; AI writing tags (Polish, Formalize, Elaborate, Shorten, Image); enhanced context menu; per-card model selection; concurrent block generation; inline review with accept/reject/regenerate; AI-powered image generation; **smart model loading UX** — cache vs download detection (📦/⬇️), HuggingFace source location display, delete cached models from browser storage | |
30 | 30 | | **🎤 Voice Dictation** | Dual-engine speech-to-text: **Voxtral Mini 3B** (WebGPU, primary, 13 languages, ~2.7 GB) or **Whisper Large V3 Turbo** (WASM fallback, ~800 MB) with consensus scoring; download consent popup with model info before first use; 50+ Markdown-aware voice commands — natural phrases ("heading one", "bold…end bold", "add table", "undo"); auto-punctuation via AI refinement or built-in fallback; streaming partial results | |
31 | 31 | | **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — English/Chinese via [Kokoro 82M v1.1-zh ONNX](https://huggingface.co/onnx-community/Kokoro-82M-v1.1-zh-ONNX) (~80 MB, off-thread WebWorker), Japanese & 10+ languages via Web Speech API fallback; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language; ⬇ Save button to download generated audio as WAV file | |
32 | 32 | | **Import** | MD, DOCX, XLSX/XLS, CSV, HTML, JSON, XML, PDF — drag & drop or click to import | |
|
36 | 36 | | **Desktop** | Native app via Neutralino.js with system tray and offline support | |
37 | 37 | | **Code Execution** | 6 languages in-browser: Bash ([just-bash](https://justbash.dev/)), Math (Nerdamer), Python ([Pyodide](https://pyodide.org/)), HTML (sandboxed iframe, `html-autorun` for widgets/quizzes), JavaScript (sandboxed iframe), SQL ([sql.js](https://sql.js.org/) SQLite) · 25+ compiled languages via [Judge0 CE](https://ce.judge0.com): C, C++, Rust, Go, Java, TypeScript, Kotlin, Scala, Ruby, Swift, Haskell, Dart, C#, and more · **▶ Run All** notebook engine — one-click sequential execution of all blocks with progress bar, abort, per-block status badges, and SQLite shared context store | |
38 | 38 | | **Security** | Content Security Policy (CSP), SRI integrity hashes, XSS sanitization (DOMPurify), ReDoS protection, Firestore write-token ownership, API keys via HTTP headers, postMessage origin validation, 8-char passphrase minimum, sandboxed code execution | |
39 | | -| **AI Document Tags** | `{{@AI:}}` text generation, `{{@Think:}}` deep reasoning, `{{@Image:}}` image generation (Gemini Imagen), `{{@OCR:}}` image-to-text extraction (Text/Math/Table modes, PDF page rendering via pdf.js), `{{@STT:}}` speech-to-text dictation (engine selector: Whisper/Voxtral/Web Speech API, 11 languages, Record/Stop/Insert/Clear) — `@` prefix syntax on all tag types + metadata fields (`@name`, `@use`, `@think`, `@search`, `@prompt`, `@step`, `@upload`); editable `@prompt:` textarea and `@step:` inputs in preview cards; description/prompt separation (bare text = label, `@prompt:` = AI instruction); 📎 image/PDF upload for multimodal vision analysis; per-card model selector, concurrent block operations | |
| 39 | +| **AI Document Tags** | `{{@AI:}}` text generation, `{{@Think:}}` deep reasoning, `{{@Image:}}` image generation (Gemini Imagen), `{{@OCR:}}` image-to-text extraction (Text/Math/Table modes, PDF page rendering via pdf.js), `{{@STT:}}` speech-to-text dictation (engine selector: Whisper/Voxtral/Web Speech API, 11 languages, Record/Stop/Insert/Clear) — `@` prefix syntax on all tag types + metadata fields (`@name`, `@use`, `@think`, `@search`, `@prompt`, `@step`, `@upload`, `@model`); `@model:` field persists selected model per card with intelligent defaults (OCR→`granite-docling`, TTS→`kokoro-tts`, STT→`voxtral-stt`, Image→`imagen-ultra`); editable `@prompt:` textarea and `@step:` inputs in preview cards; description/prompt separation (bare text = label, `@prompt:` = AI instruction); 📎 image/PDF upload for multimodal vision analysis; per-card model selector with document-portable model persistence, concurrent block operations | |
40 | 40 | | **🔌 API Calls** | `{{API:}}` REST API integration — GET/POST/PUT/DELETE methods, custom headers, JSON body, response stored in `$(api_varName)` variables; inline review panel; toolbar GET/POST buttons | |
41 | 41 | | **🔗 Agent Flow** | `{{Agent:}}` multi-step pipeline — define Step 1/2/3, chain outputs, per-card model + search provider selector, live step status indicators (⏳/✅/❌), review combined output | |
42 | 42 | | **🔍 Web Search** | Toggle web search for AI — DuckDuckGo (free), Brave Search, Serper.dev; search results injected into LLM context; source citations in responses; per-agent-card search provider selector | |
@@ -458,6 +458,8 @@ TextAgent has undergone significant evolution since its inception. What started |
458 | 458 |
|
459 | 459 | | Date | Commits | Feature / Update | |
460 | 460 | |------|---------|-----------------| |
| 461 | +| **2026-03-12** | — | 🏷️ **@model Tag Field** — new `@model:` metadata field on all AI DocGen tag types (`{{@AI:}}`, `{{@Agent:}}`, `{{@Image:}}`, `{{@OCR:}}`, `{{@TTS:}}`, `{{@STT:}}`, `{{@Translate:}}`); persists selected model in document text for portability; intelligent defaults per tag type (OCR→`granite-docling`, TTS→`kokoro-tts`, STT→`voxtral-stt`, Image→`imagen-ultra`, AI/Agent→current model); dropdown shows all registered models, changing it syncs `@model:` back to editor; validated against `AI_MODELS` registry (invalid IDs silently ignored); fully backward-compatible with existing tags | |
| 462 | +| **2026-03-12** | — | 🔧 **Model Loading UX** — smart cache vs download detection in all 7 AI workers (📦 Loading from cache / ⬇️ Downloading from huggingface.co/textagent/...); source location display showing HuggingFace model path during loading; 🗑️ Delete Model button in consent dialog to clear Cache API + OPFS cached files and reset consent; `deleteModelCache()` function exposed on `M._ai`; workers forward Transformers.js `status` field (`initiate`/`progress`/`done`) with `source` and `loadingPhase`; new `.ai-progress-source` info bar and `.ai-consent-btn-danger` styling with dark mode | |
461 | 463 | | **2026-03-12** | — | 🎤 **STT Tag Block & Florence-2 & TTS Download** — new `{{@STT:...}}` tag block for in-preview speech-to-text dictation with engine selector (Whisper V3 Turbo / Voxtral Mini 3B / Web Speech API), 11-language picker, Record/Stop/Insert/Clear buttons, amber-accented CSS with recording pulse animation; Florence-2 (230M) vision OCR+captioning model added (`textagent/Florence-2-base-ft`); TTS ⬇ Save button with float32→WAV encoder for audio download; PDF-to-image OCR renderer via pdf.js (2x scale, max 3 pages); Granite Docling migrated to `textagent/` with `onnx-community/` fallback, fp16 embed_tokens, degeneration loop guard, raw base64→data URL fix; Qwen3 AutoTokenizer fix for text-only models; OCR mode forwarding to doc-model workers | |
462 | 464 | | **2026-03-12** | — | 🧪 **Comprehensive Test Suite** — 12 new Playwright spec files (108 tests) across 5 categories targeting past 3 days of code changes: **Functional** — unit tests for video player (URL detection, HTML builders, embed grid), TTS engine (API surface, state), speech commands (DOM elements, language selector), file converters (MD/CSV/JSON/XML/HTML import), stock widget (rendering, sandbox, double-render prevention); integration tests for embed grid pipeline and AI_MODELS registry. **Regression** — 12 tests pinning recent bug fixes (file upload crash, template confirmation, stock variable, embed rendering, mermaid stability, dark mode, XSS). **Performance** — module init timing (TTS/STT/video/stock/converter < 5–8s), complex render < 5s, embed grid < 3s. **Static Analysis** — ESLint, file size < 100KB, debugger/eval detection, CSS !important audit, IIFE patterns, worker files, HTTPS enforcement. **Security** — embed grid XSS (javascript:/data: URI), video player HTML escaping, YouTube privacy mode, TradingView sandbox, Vimeo DNT, link security, CSP validation. Total test count: 299 | |
463 | 465 | | **2026-03-12** | — | 🎤 **Voxtral STT** — [Voxtral Mini 3B](https://huggingface.co/textagent/Voxtral-Mini-3B-2507-ONNX) as primary speech-to-text engine on WebGPU (~2.7 GB, q4, 13 languages, streaming partial output via `TextStreamer`); Whisper Large V3 Turbo as WASM fallback (~800 MB, q8); `voxtral-worker.js` new WebWorker with `VoxtralForConditionalGeneration` + `VoxtralProcessor`; `speechToText.js` WebGPU detection + dual-worker routing; download consent popup (`showSttConsentPopup`) with model name/size/privacy info before first download; `STT_CONSENTED` localStorage key; model duplicated to `textagent/` HuggingFace org with `onnx-community/` fallback | |
|
0 commit comments