Releases: jegly/OfflineLLM
v5.0.0
New Features
- Translator mode — new "Translator" system prompt with source/target language dropdowns (75+ languages now supported!)
- LaTeX math hints — toggle that instructs the model to use `$...$` / `$$...$$` notation for math expressions
- Markdown rendering — assistant messages are now rendered as Markdown (bold, italic, code blocks, lists) via the mikepenz multiplatform-markdown-renderer library
- Smart auto-scroll — dragging up while generating pauses auto-scroll; scrolling back to the bottom re-enables it
Bug Fixes
- Model unload race condition — unloadModel() is now a suspend function that signals the native side to stop, then waits for the generation coroutine to fully exit before freeing native memory (5 s safeguard timeout)
- Partial response lost on navigation — switching conversations or starting a new chat now saves any in-progress partial response before unloading
- Custom system prompt not persisting — the custom prompt field now correctly restores its value when re-opening Settings
- Miscellaneous bug fixes
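The unload sequence described above can be sketched with coroutines. This is an illustrative sketch only; the names `LlmEngine`, `stopGeneration`, and `freeNativeMemory` are assumptions, not the app's actual API:

```kotlin
import kotlinx.coroutines.*

// Sketch of the race-free unload: signal the native side, wait (bounded)
// for the generation coroutine to exit, then free native memory.
class LlmEngine {
    @Volatile var stopRequested = false
        private set
    var generationJob: Job? = null

    fun stopGeneration() { stopRequested = true }   // native loop polls this flag
    fun freeNativeMemory() { /* JNI free in the real app */ }

    suspend fun unloadModel() {
        stopGeneration()                  // 1. ask generation to stop
        withTimeoutOrNull(5_000) {        // 2. 5 s safeguard so unload never hangs
            generationJob?.join()         //    wait for the coroutine to fully exit
        }
        freeNativeMemory()                // 3. only now is it safe to free
    }
}
```

The timeout means a stuck generation loop can delay unload by at most five seconds instead of deadlocking it.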
Performance
- Token batching — UI updates batched every 3 tokens, reducing Compose recompositions without visible latency
- Stop-sequence scan optimized — only the last 200 characters are scanned per token instead of the full response string
- Thread count auto-detected — defaults to availableProcessors() / 2 (clamped 4–8) instead of a hardcoded 4
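The last two items can be sketched in a few lines of Kotlin; the function names here are illustrative, not the app's actual API:

```kotlin
// Default thread count: half the available cores, clamped to the 4–8 range.
fun defaultThreadCount(): Int =
    (Runtime.getRuntime().availableProcessors() / 2).coerceIn(4, 8)

// Scan only the tail of the response for a stop sequence instead of the
// whole string, keeping per-token cost constant as the output grows.
fun containsStopSequence(
    response: String,
    stops: List<String>,
    window: Int = 200
): Boolean {
    val tail = response.takeLast(window)
    return stops.any { it in tail }
}
```

The 200-character window works because stop sequences are short and can only appear at the end of the freshly generated text.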
Settings / UX
- Default max tokens raised from 512 → 2048
- Added 3000 / 3500 / 4000 quick-select buttons for max tokens
- isShrinkResources = true enabled for release builds
Build / Dependencies
- AGP 9.1.1, Kotlin 2.3.20, Compose BOM 2026.03.00
- Migrated from kotlinOptions { jvmTarget } to kotlin { compilerOptions { jvmTarget } } DSL
- Dropped kotlin-android plugin (redundant with new compose plugin)
- security-crypto, biometric, Room, coroutines, core-ktx, serialization all bumped to stable releases
- CMake ninja path auto-resolved from local.properties / env vars
- Target/compile SDK bumped to 37
- versionCode = 5
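The jvmTarget migration above looks roughly like this in build.gradle.kts; the JVM target value shown is an example, not necessarily the one the project uses:

```kotlin
import org.jetbrains.kotlin.gradle.dsl.JvmTarget

// Old (deprecated) form:
// android {
//     kotlinOptions { jvmTarget = "17" }
// }

// New compilerOptions DSL:
kotlin {
    compilerOptions {
        jvmTarget.set(JvmTarget.JVM_17)
    }
}
```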
OfflineLLM-v4.0.0
Bundled model is now 0.8B parameters.
New security features:
- Screenshot Protection
- Tapjacking Protection
- Accessibility Data Sensitivity
- Auto-Lock on Background
- Hardware-Backed Keystore for Keys
OfflineLLM-v3.0.0
OfflineLLM now supports Gemma4 models in GGUF format.
Minor bug patches and optimizations applied.
OfflineLLM v2.0.0
A fully offline, private AI chat app for Android. All inference runs on-device via llama.cpp. Zero network permissions.
What's New in v2.0.0
- Advanced Sampling Parameters — Full control over Temperature, Top-P, Top-K, Min-P, and Repeat Penalty with slider UI and plain-English explanations
- Context Size Slider — Adjustable from 512 to 16384 tokens
- Text-to-Speech — Read AI responses aloud (speaker icon on assistant messages)
- Chat Search — Search messages within conversations
- Delete Individual Messages — Long-press any message to delete
- Auto-Title Conversations — Chat titles set automatically from your first message
- Theme Selector — System Default / Light / Dark / AMOLED Black
- Accent Colour Picker — 9 colour options
- Thinking Tag Stripping — Hides thinking-tag blocks emitted by reasoning models
- Empty Response Fix — No more blank message bubbles
- Help Screen — Built-in guide for downloading models from HuggingFace
- About Screen — Version info, license, links
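Thinking-tag stripping can be sketched with a regex, assuming the common `<think>…</think>` convention used by many reasoning models (the app's actual tag set is not documented here):

```kotlin
// (?s) makes '.' match newlines so multi-line thinking blocks are removed;
// the non-greedy .*? stops at the first closing tag.
val thinkBlock = Regex("(?s)<think>.*?</think>\\s*")

fun stripThinking(text: String): String = thinkBlock.replace(text, "")
```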
Downloads
- OfflineLLM-v2.0.0-release.apk — Install directly on any Android 14+ device
- gemma-3-270m-it-Q4_K_M.gguf — Bundled model, fast on 4GB RAM devices (~300MB)
Install
- Download the APK and (optionally) a model file
- Enable "Install unknown apps" in Android settings
- Install the APK, complete onboarding
- Import the GGUF model from Settings → Import GGUF Model
Recommended Models
| Model | Size | Best For |
|---|---|---|
| Gemma 3 270M Q4_K_M | ~300 MB | 4GB RAM, fast responses |
| Qwen3.5 0.8B Q4_K_M | ~530 MB | 4-6GB RAM, good balance |
| Gemma 3 1B Q4_K_M | ~750 MB | 6-8GB RAM |
| Qwen3.5 4B Q4_K_M | ~2.5 GB | 8GB+ RAM, best quality |
OfflineLLM v1.0.0 — Initial Release
A fully offline, private AI chat app for Android. All LLM inference runs entirely on-device via llama.cpp. No internet permissions. No cloud. No tracking. Data never leaves the user's device.
Features:
- On-device inference with optimized ARM NEON/SVE/i8mm native libraries
- Streaming token-by-token response display
- Import any GGUF model at runtime via file picker
- Multiple conversations with auto-titling and rename
- Chat search and individual message deletion
- Theme selector (System/Light/Dark/AMOLED Black)
- Accent colour picker with 9 colour options
- Configurable system prompts (General, Coder, Creative Writer, Tutor, Custom)
- Temperature, max tokens, and context size controls
- Optional thinking tag stripping for reasoning models
- Encrypted settings via Jetpack Security
- Optional biometric lock
- Chat export/import as JSON
- Built-in help guide for downloading models from HuggingFace
- Zero network permissions — verified in manifest
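As one small illustration, auto-titling from the first message can be approximated like this; the exact truncation rule is an assumption, not the app's documented behavior:

```kotlin
// Derive a conversation title from the first user message:
// collapse whitespace, then truncate with an ellipsis if too long.
fun autoTitle(firstMessage: String, maxLen: Int = 40): String {
    val oneLine = firstMessage.trim().replace(Regex("\\s+"), " ")
    return if (oneLine.length <= maxLen) oneLine
           else oneLine.take(maxLen).trimEnd() + "…"
}
```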
Recommended models:
- Gemma 3 270M (Q4_K_M) — Fast, works on 4GB RAM devices; included in this APK by default
- Qwen3.5 0.8B (Q4_K_M) — Good balance for 4-6GB RAM
- Gemma 3 1B (Q4_K_M) — Recommended for 6-8GB RAM
- Qwen3.5 4B (Q4_K_M) — Best quality for 8GB+ RAM
Install: Enable Unknown Sources, then install the APK via file manager or adb install.
<3 JEGLY