feat: Gemma 4 on-device inference via LiteRT-LM (Android) #4

Open: sparkleMing wants to merge 2 commits into main from feat/gemma4-litert-lm

@sparkleMing (Collaborator)

Summary

Add support for running Gemma 4 models fully on-device on Android using the official LiteRT-LM Kotlin API.

What's new

Android native layer

  • LiteRtLmPlugin.kt: Flutter platform channel plugin wrapping LiteRT-LM. Supports queue-based inference, streaming tokens, tool calls, thinking mode, model download via OkHttp, and M4A→PCM WAV audio conversion via MediaCodec.
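The last step of an M4A→PCM WAV conversion is wrapping the decoded samples in a RIFF/WAVE header. The sketch below shows only that step, under the assumption that the decoder has already produced 16-bit little-endian PCM. `pcmToWav` is a hypothetical name, not a function from this PR, and the MediaCodec decoding itself is not shown.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper (not from the PR): prepends a canonical 44-byte
// RIFF/WAVE header to raw PCM. Assumes 16-bit little-endian samples.
fun pcmToWav(pcm: ByteArray, sampleRate: Int, channels: Int): ByteArray {
    val bitsPerSample = 16
    val blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    val byteRate = sampleRate * blockAlign          // bytes per second
    val buf = ByteBuffer.allocate(44 + pcm.size).order(ByteOrder.LITTLE_ENDIAN)

    buf.put("RIFF".toByteArray(Charsets.US_ASCII))
    buf.putInt(36 + pcm.size)                       // file size minus the first 8 bytes
    buf.put("WAVE".toByteArray(Charsets.US_ASCII))

    buf.put("fmt ".toByteArray(Charsets.US_ASCII))
    buf.putInt(16)                                  // fmt chunk size for plain PCM
    buf.putShort(1.toShort())                       // audio format 1 = uncompressed PCM
    buf.putShort(channels.toShort())
    buf.putInt(sampleRate)
    buf.putInt(byteRate)
    buf.putShort(blockAlign.toShort())
    buf.putShort(bitsPerSample.toShort())

    buf.put("data".toByteArray(Charsets.US_ASCII))
    buf.putInt(pcm.size)                            // payload length in bytes
    buf.put(pcm)
    return buf.array()
}
```

The sample rate and channel count should come from the decoder's output format rather than being hard-coded, since the source file determines both.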

Dart layer

  • GemmaLocalClient: LLMClient implementation using platform channels. Acquires a global inference lock before each request.
  • GemmaModelManager: Engine lifecycle manager. Vision and audio backends are enabled strictly on demand, only when the request contains image or audio content. The engine is fully torn down and rebuilt when the backend config changes. The rebuild always happens after the inference lock is acquired, so teardown cannot occur during active inference.
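The ordering described above (take the lock first, then decide whether to rebuild) can be sketched as follows. The real manager is Dart; this Kotlin version only illustrates the idea. `EngineManager`, `BackendConfig`, and `runInference` are hypothetical names, and the teardown and rebuild are reduced to a counter. It also assumes that a text-only request following an image request triggers a rebuild with vision disabled, which is one reading of "strictly on demand".

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical: which optional backends the current engine was built with.
data class BackendConfig(val vision: Boolean, val audio: Boolean)

class EngineManager {
    private val inferenceLock = ReentrantLock()
    private var current: BackendConfig? = null   // null means no engine yet

    // Counts rebuilds; stands in for the real teardown + init work.
    var rebuildCount = 0
        private set

    fun <T> runInference(
        hasImage: Boolean,
        hasAudio: Boolean,
        infer: (BackendConfig) -> T,
    ): T = inferenceLock.withLock {
        // The config check happens only after the lock is held, so an
        // in-flight inference can never have its engine torn down.
        val needed = BackendConfig(vision = hasImage, audio = hasAudio)
        if (current != needed) {
            // Placeholder for: tear down the old engine, build a new one.
            current = needed
            rebuildCount++
        }
        infer(needed)
    }
}
```

Checking the config outside the lock would reintroduce the race: a second caller could decide to rebuild while the first is still generating tokens.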

Provider integration

  • New typeGemmaLocal provider type with model list (gemma-4-e2b, gemma-4-e4b)
  • Model download UI in setup and settings pages (Android only)
  • No API key or base URL required; LLM data sharing consent skipped for on-device models

Other fixes

  • asset_analysis_tool: Gemma 4 uses JPEG + 896px max side to avoid LiteRT-LM patch count overflow. Non-Gemma path unchanged.
  • pkm_skill / timeline_card_skill: Use state.metadata factId as fallback when model-provided fact_id is unreliable.
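For the asset_analysis_tool change, the 896px cap amounts to scaling the image so its longer side is at most 896 while preserving aspect ratio. A minimal sketch of that arithmetic follows. `fitMaxSide` is a hypothetical helper, and leaving smaller images untouched (no upscaling) is an assumption, as is the rounding behaviour.

```kotlin
import kotlin.math.max

// Hypothetical helper (not from the PR): computes target dimensions so the
// longer side is at most maxSide, keeping the aspect ratio. Images already
// within the cap are returned unchanged (assumed: no upscaling).
fun fitMaxSide(width: Int, height: Int, maxSide: Int = 896): Pair<Int, Int> {
    require(width > 0 && height > 0 && maxSide > 0) { "dimensions must be positive" }
    val longSide = max(width, height)
    if (longSide <= maxSide) return width to height

    // Integer math in Long to avoid overflow; round down, but never below 1px.
    val newW = (width.toLong() * maxSide / longSide).toInt().coerceAtLeast(1)
    val newH = (height.toLong() * maxSide / longSide).toInt().coerceAtLeast(1)
    return newW to newH
}
```

For example, a 4000×3000 photo maps to 896×672, while a 640×480 image is passed through as-is.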

Dependency upgrades

  • drift 2.30 to 2.32.1, sqlite3_flutter_libs 0.5 to 0.6, drift_flutter 0.2 to 0.3

- Add LiteRtLmPlugin (Kotlin) wrapping official LiteRT-LM API with
  queue-based inference, download support, and audio PCM conversion
- Add GemmaLocalClient (Dart) with per-request engine init/teardown
- Add GemmaModelManager with on-demand backend init: vision/audio
  backends only enabled when request contains image/audio content,
  engine rebuilt (with full teardown) when config changes
- Engine rebuild happens after acquiring inference lock to prevent
  teardown while another inference is in progress
- Add gemma_local provider type with model list (gemma-4-e2b/e4b)
- Add download UI in model config pages (Android only)
- Skip LLM data sharing consent for on-device models
- asset_analysis_tool: use JPEG + 896px cap for Gemma 4 to avoid
  LiteRT-LM patch count overflow; non-Gemma path unchanged
- pkm_skill/timeline_card_skill: use state factId as fallback when
  model-provided fact_id is unreliable
- Upgrade drift 2.30→2.32.1, sqlite3_flutter_libs 0.5→0.6,
  drift_flutter 0.2→0.3