Awesome Mobile LLMs

A curated list of LLMs and related studies targeted at mobile and embedded hardware

Last update: 4th April 2026

If your publication/work is not included - and you think it should - please open an issue or reach out directly to @stevelaskaridis.

Let's try to make this list as useful as possible to researchers, engineers and practitioners all around the world.

Mobile-First LLMs
Infrastructure / Deployment of LLMs on Device
Benchmarking LLMs on Device
Mobile-Specific Optimisations
Applications
Multimodal LLMs
Surveys on Efficient LLMs
Training LLMs on Device
Mobile-Related Use-cases
Benchmarks
Leaderboards
Books and Courses
Industry Announcements
Related Organized Workshops
Related Awesome Repositories

Mobile-First LLMs

The following Table shows sub-3B models designed for on-device deployments, sorted by year.

Name	Year	Sizes	Primary Group/Affiliation	Publication	Code Repository	HF Repository
2026
Gemma 4	2026	E2B, E4B, 26B, 31B	Google DeepMind	website	code	huggingface
LFM2.5	2026	350M, 1.2B, 1.5B, 1.6B	Liquid AI	website	-	huggingface
MobileLLM-Flash	2026	350M, 650M, 1.4B	Meta	paper	-	-
Qwen-3.5	2026	0.8B, 2B, ...	Qwen Team	blog	code	huggingface
2025
LFM2	2025	350M, 700M, 1.2B, 2.6B, 8.3B (1.5B active)	Liquid AI	paper, website	-	huggingface
MobileLLM-R1.5	2025	140M, 360M, 950M	Meta	paper	code	huggingface
Nemotron-Flash	2025	1B, 3B	Nvidia	paper, NeurIPS'25	-	huggingface
MobileLLM-Pro	2025	1B	Meta	paper	-	huggingface
MobileLLM-R1	2025	140M, 360M, 950M	Meta	paper	code	huggingface
SmolLM3	2025	3B	HuggingFace	blog	code	huggingface
Gemma 3	2025	1B, 4B, ...	Google DeepMind	paper	code	huggingface
Qwen-3	2025	0.6B, 1.7B, ...	Qwen Team	paper	code	huggingface
Pareto-Q	2025	125M, 350M, 600M, 1B, 1.5B, 3B	Meta	paper	code	huggingface
2024
BlueLM-V	2024	2.7B	CUHK, Vivo AI Lab	paper	code	-
PhoneLM	2024	0.5B, 1.5B	BUPT	paper	code	huggingface
AMD-Llama-135m	2024	135M	AMD	blog	code	huggingface
SmolLM2	2024	135M, 360M, 1.7B	Huggingface	-	code	huggingface
Ministral	2024	3B, ...	Mistral	blog	-	huggingface
Llama 3.2	2024	1B, 3B	Meta	blog	code	huggingface
OLMoE	2024	7B (1B active)	AllenAI	paper	code	huggingface
Spectra	2024	99M - 3.9B	NolanoAI	paper	code	huggingface
Gemma 2	2024	2B, ...	Google	paper blog	code	huggingface
Apple Intelligence Foundation LMs	2024	3B	Apple	paper	-	-
SmolLM	2024	135M, 360M, 1.7B	Huggingface	blog	-	huggingface
Fox	2024	1.6B	TensorOpera	blog	-	huggingface
Qwen2	2024	500M, 1.5B, ...	Qwen Team	paper	code	huggingface
OpenELM	2024	270M, 450M, 1.08B, 3.04B	Apple	paper	code	huggingface
DCLM	2024	400M, 1B, ...	Univerisy of Washington, Apple, Toyota Research Institute, ...	paper	code	huggingface
Phi-3	2024	3.8B	Microsoft	whitepaper	code	huggingface
BitNet-b1.58	2024	1.3B, 3B, ...	Microsoft	paper	code	huggingface
OLMo	2024	1B, ...	AllenAI	paper	code	huggingface
Mobile LLMs	2024	125M, 250M	Meta	paper, ICML'24	code	-
Gemma	2024	2B, ...	Google	paper, website	code, gemma.cpp	huggingface
MobiLlama	2024	0.5B, 1B	MBZUAI	paper	code	huggingface
Stable LM 2 (Zephyr)	2024	1.6B	Stability.ai	paper	-	huggingface
TinyLlama	2024	1.1B	Singapore University of Technology and Design	paper	code	huggingface
Gemini-Nano	2024	1.8B, 3.25B	Google	paper	-	-
2023
Stable LM (Zephyr)	2023	3B	Stability	blog	code	huggingface
OpenLM	2023	11M, 25M, 87M, 160M, 411M, 830M, 1B, 3B, ...	OpenLM team	-	code	huggingface
Phi-2	2023	2.7B	Microsoft	website	-	huggingface
Phi-1.5	2023	1.3B	Microsoft	paper	-	huggingface
Phi-1	2023	1.3B	Microsoft	paper	-	huggingface
RWKV	2023	169M, 430M, 1.5B, 3B, ...	EleutherAI	paper	code	huggingface
Cerebras-GPT	2023	111M, 256M, 590M, 1.3B, 2.7B ...	Cerebras	paper	code	huggingface
OPT	2022	125M, 350M, 1.3B, 2.7B, ...	Meta	paper	code	huggingface
LaMini-LM	2023	61M, 77M, 111M, 124M, 223M, 248M, 256M, 590M, 774M, 738M, 783M, 1.3B, 1.5B, ...	MBZUAI	paper	code	huggingface
Pythia	2023	70M, 160M, 410M, 1B, 1.4B, 2.8B, ...	EleutherAI	paper	code	huggingface
2022
Galactica	2022	125M, 1.3B, ...	Meta	paper	code	huggingface
BLOOM	2022	560M, 1.1B, 1.7B, 3B, ...	BigScience	paper	code	huggingface
2021
XGLM	2021	564M, 1.7B, 2.9B, ...	Meta	paper	code	huggingface
GPT-Neo	2021	125M, 350M, 1.3B, 2.7B	EleutherAI	-	code, gpt-neox	huggingface
2020
MobileBERT	2020	15.1M, 25.3M	CMU, Google	paper	code	huggingface
2019
BART	2019	140M, 400M	Meta	paper	code	huggingface
DistilBERT	2019	66M	HuggingFace	paper	code	huggingface
T5	2019	60M, 220M, 770M, 3B, ...	Google	paper	code	huggingface
TinyBERT	2019	14.5M	Huawei	paper	code	huggingface
Megatron-LM	2019	336M, 1.3B, ...	Nvidia	paper	code	-

Infrastructure / Deployment of LLMs on Device

This section showcases frameworks and contributions for supporting LLM inference on mobile and edge devices.

Deployment Frameworks

On-Device Inference Frameworks

These frameworks are primarily used to run models directly on-device, inside mobile apps, edge deployments, or tightly integrated local runtimes.

llama.cpp: Inference of Meta's LLaMA model (and others) in pure C/C++. Supports various platforms and builds on top of ggml (now gguf format).
- LLMFarm: iOS frontend for llama.cpp
- LLM.swift: iOS frontend for llama.cpp
- Sherpa: Android frontend for llama.cpp
- iAkashPaul/Portal: Wraps the example android app with tweaked UI, configs & additional model support
- dusty-nv's llama.cpp: Containers for Jetson deployment of llama.cpp
- Off Grid: Open-source React Native app for on-device LLM chat, vision models (SmolVLM, LLaVA), and Stable Diffusion image generation on iOS & Android.
- Airgap: Open-source React Native framework for on-device, offline-first customer support chatbots. Runs Gemma 4 E2B locally via llama.rn. Seven industry templates (telco, retail, healthcare, banking, education, insurance, airlines) ship in the repo.
MLC-LLM: MLC LLM is a machine learning compiler and high-performance deployment engine for large language models. Supports various platforms and build on top of TVM.
- Android App: MLC Android app
- iOS App: MLC iOS app
- dusty-nv's MLC: Containers for Jetson deployment of MLC
PyTorch ExecuTorch: Solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers.
- TorchChat: Codebase showcasing the ability to run large language models (LLMs) seamlessly across iOS and Android
Google MediaPipe: A suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. Support Android, iOS, Python and Web.
- GoogleAI-Edge Gallery: Experimental app that puts the power of cutting-edge Generative AI models directly into your hands, running entirely on your Android and iOS devices.
Apple MLX: MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research. Builds upon lazy evaluation and unified memory architecture.
- MLX Swift: Swift API for MLX.

Apple Foundation Models SDK: Python bindings for Apple's Foundation Models framework, providing access to the on-device foundation model at the core of Apple Intelligence on macOS.

HF Swift Transformers: Swift Package to implement a transformers-like API in Swift
Alibaba MNN: MNN supports inference and training of deep learning models and for inference and training on-device.
llama2.c (More educational, see here for android port)
tinygrad: Simple neural network framework from tinycorp and @geohot
TinyChatEngine: Targeted at Nvidia, Apple M1 and RPi, from Song Han's (MIT) group.
Llama Stack (swift, kotlin): These libraries are a set of SDKs that provide a simple and effective way to integrate AI capabilities into your iOS/Android app, whether it is local (on-device) or remote inference.
OLMoE.Swift: Ai2 OLMoE is an AI chatbot powered by the OLMoE model. Unlike cloud-based AI assistants, OLMoE runs entirely on your device, ensuring complete privacy and offline accessibility—even in Flight Mode.
HuggingSnap: HuggingSnap is an iOS app that lets users quickly learn more about the places and objects around them. HuggingSnap runs SmolVLM2, a compact open multimodal model that accepts arbitrary sequences of image, videos, and text inputs to produce text outputs.
Flower Intelligence: Flower Intelligence is a cross-platform inference library that lets users seamlessly interact with Large-Language Models both locally and remotely in a secure and private way. The library was created by the Flower Labs team. It supports TypeScript, JavaScript and Swift backends.

Local Network Model Serving

These frameworks are primarily used to host models on a laptop, desktop, or workstation and expose them over a local API to other devices on the same LAN.

LM Studio: Desktop application and local inference server for hosting models on your machine, with an OpenAI-compatible local API.
Ollama: Local model runner and server for hosting and serving models through a simple CLI and HTTP API.
Lemonade: Open-source local AI server for text, image, and speech workloads, designed to run privately on local PCs and compatible with OpenAI-style APIs.
llama.cpp: Can also be used as a lightweight local inference server for hosting GGUF models via CLI and HTTP server modes.
LocalAI: Self-hosted local inference server and OpenAI-compatible REST API for running LLM, vision, image, and audio workloads on local or on-prem hardware.
Locally AI: Native Apple-platform app for running AI models fully offline on iPhone, iPad, and Mac, optimized for Apple Silicon and on-device privacy.
vLLM: High-throughput inference and serving engine that can expose OpenAI-compatible local APIs, better suited to stronger desktops and workstations.
SGLang: High-performance model serving framework for local and distributed deployments, designed for low-latency and high-throughput inference.

Papers

2025

Apple Intelligence Foundation Language Models: Tech Report 2025
Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang, et al.
[ACM Queue] Generative AI at the Edge: Challenges and Opportunities: The next phase in AI deployment
Vijay Janapa Reddi

2024

PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue, Yixin Song, Zeyu Mi, et al.
[MobiCom'24] Mobile Foundation Model as Firmware
Jinliang Yuan, Chen Yang, Dongqi Cai, et al.
Merino: Entropy-driven Design for Generative Language Models on IoT Devicess
Youpeng Zhao, Ming Lin, Huadong Tang, et al.
LLM as a System Service on Mobile Devices
Wangsong Yin, Mengwei Xu, Yuanchun Li, et al.

2023

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices
Junchen Zhao, Yurun Song, Simeng Liu, et al.
LLMCad: Fast and Scalable On-device Large Language Model Inference
Daliang Xu, Wangsong Yin, Xin Jin, et al.
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Rongjie Yi, Liwei Guo, Shiyun Wei, et al.

2022

[IEEE Pervasive Computing] The Future of Consumer Edge-AI Computing
Stefanos Laskaridis, Stylianos I. Venieris, Alexandros Kouris, et al.

Benchmarking LLMs on Device

This section focuses on measurements and benchmarking efforts for assessing LLM performance when deployed on device.

Papers

2026

LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load
Pranay Tummalapalli, Sahil Arayakandy, Ritam Pal, Kautuk Kundan

2025

Intelligence Per Watt: Measuring Intelligence Efficiency of Local AI
Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, et al.
P/D-Device: Disaggregated Large Language Model between Cloud and Devices
Yibo Jin, Yixu Xu, Yue Chen, et al.
Sometimes Painful but Promising: Feasibility and Trade-Offs of On-Device Language Model Inference
Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito
[ICLR'25] PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms
Yilong Li, Jingyu Liu, Hao Zhang, et al.
[SEC'25] lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Haoxin Wang, Xiaolong Tu, Hongyu Ke, et al.

2024

Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation
Jie Xiao, Qianyi Huang, Xu Chen, et al.
[EdgeFM @ MobiSys'24] Large Language Models on Mobile Devices: Measurements, Analysis, and Insights
Xiang Li, Zhenyan Lu, Dongqi Cai, et al.
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Rithesh Murthy, Liangwei Yang, Juntao Tan, et al.
[MobiCom'24] MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, et al.

Mobile-Specific Optimisations

This section focuses on techniques and optimisations that target mobile-specific deployment.

Papers

2025

[NeurIPS'25] Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Yonggan Fu, Xin Dong, Shizhe Diao, et al.
[MobiCom '25] Elastic On-Device LLM Service
Wangsong Yin, Rongjie Yi, Daliang Xu, et al.
[MobiCom '25] Confidant: Customizing Transformer-based LLMs via Collaborative Training on Mobile Devices
Yuhao Chen, Yuxuan Yan, Shuowei Ge, et al.
[MobiCom '25] D2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang, Qihua Zhou, Zicong Hong, et al.
[CVPR'25 EDGE Workshop] Scaling On-Device GPU Inference for Large Generative Models
Jiuqiang Tang, Raman Sarokin, Ekaterina Ignasheva, et al.
ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM
Liang Li, Xingke Yang, Wen Wu, et al.
[ASPLOS'25] Fast On-device LLM Inference with NPUs
Daliang Xu, Hao Zhang, Liming Yang, et al.

2024

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Andrii Skliar, Ties van Rozendaal, Romain Lepert, et al.
PhoneLM: An Efficient and Capable Small Language Model Family through Principled Pre-training
Rongjie Yi, Xiang Li, Weikai Xie, et al.
MobileQuant: Mobile-friendly Quantization for On-device Language Models
Fuwen Tan, Royson Lee, Łukasz Dudziak, et al.
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team, Morgane Riviere, Shreya Pathak, et al.
Apple Intelligence Foundation Language Models
Tom Gunter, Zirui Wang, Chong Wang, et al.
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting
Zhongzhi Yu, Zheng Wang, Yuhan Li, et al.
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Jyoti Aneja, Hany Awadalla, et al.
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Luchang Li, Sheng Qian, Jie Lu, et al.
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team, Google DeepMind
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Omkar Thawakar, Ashmal Vayani, Salman Khan, et al.
[ICML'24] MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun Liu, Changsheng Zhao, Forrest Iandola, et al.
[ICML'24] Rethinking Optimization and Architecture for Tiny Language Models
Yehui Tang, Kai Han, Fangcheng Liu, et al.
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, et al.

Applications

Papers

2024

Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
Wei Chen, Zhiyuan Li
Octopus v2: On-device language model for super agent
Wei Chen, Zhiyuan Li
Octopus: On-device language model for function calling of software APIs
Wei Chen, Zhiyuan Li, Mingyuan Ma

2023

Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile
Samuel Carreira, Tomas Marques, Jose Ribeiro, Carlos Grilo
Towards an On-device Agent for Text Rewriting
Yun Zhu, Yinxiao Liu, Felix Stahlberg, et al.

Multimodal LLMs

This section refers to multimodal LLMs, which integrate vision or other modalities in their tasks.

Papers

2024

[CVPR 2024] MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Vasu, Pavan Kumar Anasosalu, Pouransari, Hadi, Faghri, Fartash, et al.
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Baichuan Zhou, Ying Hu, Xi Weng, et al.
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, et al.

2023

MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu, Limeng Qiao, Xinyang Lin, et al.

Surveys on Efficient LLMs

This section includes survey papers on LLM efficiency, a topic very much related to deploying in constrained devices.

Papers

2025

GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices
Mozhgan Navardi, Romina Aalishah, Yuzhe Fu, et al.
Demystifying Small Language Models for Edge Deployment
Zhenyan Lu, Xiang Li, Dongqi Cai, et al.
Small Language Models (SLMs) Can Still Pack a Punch: A survey
Shreyas Subramanian, Vikram Elango, Mecit Gungor

2024

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness
Fali Wang, Zhiwei Zhang, Xianren Zhang, et al.
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu, Xiang Li, Dongqi Cai, et al.
On-Device Language Models: A Comprehensive Review
Jiajun Xu, Zhiyuan Li, Wei Chen, et al.
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Mengwei Xu, Wangsong Yin, Dongqi Cai, et al.

2023

Efficient Large Language Models: A Survey
Zhongwei Wan, Xin Wang, Che Liu, et al.
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, et al.
A Survey on Model Compression for Large Language Models
Xunyu Zhu, Jian Li, Yong Liu, et al.

Training LLMs on Device

This section refers to papers attempting to train/fine-tune LLMs on device, in a standalone or federated manner.

Papers

2025

Computational Bottlenecks of Training Small-scale Large Language Models
Saleh Ashkboos, Iman Mirzadeh, Keivan Alizadeh, et al.
[ICML'25] On-device collaborative language modeling via a mixture of generalists and specialists
Dongyang Fan, Bettina Messmer, Nikita Doikov, et al.
MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning
Liang Li, Xingke Yang, Wen Wu, et al.

2024

[Privacy in Natural Language Processing @ ACL'24] PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs
Dan Peng, Zhihui Fu

2023

[MobiCom'23] Federated Few-Shot Learning for Mobile NLP
Dongqi Cai, Shangguang Wang, Yaozong Wu, et al.
FwdLLM: Efficient FedLLM using Forward Gradient
Mengwei Xu, Dongqi Cai, Yaozong Wu, et al.
[Electronics'24] Forward Learning of Large Language Models by Consumer Devices
Danilo Pietro Pau, Fabrizio Maria Aymone
Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly
Herbert Woisetschläger, Alexander Isenko, Shiqiang Wang, et al.
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Zhen Qin, Daoyuan Chen, Bingchen Qian, et al.

Mobile-Related Use-cases

This section includes paper that are mobile-related, but not necessarily run on device.

Papers

2025

Slm-mux: Orchestrating small language models for reasoning
Chenyu Wang, Zishen Wan, Hao Kang, et al.
Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Zhen Yang, Zi-Yi Dou, Di Feng, et al.
[NeurIPS'25] OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Shaojie Zhuo, et al.
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang, Zijian Huang, Chenshun Ni, et al.
Small Language Models are the Future of Agentic AI
Peter Belcak, Greg Heinrich, Shizhe Diao, et al.

2024

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Junyang Wang, Haiyang Xu, Haitao Jia, et al.
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You, Haotian Zhang, Eldon Schoop, et al.
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang, Haiyang Xu, Jiabo Ye, et al.
[MobiCom'24] MobileGPT: Augmenting LLM with Human-like App Memory for Mobile Task Automation
Sunjae Lee, Junyoung Choi, Jungjae Lee, et al.
[MobiCom'24] AutoDroid: LLM-powered Task Automation in Android
Hao Wen, Yuanchun Li, Guohong Liu, et al.

2023

[NeurIPS'23] AndroidInTheWild: A Large-Scale Dataset For Android Device Control
Christopher Rawles, Alice Li, Daniel Rodriguez, et al.
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan, Zhengyuan Yang, Wanrong Zhu, et al.

Older

[ACL'20] Mapping Natural Language Instructions to Mobile UI Action Sequences
Yang Li, Jiacong He, Xin Zhou, et al.

Benchmarks

Leaderboards

Books and Courses

Edge AI Engineering by Marcelo Rovai
Machine Learning Systems: Principles and Practices of Engineering Artificially Intelligent Systems by Vijay Janapa Reddi

Industry Announcements

Related Organized Workshops

TTODLer-FM @ ICML'25: Tiny Titans: The next wave of On-Device Learning for Foundational Models (TTODLer-FM)
ES-FoMO @ ICML'25: Efficient Systems for Foundation Models
Binary Networks @ ICCV'25: Binary and Extreme Quantization for Computer Vision
SLLM @ ICLR'25: Workshop on Sparsity in LLMs: Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
MCDC @ ICLR'25: Workshop on Modularity for Collaborative, Decentralized, and Continual Deep Learning
Adaptive Foundation Models @ NeurIPS'24

Related Awesome Repositories

If you want to read more about related topics, here are some tangential awesome repositories to visit:

NexaAI/Awesome-LLMs-on-device on LLMs on Device
FairyFali/SLMs-Survey on Small Language Models
Hannibal046/Awesome-LLM on Large Language Models
KennethanCeyer/awesome-llm on Large Language Models
HuangOwen/Awesome-LLM-Compression on Large Language Model Compression
csarron/awesome-emdl on Embedded and Mobile Deep Learning

Contribute

Contributions welcome! Read the contribution guidelines first.

Name		Name	Last commit message	Last commit date
Latest commit History 137 Commits
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
code-of-conduct.md		code-of-conduct.md
contributing.md		contributing.md

Folders and files

Latest commit

History

Repository files navigation

Awesome Mobile LLMs

Contents

Mobile-First LLMs

Infrastructure / Deployment of LLMs on Device

Deployment Frameworks

On-Device Inference Frameworks

Local Network Model Serving

Papers

2025

2024

2023

2022

Benchmarking LLMs on Device

Papers

2026

2025

2024

Mobile-Specific Optimisations

Papers

2025

2024

Applications

Papers

2024

2023

Multimodal LLMs

Papers

2024

2023

Surveys on Efficient LLMs

Papers

2025

2024

2023

Training LLMs on Device

Papers

2025

2024

2023

Mobile-Related Use-cases

Papers

2025

2024

2023

Older

Benchmarks

Leaderboards

Books and Courses

Industry Announcements

Related Organized Workshops

Related Awesome Repositories

Contribute

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages