A curated list of LLMs and related studies targeted at mobile and embedded hardware
Last update: 4th April 2026
If your publication/work is not included - and you think it should - please open an issue or reach out directly to @stevelaskaridis.
Let's try to make this list as useful as possible to researchers, engineers and practitioners all around the world.
- Mobile-First LLMs
- Infrastructure / Deployment of LLMs on Device
- Benchmarking LLMs on Device
- Mobile-Specific Optimisations
- Applications
- Multimodal LLMs
- Surveys on Efficient LLMs
- Training LLMs on Device
- Mobile-Related Use-cases
- Benchmarks
- Leaderboards
- Books and Courses
- Industry Announcements
- Related Organized Workshops
- Related Awesome Repositories
The following Table shows sub-3B models designed for on-device deployments, sorted by year.
| Name | Year | Sizes | Primary Group/Affiliation | Publication | Code Repository | HF Repository |
|---|---|---|---|---|---|---|
| 2026 | ||||||
| Gemma 4 | 2026 | E2B, E4B, 26B, 31B | Google DeepMind | website | code | huggingface |
| LFM2.5 | 2026 | 350M, 1.2B, 1.5B, 1.6B | Liquid AI | website | - | huggingface |
| MobileLLM-Flash | 2026 | 350M, 650M, 1.4B | Meta | paper | - | - |
| Qwen-3.5 | 2026 | 0.8B, 2B, ... | Qwen Team | blog | code | huggingface |
| 2025 | ||||||
| LFM2 | 2025 | 350M, 700M, 1.2B, 2.6B, 8.3B (1.5B active) | Liquid AI | paper, website | - | huggingface |
| MobileLLM-R1.5 | 2025 | 140M, 360M, 950M | Meta | paper | code | huggingface |
| Nemotron-Flash | 2025 | 1B, 3B | Nvidia | paper, NeurIPS'25 | - | huggingface |
| MobileLLM-Pro | 2025 | 1B | Meta | paper | - | huggingface |
| MobileLLM-R1 | 2025 | 140M, 360M, 950M | Meta | paper | code | huggingface |
| SmolLM3 | 2025 | 3B | HuggingFace | blog | code | huggingface |
| Gemma 3 | 2025 | 1B, 4B, ... | Google DeepMind | paper | code | huggingface |
| Qwen-3 | 2025 | 0.6B, 1.7B, ... | Qwen Team | paper | code | huggingface |
| Pareto-Q | 2025 | 125M, 350M, 600M, 1B, 1.5B, 3B | Meta | paper | code | huggingface |
| 2024 | ||||||
| BlueLM-V | 2024 | 2.7B | CUHK, Vivo AI Lab | paper | code | - |
| PhoneLM | 2024 | 0.5B, 1.5B | BUPT | paper | code | huggingface |
| AMD-Llama-135m | 2024 | 135M | AMD | blog | code | huggingface |
| SmolLM2 | 2024 | 135M, 360M, 1.7B | Huggingface | - | code | huggingface |
| Ministral | 2024 | 3B, ... | Mistral | blog | - | huggingface |
| Llama 3.2 | 2024 | 1B, 3B | Meta | blog | code | huggingface |
| OLMoE | 2024 | 7B (1B active) | AllenAI | paper | code | huggingface |
| Spectra | 2024 | 99M - 3.9B | NolanoAI | paper | code | huggingface |
| Gemma 2 | 2024 | 2B, ... | paper blog | code | huggingface | |
| Apple Intelligence Foundation LMs | 2024 | 3B | Apple | paper | - | - |
| SmolLM | 2024 | 135M, 360M, 1.7B | Huggingface | blog | - | huggingface |
| Fox | 2024 | 1.6B | TensorOpera | blog | - | huggingface |
| Qwen2 | 2024 | 500M, 1.5B, ... | Qwen Team | paper | code | huggingface |
| OpenELM | 2024 | 270M, 450M, 1.08B, 3.04B | Apple | paper | code | huggingface |
| DCLM | 2024 | 400M, 1B, ... | Univerisy of Washington, Apple, Toyota Research Institute, ... | paper | code | huggingface |
| Phi-3 | 2024 | 3.8B | Microsoft | whitepaper | code | huggingface |
| BitNet-b1.58 | 2024 | 1.3B, 3B, ... | Microsoft | paper | code | huggingface |
| OLMo | 2024 | 1B, ... | AllenAI | paper | code | huggingface |
| Mobile LLMs | 2024 | 125M, 250M | Meta | paper, ICML'24 | code | - |
| Gemma | 2024 | 2B, ... | paper, website | code, gemma.cpp | huggingface | |
| MobiLlama | 2024 | 0.5B, 1B | MBZUAI | paper | code | huggingface |
| Stable LM 2 (Zephyr) | 2024 | 1.6B | Stability.ai | paper | - | huggingface |
| TinyLlama | 2024 | 1.1B | Singapore University of Technology and Design | paper | code | huggingface |
| Gemini-Nano | 2024 | 1.8B, 3.25B | paper | - | - | |
| 2023 | ||||||
| Stable LM (Zephyr) | 2023 | 3B | Stability | blog | code | huggingface |
| OpenLM | 2023 | 11M, 25M, 87M, 160M, 411M, 830M, 1B, 3B, ... | OpenLM team | - | code | huggingface |
| Phi-2 | 2023 | 2.7B | Microsoft | website | - | huggingface |
| Phi-1.5 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
| Phi-1 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
| RWKV | 2023 | 169M, 430M, 1.5B, 3B, ... | EleutherAI | paper | code | huggingface |
| Cerebras-GPT | 2023 | 111M, 256M, 590M, 1.3B, 2.7B ... | Cerebras | paper | code | huggingface |
| OPT | 2022 | 125M, 350M, 1.3B, 2.7B, ... | Meta | paper | code | huggingface |
| LaMini-LM | 2023 | 61M, 77M, 111M, 124M, 223M, 248M, 256M, 590M, 774M, 738M, 783M, 1.3B, 1.5B, ... | MBZUAI | paper | code | huggingface |
| Pythia | 2023 | 70M, 160M, 410M, 1B, 1.4B, 2.8B, ... | EleutherAI | paper | code | huggingface |
| 2022 | ||||||
| Galactica | 2022 | 125M, 1.3B, ... | Meta | paper | code | huggingface |
| BLOOM | 2022 | 560M, 1.1B, 1.7B, 3B, ... | BigScience | paper | code | huggingface |
| 2021 | ||||||
| XGLM | 2021 | 564M, 1.7B, 2.9B, ... | Meta | paper | code | huggingface |
| GPT-Neo | 2021 | 125M, 350M, 1.3B, 2.7B | EleutherAI | - | code, gpt-neox | huggingface |
| 2020 | ||||||
| MobileBERT | 2020 | 15.1M, 25.3M | CMU, Google | paper | code | huggingface |
| 2019 | ||||||
| BART | 2019 | 140M, 400M | Meta | paper | code | huggingface |
| DistilBERT | 2019 | 66M | HuggingFace | paper | code | huggingface |
| T5 | 2019 | 60M, 220M, 770M, 3B, ... | paper | code | huggingface | |
| TinyBERT | 2019 | 14.5M | Huawei | paper | code | huggingface |
| Megatron-LM | 2019 | 336M, 1.3B, ... | Nvidia | paper | code | - |
This section showcases frameworks and contributions for supporting LLM inference on mobile and edge devices.
These frameworks are primarily used to run models directly on-device, inside mobile apps, edge deployments, or tightly integrated local runtimes.
- llama.cpp: Inference of Meta's LLaMA model (and others) in pure C/C++. Supports various platforms and builds on top of ggml (now gguf format).
- LLMFarm: iOS frontend for llama.cpp
- LLM.swift: iOS frontend for llama.cpp
- Sherpa: Android frontend for llama.cpp
- iAkashPaul/Portal: Wraps the example android app with tweaked UI, configs & additional model support
- dusty-nv's llama.cpp: Containers for Jetson deployment of llama.cpp
- Off Grid: Open-source React Native app for on-device LLM chat, vision models (SmolVLM, LLaVA), and Stable Diffusion image generation on iOS & Android.
- Airgap: Open-source React Native framework for on-device, offline-first customer support chatbots. Runs Gemma 4 E2B locally via llama.rn. Seven industry templates (telco, retail, healthcare, banking, education, insurance, airlines) ship in the repo.
- MLC-LLM: MLC LLM is a machine learning compiler and high-performance deployment engine for large language models. Supports various platforms and build on top of TVM.
- Android App: MLC Android app
- iOS App: MLC iOS app
- dusty-nv's MLC: Containers for Jetson deployment of MLC
- PyTorch ExecuTorch: Solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers.
- TorchChat: Codebase showcasing the ability to run large language models (LLMs) seamlessly across iOS and Android
- Google MediaPipe: A suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. Support Android, iOS, Python and Web.
- GoogleAI-Edge Gallery: Experimental app that puts the power of cutting-edge Generative AI models directly into your hands, running entirely on your Android and iOS devices.
- Apple MLX: MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research. Builds upon lazy evaluation and unified memory architecture.
- MLX Swift: Swift API for MLX.
- Apple Foundation Models SDK: Python bindings for Apple's Foundation Models framework, providing access to the on-device foundation model at the core of Apple Intelligence on macOS.
- HF Swift Transformers: Swift Package to implement a transformers-like API in Swift
- Alibaba MNN: MNN supports inference and training of deep learning models and for inference and training on-device.
- llama2.c (More educational, see here for android port)
- tinygrad: Simple neural network framework from tinycorp and @geohot
- TinyChatEngine: Targeted at Nvidia, Apple M1 and RPi, from Song Han's (MIT) group.
- Llama Stack (swift, kotlin): These libraries are a set of SDKs that provide a simple and effective way to integrate AI capabilities into your iOS/Android app, whether it is local (on-device) or remote inference.
- OLMoE.Swift: Ai2 OLMoE is an AI chatbot powered by the OLMoE model. Unlike cloud-based AI assistants, OLMoE runs entirely on your device, ensuring complete privacy and offline accessibility—even in Flight Mode.
- HuggingSnap: HuggingSnap is an iOS app that lets users quickly learn more about the places and objects around them. HuggingSnap runs SmolVLM2, a compact open multimodal model that accepts arbitrary sequences of image, videos, and text inputs to produce text outputs.
- Flower Intelligence: Flower Intelligence is a cross-platform inference library that lets users seamlessly interact with Large-Language Models both locally and remotely in a secure and private way. The library was created by the Flower Labs team. It supports TypeScript, JavaScript and Swift backends.
These frameworks are primarily used to host models on a laptop, desktop, or workstation and expose them over a local API to other devices on the same LAN.
- LM Studio: Desktop application and local inference server for hosting models on your machine, with an OpenAI-compatible local API.
- Ollama: Local model runner and server for hosting and serving models through a simple CLI and HTTP API.
- Lemonade: Open-source local AI server for text, image, and speech workloads, designed to run privately on local PCs and compatible with OpenAI-style APIs.
- llama.cpp: Can also be used as a lightweight local inference server for hosting GGUF models via CLI and HTTP server modes.
- LocalAI: Self-hosted local inference server and OpenAI-compatible REST API for running LLM, vision, image, and audio workloads on local or on-prem hardware.
- Locally AI: Native Apple-platform app for running AI models fully offline on iPhone, iPad, and Mac, optimized for Apple Silicon and on-device privacy.
- vLLM: High-throughput inference and serving engine that can expose OpenAI-compatible local APIs, better suited to stronger desktops and workstations.
- SGLang: High-performance model serving framework for local and distributed deployments, designed for low-latency and high-throughput inference.
- Apple Intelligence Foundation Language Models: Tech Report 2025
Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang, et al. - [ACM Queue] Generative AI at the Edge: Challenges and Opportunities: The next phase in AI deployment
Vijay Janapa Reddi
- PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue, Yixin Song, Zeyu Mi, et al. - [MobiCom'24] Mobile Foundation Model as Firmware
Jinliang Yuan, Chen Yang, Dongqi Cai, et al. - Merino: Entropy-driven Design for Generative Language Models on IoT Devicess
Youpeng Zhao, Ming Lin, Huadong Tang, et al. - LLM as a System Service on Mobile Devices
Wangsong Yin, Mengwei Xu, Yuanchun Li, et al.
- LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices
Junchen Zhao, Yurun Song, Simeng Liu, et al. - LLMCad: Fast and Scalable On-device Large Language Model Inference
Daliang Xu, Wangsong Yin, Xin Jin, et al. - EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Rongjie Yi, Liwei Guo, Shiyun Wei, et al.
- [IEEE Pervasive Computing] The Future of Consumer Edge-AI Computing
Stefanos Laskaridis, Stylianos I. Venieris, Alexandros Kouris, et al.
This section focuses on measurements and benchmarking efforts for assessing LLM performance when deployed on device.
- LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load
Pranay Tummalapalli, Sahil Arayakandy, Ritam Pal, Kautuk Kundan
- Intelligence Per Watt: Measuring Intelligence Efficiency of Local AI
Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, et al. - P/D-Device: Disaggregated Large Language Model between Cloud and Devices
Yibo Jin, Yixu Xu, Yue Chen, et al. - Sometimes Painful but Promising: Feasibility and Trade-Offs of On-Device Language Model Inference
Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito - [ICLR'25] PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms
Yilong Li, Jingyu Liu, Hao Zhang, et al. - [SEC'25] lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Haoxin Wang, Xiaolong Tu, Hongyu Ke, et al.
- Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation
Jie Xiao, Qianyi Huang, Xu Chen, et al. - [EdgeFM @ MobiSys'24] Large Language Models on Mobile Devices: Measurements, Analysis, and Insights
Xiang Li, Zhenyan Lu, Dongqi Cai, et al. - MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Rithesh Murthy, Liangwei Yang, Juntao Tan, et al. - [MobiCom'24] MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, et al.
This section focuses on techniques and optimisations that target mobile-specific deployment.
- [NeurIPS'25] Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Yonggan Fu, Xin Dong, Shizhe Diao, et al. - [MobiCom '25] Elastic On-Device LLM Service
Wangsong Yin, Rongjie Yi, Daliang Xu, et al. - [MobiCom '25] Confidant: Customizing Transformer-based LLMs via Collaborative Training on Mobile Devices
Yuhao Chen, Yuxuan Yan, Shuowei Ge, et al. - [MobiCom '25] D2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang, Qihua Zhou, Zicong Hong, et al. - [CVPR'25 EDGE Workshop] Scaling On-Device GPU Inference for Large Generative Models
Jiuqiang Tang, Raman Sarokin, Ekaterina Ignasheva, et al. - ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM
Liang Li, Xingke Yang, Wen Wu, et al. - [ASPLOS'25] Fast On-device LLM Inference with NPUs
Daliang Xu, Hao Zhang, Liming Yang, et al.
- Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Andrii Skliar, Ties van Rozendaal, Romain Lepert, et al. - PhoneLM: An Efficient and Capable Small Language Model Family through Principled Pre-training
Rongjie Yi, Xiang Li, Weikai Xie, et al. - MobileQuant: Mobile-friendly Quantization for On-device Language Models
Fuwen Tan, Royson Lee, Łukasz Dudziak, et al. - Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team, Morgane Riviere, Shreya Pathak, et al. - Apple Intelligence Foundation Language Models
Tom Gunter, Zirui Wang, Chong Wang, et al. - EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting
Zhongzhi Yu, Zheng Wang, Yuhan Li, et al. - Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Jyoti Aneja, Hany Awadalla, et al. - Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Luchang Li, Sheng Qian, Jie Lu, et al. - Gemma: Open Models Based on Gemini Research and Technology
Gemma Team, Google DeepMind - MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Omkar Thawakar, Ashmal Vayani, Salman Khan, et al. - [ICML'24] MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun Liu, Changsheng Zhao, Forrest Iandola, et al. - [ICML'24] Rethinking Optimization and Architecture for Tiny Language Models
Yehui Tang, Kai Han, Fangcheng Liu, et al. - TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, et al.
- Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
Wei Chen, Zhiyuan Li - Octopus v2: On-device language model for super agent
Wei Chen, Zhiyuan Li - Octopus: On-device language model for function calling of software APIs
Wei Chen, Zhiyuan Li, Mingyuan Ma
- Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile
Samuel Carreira, Tomas Marques, Jose Ribeiro, Carlos Grilo - Towards an On-device Agent for Text Rewriting
Yun Zhu, Yinxiao Liu, Felix Stahlberg, et al.
This section refers to multimodal LLMs, which integrate vision or other modalities in their tasks.
- [CVPR 2024] MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Vasu, Pavan Kumar Anasosalu, Pouransari, Hadi, Faghri, Fartash, et al. - TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Baichuan Zhou, Ying Hu, Xi Weng, et al. - MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, et al.
- MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu, Limeng Qiao, Xinyang Lin, et al.
This section includes survey papers on LLM efficiency, a topic very much related to deploying in constrained devices.
- GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices
Mozhgan Navardi, Romina Aalishah, Yuzhe Fu, et al. - Demystifying Small Language Models for Edge Deployment
Zhenyan Lu, Xiang Li, Dongqi Cai, et al. - Small Language Models (SLMs) Can Still Pack a Punch: A survey
Shreyas Subramanian, Vikram Elango, Mecit Gungor
- A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness
Fali Wang, Zhiwei Zhang, Xianren Zhang, et al. - Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu, Xiang Li, Dongqi Cai, et al. - On-Device Language Models: A Comprehensive Review
Jiajun Xu, Zhiyuan Li, Wei Chen, et al. - A Survey of Resource-efficient LLM and Multimodal Foundation Models
Mengwei Xu, Wangsong Yin, Dongqi Cai, et al.
- Efficient Large Language Models: A Survey
Zhongwei Wan, Xin Wang, Che Liu, et al. - Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, et al. - A Survey on Model Compression for Large Language Models
Xunyu Zhu, Jian Li, Yong Liu, et al.
This section refers to papers attempting to train/fine-tune LLMs on device, in a standalone or federated manner.
- Computational Bottlenecks of Training Small-scale Large Language Models
Saleh Ashkboos, Iman Mirzadeh, Keivan Alizadeh, et al. - [ICML'25] On-device collaborative language modeling via a mixture of generalists and specialists
Dongyang Fan, Bettina Messmer, Nikita Doikov, et al. - MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning
Liang Li, Xingke Yang, Wen Wu, et al.
- [Privacy in Natural Language Processing @ ACL'24] PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs
Dan Peng, Zhihui Fu
- [MobiCom'23] Federated Few-Shot Learning for Mobile NLP
Dongqi Cai, Shangguang Wang, Yaozong Wu, et al. - FwdLLM: Efficient FedLLM using Forward Gradient
Mengwei Xu, Dongqi Cai, Yaozong Wu, et al. - [Electronics'24] Forward Learning of Large Language Models by Consumer Devices
Danilo Pietro Pau, Fabrizio Maria Aymone - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly
Herbert Woisetschläger, Alexander Isenko, Shiqiang Wang, et al. - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Zhen Qin, Daoyuan Chen, Bingchen Qian, et al.
This section includes paper that are mobile-related, but not necessarily run on device.
- Slm-mux: Orchestrating small language models for reasoning
Chenyu Wang, Zishen Wan, Hao Kang, et al. - Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Zhen Yang, Zi-Yi Dou, Di Feng, et al. - [NeurIPS'25] OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Shaojie Zhuo, et al. - Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang, Zijian Huang, Chenshun Ni, et al. - Small Language Models are the Future of Agentic AI
Peter Belcak, Greg Heinrich, Shizhe Diao, et al.
- Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Junyang Wang, Haiyang Xu, Haitao Jia, et al. - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You, Haotian Zhang, Eldon Schoop, et al. - Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang, Haiyang Xu, Jiabo Ye, et al. - [MobiCom'24] MobileGPT: Augmenting LLM with Human-like App Memory for Mobile Task Automation
Sunjae Lee, Junyoung Choi, Jungjae Lee, et al. - [MobiCom'24] AutoDroid: LLM-powered Task Automation in Android
Hao Wen, Yuanchun Li, Guohong Liu, et al.
- [NeurIPS'23] AndroidInTheWild: A Large-Scale Dataset For Android Device Control
Christopher Rawles, Alice Li, Daniel Rodriguez, et al. - GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan, Zhengyuan Yang, Wanrong Zhu, et al.
- [ACL'20] Mapping Natural Language Instructions to Mobile UI Action Sequences
Yang Li, Jiacong He, Xin Zhou, et al.
- Edge AI Engineering by Marcelo Rovai
- Machine Learning Systems: Principles and Practices of Engineering Artificially Intelligent Systems by Vijay Janapa Reddi
- WWDC'24 - Apple Foundation Models
- PyTorch Executorch Alpha
- Google - LLMs On-Device with MediaPipe and TFLite
- Qualcomm - The future of AI is Hybrid
- ARM - Generative AI on mobile
- TTODLer-FM @ ICML'25: Tiny Titans: The next wave of On-Device Learning for Foundational Models (TTODLer-FM)
- ES-FoMO @ ICML'25: Efficient Systems for Foundation Models
- Binary Networks @ ICCV'25: Binary and Extreme Quantization for Computer Vision
- SLLM @ ICLR'25: Workshop on Sparsity in LLMs: Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
- MCDC @ ICLR'25: Workshop on Modularity for Collaborative, Decentralized, and Continual Deep Learning
- Adaptive Foundation Models @ NeurIPS'24
If you want to read more about related topics, here are some tangential awesome repositories to visit:
- NexaAI/Awesome-LLMs-on-device on LLMs on Device
- FairyFali/SLMs-Survey on Small Language Models
- Hannibal046/Awesome-LLM on Large Language Models
- KennethanCeyer/awesome-llm on Large Language Models
- HuangOwen/Awesome-LLM-Compression on Large Language Model Compression
- csarron/awesome-emdl on Embedded and Mobile Deep Learning
Contributions welcome! Read the contribution guidelines first.