Mobile Inference

Mobile inference focuses on optimizing the execution of deep learning models on mobile devices, aiming for lower latency, reduced energy consumption, and improved privacy. Current research emphasizes efficient model architectures such as transformers and MobileNets, incorporating techniques like sparsity, quantization, and novel attention mechanisms to reduce computational cost. These advances are crucial for enabling resource-constrained mobile devices to run complex AI applications, with impact on mobile vision, natural language processing, and collaborative intelligence. Research is also actively addressing privacy concerns through data masking and selective offloading to the cloud.
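To make one of the techniques above concrete, here is a minimal sketch of symmetric per-tensor int8 post-training quantization, a common way to shrink model weights for on-device inference. The function names and the use of NumPy are illustrative, not tied to any specific mobile framework.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 in [-127, 127] with a single scale factor."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix and check the reconstruction error,
# which is bounded by half a quantization step (scale / 2) per element.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))
print(max_err <= scale / 2 + 1e-6)
```

In practice, mobile runtimes use per-channel scales and quantize activations as well, but the storage saving is the same idea: each weight drops from 4 bytes (float32) to 1 byte (int8).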

Papers