Device LLM

Device LLMs are large language models deployed directly on mobile devices to enhance privacy, reduce latency, and enable new mobile applications. Current research emphasizes efficient model compression techniques such as quantization, novel architectures designed for resource-constrained hardware (including NPU acceleration), and derivative-free optimization methods for on-device fine-tuning. The field is significant because it addresses critical limitations of cloud-based LLMs, paving the way for personalized, privacy-preserving AI applications on mobile devices. Addressing security vulnerabilities, such as data leakage during inference, is another key area of ongoing investigation.
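To make the compression idea concrete, the sketch below shows symmetric per-tensor int8 post-training quantization, one of the simplest schemes used to shrink LLM weights for on-device deployment. It is a minimal illustration, not any specific framework's implementation; the function names and the toy weight values are invented for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    # One scale factor for the whole tensor, chosen from the largest magnitude.
    scale = max(abs(w) for w in weights) / 127.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

# Toy example: a handful of float32-style weights (hypothetical values).
weights = [0.42, -1.27, 0.08, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The reconstruction error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing 8-bit integers plus one float scale instead of 32-bit floats cuts weight memory roughly 4x, which is the basic trade-off (memory and bandwidth versus a bounded rounding error) that on-device quantization schemes refine with per-channel scales, lower bit widths, and outlier handling.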

Papers