Low Rank
Low-rank techniques aim to reduce the computational cost and memory requirements of large-scale machine learning models by representing high-dimensional data or model parameters with lower-dimensional structures. For example, a d × k matrix of rank r can be stored as the product of a d × r factor and an r × k factor, which takes r(d + k) numbers instead of dk. Current research focuses on applying low-rank methods to make large language models (LLMs) and other deep learning architectures more efficient. Common approaches include low-rank adaptation (LoRA) and its variants, as well as matrix and tensor factorization. These methods matter because they make it practical to train, fine-tune, and deploy large models on resource-constrained hardware, with applications in natural language processing, computer vision, and recommendation systems. Theoretical work is also examining the low-rank structure inherent in trained models to better understand and optimize these methods.
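The following plain-Python sketch illustrates the LoRA idea for a single linear layer with a frozen weight matrix W. A trainable update B·A of rank at most r is added to W, where B is d_out × r and A is r × d_in. The names `LoRALinear`, `matvec`, `matmul`, and `param_counts` are hypothetical helpers written for this illustration and do not belong to any library's API. The initialization (small random A, zero B) and the alpha / r scaling follow the convention described in the original LoRA paper, but real implementations differ in their details.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

class LoRALinear:
    """Hypothetical layer: frozen weight W (d_out x d_in) plus a rank-r update B @ A."""

    def __init__(self, weight, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight              # frozen; never updated during fine-tuning
        d_out, d_in = len(weight), len(weight[0])
        self.r = r
        self.scale = alpha / r            # scaling applied to the low-rank update
        # A: r x d_in with small random entries; B: d_out x r of zeros,
        # so B @ A = 0 and the layer initially behaves exactly like W.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        # y = W x + scale * B (A x); the full d_out x d_in update is never formed.
        base = matvec(self.weight, x)
        low = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low)]

    def merged_weight(self):
        # For deployment, fold the update into a single matrix: W + scale * (B @ A).
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

def param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA update."""
    return d_out * d_in, r * (d_out + d_in)

print(param_counts(4096, 4096, 8))  # (16777216, 65536)
```

In this sketch, a 4096 × 4096 layer with r = 8 trains 65,536 parameters instead of 16,777,216, a 256-fold reduction. Merging B·A into W after training removes the extra inference cost. Keeping the factors separate allows several adapters to share one frozen base weight.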
Papers
Representer Point Selection for Explaining Regularized High-dimensional Models
Che-Ping Tsai, Jiong Zhang, Eli Chien, Hsiang-Fu Yu, Cho-Jui Hsieh, Pradeep Ravikumar
Low-rank extended Kalman filtering for online learning of neural networks from streaming data
Peter G. Chang, Gerardo Durán-Martín, Alexander Y. Shestopaloff, Matt Jones, Kevin Murphy
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He