Low-Rank Decomposition

Low-rank decomposition is a technique for compressing large matrices and tensors by representing them as products of smaller factor matrices, reducing both computational cost and memory usage: approximating an m×n weight matrix with rank-r factors replaces m·n parameters with r·(m+n), a substantial saving when r is much smaller than m and n. Current research focuses on applying this to large neural networks, particularly in computer vision and natural language processing, using methods such as singular value decomposition (SVD) and Tucker decomposition, often combined with pruning or other compression strategies. This work is driven by the need to deploy increasingly large models on resource-constrained devices and to improve the efficiency of training and inference, affecting both the scalability of AI and its energy consumption.
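
To make the parameter saving concrete, here is a minimal sketch of rank-r compression of a single dense weight matrix via truncated SVD, assuming a NumPy environment; the matrix shape and target rank are illustrative, not taken from any particular paper.

```python
import numpy as np

# Minimal sketch: compress a dense weight matrix with a truncated SVD.
# Shapes and the target rank below are illustrative assumptions.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 512))   # original m x n weight matrix

r = 64                                 # target rank (r << min(m, n))
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Keep only the top-r singular triplets and fold the singular values
# into the left factor, giving W ~ A @ B with A: m x r, B: r x n.
A = U[:, :r] * s[:r]
B = Vt[:r, :]

params_orig = W.size                   # m * n
params_lowrank = A.size + B.size       # r * (m + n)
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print(f"parameters: {params_orig} -> {params_lowrank} "
      f"({params_lowrank / params_orig:.1%} of original)")
print(f"relative Frobenius error: {rel_error:.3f}")
```

In a network, the same factorization can replace one linear layer with two smaller layers whose product approximates the original weights; Tucker decomposition plays the analogous role for higher-order tensors such as convolution kernels.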

Papers