Memory Reduction

Memory reduction in neural network training and inference is a critical research area: it enables larger, more complex models to be trained and deployed on resource-constrained hardware. Current efforts optimize model architectures (e.g., transformers and convolutional neural networks) through techniques such as sparse training, low-rank approximations, and efficient operator ordering, and apply quantization to both weights and activations. These advances expand the accessibility and scalability of deep learning, with impact on fields ranging from natural language processing and computer vision to federated learning and edge computing.
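
As a concrete illustration of two of the techniques named above, the sketch below applies symmetric int8 post-training quantization and a truncated-SVD low-rank approximation to a single weight matrix. This is a minimal NumPy-only sketch under illustrative assumptions (the matrix shape, the per-tensor scale, and the chosen rank are arbitrary), not the method of any particular paper; production frameworks additionally handle per-channel scales, zero-points, and activation calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a trained float32 weight matrix (shape is illustrative).
w = rng.normal(scale=0.05, size=(512, 512)).astype(np.float32)

# --- Symmetric int8 post-training quantization (single per-tensor scale) ---
scale = np.abs(w).max() / 127.0                # map largest magnitude to 127
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_deq = q.astype(np.float32) * scale           # dequantize for computation

# --- Rank-r truncated SVD as a low-rank approximation: W ~= A @ B ---
r = 64
u, s, vt = np.linalg.svd(w, full_matrices=False)
a = (u[:, :r] * s[:r]).astype(np.float32)      # (512, r) factor
b = vt[:r, :].astype(np.float32)               # (r, 512) factor
w_lr = a @ b

print(f"dense float32: {w.nbytes:>9,d} bytes")
print(f"int8 weights:  {q.nbytes:>9,d} bytes")            # 4x smaller
print(f"rank-{r} pair: {a.nbytes + b.nbytes:>9,d} bytes")  # 4x smaller here
print(f"quant max err:    {np.abs(w - w_deq).max():.5f}")
print(f"low-rank max err: {np.abs(w - w_lr).max():.5f}")
```

Note that a random Gaussian matrix is not approximately low-rank, so the rank-64 reconstruction error here is large; trained weight matrices often have faster-decaying spectra and compress far better, which is what low-rank methods exploit.
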

Papers