Memory Reduction
Memory reduction in neural network training and inference is a research area that aims to make larger, more complex models trainable and deployable on resource-constrained devices. Current work optimizes architectures such as transformers and convolutional neural networks through sparse training, low-rank approximations, and efficient operator ordering, and applies quantization to both weights and activations. These advances broaden the accessibility and scalability of deep learning, with impact on natural language processing, computer vision, federated learning, and edge computing.
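To give a rough sense of scale for two of the techniques named above, the following sketch compares the storage of a single weight matrix in 32-bit floats, in 8-bit integers, and as a rank-r factorization, and includes a toy symmetric int8 quantizer. The matrix sizes, the rank, and all function names are illustrative assumptions, not taken from any particular paper or library.

```python
def dense_bytes(rows: int, cols: int, bytes_per_value: int) -> int:
    """Bytes needed to store a dense rows x cols weight matrix."""
    return rows * cols * bytes_per_value


def low_rank_bytes(rows: int, cols: int, rank: int, bytes_per_value: int) -> int:
    """Bytes for a factorization W ~ U @ V, with U (rows x rank) and V (rank x cols)."""
    return (rows + cols) * rank * bytes_per_value


def quantize_int8(values):
    """Symmetric per-tensor quantization to the range [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Hypothetical layer shape and rank, chosen only for illustration.
ROWS, COLS, RANK = 4096, 4096, 64

fp32 = dense_bytes(ROWS, COLS, 4)               # 32-bit floats
int8 = dense_bytes(ROWS, COLS, 1)               # 8-bit integer codes (scale ignored)
low_rank = low_rank_bytes(ROWS, COLS, RANK, 4)  # two fp32 factors

print(f"fp32 dense : {fp32 / 2**20:.0f} MiB")
print(f"int8 dense : {int8 / 2**20:.0f} MiB")
print(f"rank-{RANK} fp32: {low_rank / 2**20:.0f} MiB")
```

Under these assumed sizes, int8 storage is one quarter of the fp32 footprint, and the rank-64 factorization is smaller again. Both trade memory for approximation error, and how much error is acceptable depends on the model and task.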
Papers
December 17, 2024
December 8, 2024
October 21, 2024
July 21, 2024
May 24, 2024
May 8, 2024
March 14, 2024
March 6, 2024
January 21, 2024
October 30, 2023
June 7, 2023
April 3, 2023
February 28, 2023
November 29, 2022
October 19, 2022