Many Sparse
Many Sparse research focuses on developing efficient methods for handling sparse data and models, primarily aiming to reduce computational cost and memory consumption while maintaining or improving performance. Current efforts concentrate on sparse neural network architectures (including Mixture-of-Experts models and pruning techniques), sparse attention mechanisms in transformers, and sparse representations for data types such as point clouds and images. This work is significant for advancing machine learning in resource-constrained environments and for enabling the scaling of large models to previously intractable sizes and complexities.
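As a concrete illustration of one of the techniques mentioned above, the following is a minimal sketch of magnitude-based weight pruning: the smallest-magnitude entries of a dense weight matrix are zeroed out until a target sparsity is reached. The function name and the threshold rule are illustrative only and are not taken from any of the papers listed below.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so roughly `sparsity` fraction become zero."""
    k = int(sparsity * weights.size)  # number of entries to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # keep only entries strictly above the cutoff
    return weights * mask

# Example: prune a randomly initialized 256x256 layer to ~90% sparsity
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
W_sparse = magnitude_prune(W, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(W_sparse) / W_sparse.size:.3f}")
```

In practice, pruned weights are usually stored in a compressed sparse format or paired with a binary mask during sparse training; the dense-matrix version above is kept only for clarity.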
Papers
Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization
Aleksandra Irena Nowak, Łukasz Gniecki, Filip Szatkowski, Jacek Tabor
Optimizing the Optimal Weighted Average: Efficient Distributed Sparse Classification
Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt
Using Constraints to Discover Sparse and Alternative Subgroup Descriptions
Jakob Bach
MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou, Mario Fritz, Margret Keuper