Model Architecture
Model architecture research focuses on designing neural network structures that are both efficient and effective across machine learning tasks. Current efforts concentrate on improving scalability, generalization, and resource efficiency, exploring architectures such as transformers and state-space models, along with variants optimized for specific hardware (e.g., FPGAs) or data modalities (e.g., multimodal models). These advances enable larger, more capable models while containing computational cost and environmental impact, with applications ranging from natural language processing and computer vision to scientific discovery and drug design.
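As a concrete illustration of the hardware-aware design theme, consider that matrix multiplications on many accelerators run fastest when layer dimensions are multiples of the hardware tile size. The sketch below is illustrative only (not taken from any listed paper); the tile size of 64 and the `alignment_report` helper are assumptions chosen for the example.

```python
# Minimal sketch: check whether transformer layer dimensions align with an
# assumed accelerator tile size. Misaligned dimensions typically force
# padding inside the GEMM kernels and waste compute.

def alignment_report(hidden_size: int, num_heads: int, tile: int = 64) -> dict:
    # Per-head dimension used in attention projections.
    head_dim = hidden_size // num_heads
    return {
        "hidden_aligned": hidden_size % tile == 0,   # GEMM-friendly hidden size?
        "head_dim": head_dim,
        "heads_divide_hidden": hidden_size % num_heads == 0,
    }

# 4096 hidden units, 32 heads: head_dim is 128 and every check passes.
print(alignment_report(4096, 32))
# 4100 hidden units, 32 heads: divisible by neither the tile size nor the head count.
print(alignment_report(4100, 32))
```

A designer would use a check like this early, picking hidden sizes and head counts that satisfy the hardware's alignment constraints rather than padding after the fact.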
Papers
The Case for Co-Designing Model Architectures with Hardware
Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda
Exploring the Unexplored: Understanding the Impact of Layer Adjustments on Image Classification
Haixia Liu, Tim Brailsford, James Goulding, Gavin Smith, Larry Bull