Efficient Training
Efficient training of large-scale machine learning models is a critical research area that aims to reduce computational cost and resource consumption while maintaining or improving model performance. Current efforts focus on optimizing training strategies for a range of architectures, including transformers, mixture-of-experts models, and neural operators, using techniques such as parameter-efficient fine-tuning, data pruning, and novel loss functions. These advances are crucial for making models like large language models and vision transformers more accessible and sustainable, with impact on fields ranging from natural language processing and computer vision to scientific simulation and drug discovery.
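To make one of these techniques concrete, the sketch below shows a minimal LoRA-style parameter-efficient fine-tuning wrapper in PyTorch: the pretrained weights are frozen and only a small low-rank update is trained. The class name, rank, and scaling choices are illustrative assumptions for this digest, not taken from any of the listed papers.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update (LoRA-style sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Low-rank factors A (in -> rank) and B (rank -> out); B starts at zero
        # so the wrapped layer initially behaves exactly like the base layer.
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)
        self.scale = alpha / rank  # common LoRA scaling convention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Hypothetical usage: adapt a single 768-dim projection layer.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 768))

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")
```

The point of the example is the parameter count: only the two rank-8 factors are updated during fine-tuning, a small fraction of the frozen base weights, which is what makes this family of methods attractive for training large models on limited hardware.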
Papers
Quantum Equilibrium Propagation for efficient training of quantum systems based on Onsager reciprocity
Clara C. Wanjura, Florian Marquardt
Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
Martin Courtois, Malte Ostendorff, Leonhard Hennig, Georg Rehm
VS-PINN: A fast and efficient training of physics-informed neural networks using variable-scaling methods for solving PDEs with stiff behavior
Seungchan Ko, Sang Hyeon Park