Model Size
Model size is a central factor in machine learning because it affects both predictive performance and resource consumption, and much research aims to optimize the trade-off between the two. Current work explores efficient training strategies for architectures such as transformers and recurrent neural networks, and examines how model size affects generalization, robustness, and bias. Techniques such as pruning, quantization, and knowledge distillation are often used to reduce size while maintaining accuracy. Understanding these relationships is important for deploying models across diverse resource-constrained environments and for mitigating the drawbacks of larger models, such as higher computational cost and amplified biases.
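To make the size side of this trade-off concrete, the following is a minimal back-of-the-envelope sketch of how quantization and pruning change the storage needed for a model's weights. It is an illustration under stated assumptions, not a method from any of the papers listed here: the function name `model_size_mb`, the 110-million-parameter count, and the 50% sparsity level are hypothetical, and the estimate ignores real-world overheads such as quantization scale factors, sparse-index storage, activations, and optimizer state.

```python
# Rough estimate of weight storage under quantization and pruning.
# All names and numbers here are illustrative assumptions.

# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def model_size_mb(num_params: int, dtype: str = "fp32", sparsity: float = 0.0) -> float:
    """Approximate weight storage in megabytes (10**6 bytes).

    num_params: total number of parameters in the dense model.
    dtype:      numeric format used to store each remaining weight.
    sparsity:   fraction of weights removed by pruning, in [0, 1).
                Assumes pruned weights cost nothing to store, which
                is optimistic for unstructured sparsity.
    """
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype!r}")
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    kept_params = round(num_params * (1.0 - sparsity))
    return kept_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    n = 110_000_000  # hypothetical parameter count
    for dtype in BYTES_PER_PARAM:
        dense = model_size_mb(n, dtype)
        pruned = model_size_mb(n, dtype, sparsity=0.5)
        print(f"{dtype}: dense {dense:.0f} MB, 50% pruned {pruned:.0f} MB")
```

Under these assumptions, storing weights in 8-bit integers instead of 32-bit floats cuts storage by a factor of four, and pruning half the weights halves it again. Whether accuracy survives such reductions is an empirical question that depends on the model and task.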
Papers
FedGreen: Carbon-aware Federated Learning with Model Size Adaptation
Ali Abbasi, Fan Dong, Xin Wang, Henry Leung, Jiayu Zhou, Steve Drew
Language in Vivo vs. in Silico: Size Matters but Larger Language Models Still Do Not Comprehend Language on a Par with Humans
Vittoria Dentella, Fritz Guenther, Evelina Leivada