Language Modeling Task
Language modeling trains computational models to predict the probability of word sequences, enabling applications such as text generation and machine translation. Current research emphasizes improving model efficiency and performance, both by exploring novel architectures such as state-space models and loop-residual networks and by optimizing existing transformers through pruning, knowledge distillation, and prompt engineering. These advances aim to reduce computational cost while improving accuracy and addressing limitations in handling long sequences and incorporating multimodal information, with impact on fields ranging from natural language processing to user interface design.
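To make the core objective concrete, the sketch below scores a sentence under the chain-rule factorization P(w_1, …, w_n) = ∏_t P(w_t | w_<t) that language models are trained to approximate. It is a minimal illustration rather than the method of any listed paper, and it assumes the Hugging Face transformers library and the publicly available GPT-2 checkpoint purely for convenience.

```python
# Minimal sketch: score a sentence under a causal language model via the
# chain rule P(w_1..w_n) = prod_t P(w_t | w_<t).
# Assumes the Hugging Face `transformers` library and the public GPT-2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Language models assign probabilities to word sequences."
input_ids = tokenizer(text, return_tensors="pt").input_ids  # shape (1, seq_len)

with torch.no_grad():
    logits = model(input_ids).logits  # shape (1, seq_len, vocab_size)

# Logits at position t predict the token at position t + 1, so align
# predictions (positions 0..n-2) with targets (positions 1..n-1).
log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
token_log_probs = log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)

# Sum of per-token log-probabilities = log P(sequence).
print(f"log P(sequence) = {token_log_probs.sum().item():.2f}")
```

Summing per-token log-probabilities rather than multiplying raw probabilities avoids numerical underflow on longer sequences; the same quantity, normalized by length and exponentiated, gives the perplexity commonly reported in the papers below.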
Papers
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov
Longhorn: State Space Models are Amortized Online Learners
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qiang Liu
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti