Transformer Language Model
Transformer language models are neural networks that process and generate human language, improving on earlier recurrent architectures by using the attention mechanism to handle sequential data in parallel rather than step by step. Current research focuses on enhancing efficiency (e.g., through quantization and low-rank approximations), improving interpretability (e.g., by analyzing attention-head behavior and internal representations), and addressing limitations in sample efficiency and compositional generalization. These advances have significant implications for a range of NLP tasks, including question answering, text summarization, and machine translation, as well as for understanding the inner workings of these models.
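To make the parallel-processing point concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. The shapes, variable names, and toy dimensions are illustrative assumptions for this overview, not code from any of the papers listed below.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X:          (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_head) projection matrices
    Returns:    (seq_len, d_head) context vectors.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every position attends to every other position in one matrix product,
    # which is what lets the transformer process the sequence in parallel
    # instead of token by token as in recurrent models.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy usage: 5 tokens, 16-dim embeddings, one 8-dim attention head.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

A full transformer stacks many such heads with feed-forward layers, residual connections, and positional information; the sketch only shows the core attention computation the overview refers to.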
Papers
Materials Transformers Language Models for Generative Materials Design: a benchmark study
Nihang Fu, Lai Wei, Yuqi Song, Qinyang Li, Rui Xin, Sadman Sadeed Omee, Rongzhi Dong, Edirisuriya M. Dilanga Siriwardane, Jianjun Hu
Analyzing Encoded Concepts in Transformer Language Models
Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, Abdul Rafae Khan, Jia Xu