Transformer-Based Language Models
Transformer-based language models are deep learning architectures designed to process and generate human language. Current research focuses on improving model interpretability, addressing contextualization errors, and probing the internal mechanisms behind tasks such as reasoning and factual recall, often using models like BERT and GPT variants. These advances matter both scientifically, by deepening our understanding of neural networks and language processing, and practically, by improving machine translation, question answering, and other NLP tasks.
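To make the generation and masked-prediction tasks mentioned above concrete, here is a minimal Python sketch assuming the Hugging Face transformers library is installed (it is not referenced by any of the papers listed below); the model names (gpt2, bert-base-uncased) and prompts are illustrative placeholders.

    # Minimal sketch; assumes `pip install transformers torch`.
    from transformers import pipeline

    # GPT-style decoder: autoregressive text generation (next-token prediction).
    generator = pipeline("text-generation", model="gpt2")
    generated = generator(
        "Transformer-based language models are",
        max_new_tokens=20,       # cap the number of newly generated tokens
        num_return_sequences=1,
    )
    print(generated[0]["generated_text"])

    # BERT-style encoder: masked-token prediction, a simple probe of factual recall.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for prediction in unmasker("Paris is the capital of [MASK]."):
        print(f"{prediction['token_str']}: {prediction['score']:.3f}")

The two pipelines contrast the decoder-only (generation) and encoder-only (masked prediction) uses of the transformer architecture discussed in this section; either can be swapped for other checkpoints without changing the calling code.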
Papers
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan
A Benchmark Evaluation of Clinical Named Entity Recognition in French
Nesrine Bannour, Christophe Servan, Aurélie Névéol, Xavier Tannier
Mechanics of Next Token Prediction with Self-Attention
Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak
Chronos: Learning the Language of Time Series
Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, Yuyang Wang