Transformer-Based Large Language Models
Transformer-based large language models (LLMs) are sophisticated AI systems designed to process and generate human-like text, with research focusing on improving efficiency, accuracy, and context handling. Current efforts concentrate on optimizing attention mechanisms (e.g., through low-rank approximation, dynamic layer operation, and random access strategies) and addressing limitations such as working memory capacity and length generalization. These advancements are significant because they enable more efficient deployment of LLMs across various applications, from question answering and text summarization to more complex tasks like multi-hop reasoning and clinical document analysis, while also furthering our understanding of both artificial and human intelligence.
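To make the low-rank idea mentioned above concrete, here is a minimal, illustrative sketch (not the method of any paper listed below): standard scaled dot-product attention builds an n-by-n score matrix, while a Linformer-style low-rank variant first projects the keys and values down to k << n rows. The random projection matrix `E` and the helper names are assumptions for illustration; in practice the projection is learned.

```python
import numpy as np

def full_attention(Q, K, V):
    # Standard scaled dot-product attention: O(n^2) in sequence length.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def low_rank_attention(Q, K, V, k=64, seed=0):
    # Low-rank approximation: compress the n keys/values to k rows
    # before attending, shrinking the score matrix from (n x n) to (n x k).
    # Here E is a random projection purely for illustration (an assumption);
    # learned projections are used in practice.
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    E = rng.standard_normal((k, n)) / np.sqrt(n)
    return full_attention(Q, E @ K, E @ V)

# Toy comparison on random inputs.
rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out_full = full_attention(Q, K, V)
out_low = low_rank_attention(Q, K, V, k=64)
print("mean abs difference:", np.abs(out_full - out_low).mean())
```

The point of the sketch is the complexity trade-off: the compressed variant trades some fidelity in the attention weights for memory and compute that scale with n*k rather than n^2.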
Papers
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Guhao Feng, Kai Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Zhenguo Li, Liwei Wang
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei