Causal Transformer
Causal transformers are autoregressive models that use the transformer architecture, with masked self-attention, to predict the next element of a sequence from past observations alone. Current research applies this framework to diverse sequential data, including robotic control, time-series analysis, and natural language processing, often employing variations such as Chunking Causal Transformers or incorporating causal understanding modules to improve performance and generalization. The approach offers a powerful tool for modeling complex temporal dependencies and causal relationships, driving advances in fields ranging from robotics and healthcare to cybersecurity and materials science.
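To make the "past observations only" constraint concrete, here is a minimal sketch of single-head causal self-attention in PyTorch. The class name, dimensions, and usage are illustrative assumptions and are not drawn from any of the listed papers; multi-head attention, positional encodings, and the feed-forward sublayer are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head self-attention with a causal mask, so position t
    attends only to positions <= t (i.e., past observations)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)  # joint query/key/value projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (D ** 0.5)          # (B, T, T)
        # Causal mask: forbid attention to future positions (strict upper triangle).
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        return self.out(attn @ v)

# Usage on a toy batch of embeddings: output at step t depends only on inputs up to t.
x = torch.randn(2, 5, 16)            # (batch=2, seq_len=5, d_model=16)
layer = CausalSelfAttention(16)
y = layer(x)
print(y.shape)                       # torch.Size([2, 5, 16])
```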
Papers
Towards Understanding the Universality of Transformers for Next-Token Prediction
Michael E. Sander, Gabriel Peyré
A Formal Framework for Understanding Length Generalization in Transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Sarrof, Andreas Krebs, Hattie Zhou, Preetum Nakkiran, Michael Hahn