Head Transformer

Attention heads, a key component of transformer-based large language and vision-language models, are being studied intensively to understand their role in in-context learning and other emergent capabilities. Research focuses on the training dynamics of these models, particularly the interplay between the attention mechanism, feed-forward networks, and positional embeddings; within the attention mechanism, special interest falls on multi-head attention and on induction heads, which complete repeated patterns of the form [A][B] ... [A] → [B] and are a leading mechanistic account of in-context learning. These analyses often use simplified architectures and synthetic data to make theoretical results tractable. The aim is to clarify how transformers generalize to unseen data and perform complex tasks, ultimately informing model design and improving performance across natural language processing, computer vision, and multimodal understanding.
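
As a concrete reference point for the mechanism these papers analyze, below is a minimal sketch of multi-head self-attention in PyTorch. The class name, dimensions, and hyperparameters (d_model, n_heads) are illustrative choices for exposition, not drawn from any specific paper.

```python
import torch
import torch.nn.functional as F
from torch import nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention: each head attends over the
    sequence independently in a lower-dimensional subspace."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # One linear map each for queries, keys, values, and the output mix.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape

        # Split the model dimension into independent heads:
        # (b, t, d) -> (b, n_heads, t, d_head)
        def split(z: torch.Tensor) -> torch.Tensor:
            return z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))

        # Scaled dot-product attention, computed per head.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (b, heads, t, t)
        weights = F.softmax(scores, dim=-1)
        out = weights @ v                                      # (b, heads, t, d_head)

        # Concatenate the heads and mix them back together.
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)

x = torch.randn(2, 10, 64)           # toy batch: 2 sequences of length 10
y = MultiHeadSelfAttention()(x)
print(y.shape)                       # torch.Size([2, 10, 64])
```

Mechanistic studies of the kind surveyed here typically inspect the per-head weight matrices and attention patterns of such a module (often in stripped-down one- or two-layer variants) to identify circuits such as induction heads.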

Papers