Transformer Architecture
The transformer architecture is a dominant deep learning paradigm, known primarily for its self-attention mechanism, which enables parallel processing of sequential data such as text and time series. Current research focuses on addressing the quadratic time complexity of self-attention through alternative architectures (e.g., state space models such as Mamba) and optimized algorithms (e.g., local attention, quantized attention), and on extending transformers to diverse domains including computer vision, robotics, and blockchain technology. These efforts aim to improve the efficiency, scalability, and interpretability of transformers, broadening their applicability and enhancing performance across many fields.
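To make the quadratic cost concrete, below is a minimal sketch of standard single-head scaled dot-product self-attention (not tied to any of the listed papers; all names and shapes are illustrative). The (L, L) score matrix is the source of the O(L^2) time and memory in sequence length that the alternative architectures above try to avoid.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention for one head.

    X: (L, d) sequence of L token embeddings of dimension d.
    The (L, L) score matrix below is where the quadratic cost
    in sequence length comes from.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (L, L): O(L^2) time and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                        # weighted sum of values, shape (L, d)

# Toy usage: L = 8 tokens, model dimension d = 16 (hypothetical sizes).
rng = np.random.default_rng(0)
L, d = 8, 16
X = rng.standard_normal((L, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (8, 16)

Local attention restricts each query to a fixed window of keys, and state space models such as Mamba replace the score matrix with a recurrence, both reducing the cost to roughly linear in L.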
Papers
Transformer for Times Series: an Application to the S&P500
Pierre Brugiere, Gabriel Turinici
NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function
Abdullah Nazhat Abdullah, Tarkan Aydin
TripoSR: Fast 3D Object Reconstruction from a Single Image
Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, Yan-Pei Cao
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano, Francesca Pistilli, Marco Ciccone, Giuseppe Averta, Fabio Cermelli
Memory-Augmented Generative Adversarial Transformers
Stephan Raaijmakers, Roos Bakker, Anita Cremers, Roy de Kleijn, Tom Kouwenhoven, Tessa Verhoef
Stochastic Spiking Attention: Accelerating Attention with Stochastic Computing in Spiking Networks
Zihang Song, Prabodh Katti, Osvaldo Simeone, Bipin Rajendran
IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture
Varun Ramani, Hossein Khayami, Yang Bai, Nakul Garg, Nirupam Roy