Transformer Architecture
Transformer architectures are a dominant deep learning paradigm, best known for the self-attention mechanism, which lets every position in a sequence attend to every other position and makes them effective on sequential data such as text and time series. Current research focuses on reducing the cost of self-attention, whose time complexity is quadratic in sequence length, through alternative architectures (e.g., state space models such as Mamba) and optimized algorithms (e.g., local attention, quantized attention), and on applying transformers to diverse domains including computer vision, robotics, and blockchain technology. These efforts aim to improve the efficiency, scalability, and interpretability of transformers and to broaden the range of fields in which they perform well.
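To make the quadratic cost concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is illustrative only and rests on simplifying assumptions: queries, keys, and values are all taken to be the input vectors themselves (no learned projections), and the `window` parameter is a hypothetical stand-in for local attention, restricting each position to its nearest neighbours. The function name and the returned score counter are inventions for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, window=None):
    """Single-head self-attention over x, a list of n vectors of dimension d.

    Assumption: Q = K = V = x (no learned projection matrices).
    If window is None, every position attends to all n positions.
    Otherwise position i attends only to positions within `window` of i.
    Returns (outputs, num_scores), where num_scores counts the
    query-key dot products that were computed.
    """
    n = len(x)
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    num_scores = 0
    for i in range(n):
        if window is None:
            keys = range(n)
        else:
            keys = range(max(0, i - window), min(n, i + window + 1))
        # One scaled dot product per (query i, key j) pair.
        scores = [scale * sum(a * b for a, b in zip(x[i], x[j])) for j in keys]
        num_scores += len(scores)
        weights = softmax(scores)
        # Output i is the attention-weighted average of the attended vectors.
        out = [sum(w * x[j][k] for w, j in zip(weights, keys)) for k in range(d)]
        outputs.append(out)
    return outputs, num_scores


if __name__ == "__main__":
    tokens = [[float(i), 1.0] for i in range(8)]
    _, full_cost = self_attention(tokens)
    _, local_cost = self_attention(tokens, window=1)
    print(full_cost, local_cost)
```

With full attention the number of scores grows as n squared (64 for 8 tokens), while the windowed variant grows roughly linearly in n for a fixed window. That gap is what local-attention methods exploit, at the price of each position seeing less context.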
Papers
June 2, 2022
June 1, 2022
May 23, 2022
May 16, 2022
May 15, 2022
May 5, 2022
April 28, 2022
April 15, 2022
April 11, 2022
April 8, 2022
April 7, 2022
April 4, 2022
March 30, 2022
March 28, 2022
March 27, 2022
March 25, 2022
March 20, 2022
March 18, 2022