Transformer Models
Transformer models are being extensively investigated for sequence processing tasks beyond natural language processing, including time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization to longer sequences, and understanding the mechanisms behind in-context learning. Together, these efforts improve the accuracy and efficiency of downstream applications while deepening theoretical understanding of the architecture.
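All of this work builds on the same core operation, scaled dot-product attention. As a point of reference for the papers below, here is a minimal NumPy sketch of that operation; the shapes and toy inputs are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # (seq_q, seq_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of the value rows

# Toy self-attention example: 4 tokens with 8-dimensional embeddings (arbitrary sizes)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8)
```

Efficiency and long-sequence work typically targets the quadratic cost of the `scores` matrix; quantization work reduces the precision of the weights and activations feeding this computation.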
Papers
Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers
Junhyeong Cho, Kim Youwang, Tae-Hyun Oh
A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck
James Henderson, Fabio Fehr
VICTOR: Visual Incompatibility Detection with Transformers and Fashion-specific contrastive pre-training
Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris