Transformer Megatron Decepticons
Transformer models are being investigated for a wide range of sequence processing tasks, extending beyond natural language processing to time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to longer sequences, and understanding the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of many downstream applications while deepening the theoretical understanding of these models.
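As an illustration of the efficiency direction mentioned above, the sketch below shows post-training dynamic quantization applied to a small transformer encoder. It is a minimal example assuming PyTorch, with a toy model standing in for a real pretrained network; it is not drawn from any of the papers listed here.

```python
# Minimal sketch (assumption: PyTorch; toy model in place of a pretrained transformer).
# Illustrates post-training dynamic quantization, one route to transformer efficiency.
import torch
import torch.nn as nn

# A small transformer encoder stands in for a real pretrained model.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
)
model.eval()

# Quantize the linear (projection and feed-forward) weights to int8;
# activations remain in float, so the accuracy impact is usually small.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128, 256)  # (batch, sequence, embedding)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 128, 256])
```

Dynamic quantization only changes how weights are stored and how matrix multiplications are executed at inference time, so it needs no retraining; mixed-precision schemes studied in the literature go further by choosing bit widths per layer or per tensor.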
Papers
Strong-TransCenter: Improved Multi-Object Tracking based on Transformers with Dense Representations
Amit Galor, Roy Orfaig, Ben-Zion Bobrovsky
Video based Object 6D Pose Estimation using Transformers
Apoorva Beedu, Huda Alamri, Irfan Essa
Transformers over Directed Acyclic Graphs
Yuankai Luo, Veronika Thost, Lei Shi
Foreground Guidance and Multi-Layer Feature Fusion for Unsupervised Object Discovery with Transformers
Zhiwei Lin, Zengyu Yang, Yongtao Wang
Wide Range MRI Artifact Removal with Transformers
Lennart Alexander Van der Goten, Kevin Smith
HashFormers: Towards Vocabulary-independent Pre-trained Transformers
Huiyin Xue, Nikolaos Aletras
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Hyeong Kyu Choi, Joonmyung Choi, Hyunwoo J. Kim
Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers
Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš
LSG Attention: Extrapolation of pretrained Transformers to long sequences
Charles Condevaux, Sébastien Harispe
Streaming Punctuation for Long-form Dictation with Transformers
Piyush Behre, Sharman Tan, Padma Varadharajan, Shuangyu Chang
Transformers generalize differently from information stored in context vs in weights
Stephanie C. Y. Chan, Ishita Dasgupta, Junkyung Kim, Dharshan Kumaran, Andrew K. Lampinen, Felix Hill
Understanding the Failure of Batch Normalization for Transformers in NLP
Jiaxi Wang, Ji Wu, Lei Huang