Transformer Models
Transformer models are a class of neural networks built on the attention mechanism, which enables them to process sequential data such as text and time series with remarkable effectiveness. Current research focuses on improving training stability (e.g., mitigating loss spikes), enhancing expressiveness through novel attention mechanisms and embedding techniques, and optimizing performance for various applications by exploring alternative architectures (e.g., hybrid Transformer-Mamba models) and parallelization strategies. This work matters because of the widespread adoption of transformers across diverse fields, from natural language processing and computer vision to scientific computing and engineering, driving advances in both theoretical understanding and practical application.
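At the core of all the models below is scaled dot-product attention: each token's query is compared against every key, the scores are softmax-normalized, and the output is the corresponding weighted sum of values. The following minimal NumPy sketch illustrates the mechanism for a single head; the function name, shapes, and toy data are illustrative and not drawn from any of the listed papers.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Compute softmax(Q K^T / sqrt(d_k)) V for one attention head.
        # Q, K: (seq_len, d_k); V: (seq_len, d_v).
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarities, (seq_len, seq_len)
        # Numerically stable row-wise softmax over the key dimension.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V  # each output row is a weighted sum of value rows

    # Toy self-attention example: 4 tokens, 8-dimensional representations (Q = K = V).
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))
    out = scaled_dot_product_attention(x, x, x)
    print(out.shape)  # (4, 8)

In a full transformer, Q, K, and V are learned linear projections of the input, and several such heads run in parallel; the 1/sqrt(d_k) scaling keeps the score magnitudes stable as the head dimension grows.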
Papers
Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART
Sari Masri, Yaqeen Raddad, Fidaa Khandaqji, Huthaifa I. Ashqar, Mohammed Elhenawy
ReduceFormer: Attention with Tensor Reduction by Summation
John Yang, Le An, Su Inn Park
Towards Generalized Hydrological Forecasting using Transformer Models for 120-Hour Streamflow Prediction
Bekir Z. Demiray, Ibrahim Demir
Dynamical Mean-Field Theory of Self-Attention Neural Networks
Ángel Poc-López, Miguel Aguilera
UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting
Juncheng Liu, Chenghao Liu, Gerald Woo, Yiwei Wang, Bryan Hooi, Caiming Xiong, Doyen Sahoo
REP: Resource-Efficient Prompting for On-device Continual Learning
Sungho Jeon, Xinyue Ma, Kwang In Kim, Myeongjae Jeon
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang
Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
Firas Khader, Omar S. M. El Nahhas, Tianyu Han, Gustav Müller-Franzes, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn
STAT: Shrinking Transformers After Training
Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle
Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform
Viviane Potocnik, Luca Colagrande, Tim Fischer, Luca Bertaccini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini
Understanding differences in applying DETR to natural and medical images
Yanqi Xu, Yiqiu Shen, Carlos Fernandez-Granda, Laura Heacock, Krzysztof J. Geras
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat