Transformer-Based Models
Transformer-based models are a class of neural networks that achieve state-of-the-art results across diverse fields by using self-attention to capture long-range dependencies in sequential data. Current research focuses on addressing limitations such as self-attention's quadratic computational cost in sequence length, which has motivated alternative architectures like Mamba as well as techniques such as LoRA for parameter-efficient adaptation and cheaper inference. These advances are improving both accuracy and efficiency in applications ranging from speech recognition and natural language processing to computer vision and time-series forecasting, including deployment on resource-constrained devices.
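To make the two ideas above concrete, the sketch below implements scaled dot-product self-attention (whose n × n score matrix is the source of the quadratic cost) and a LoRA-style low-rank weight update applied to a query projection. This is a minimal, self-contained NumPy sketch for orientation only; the function names, shapes, and toy usage are assumptions for illustration and are not drawn from any of the papers listed below.

```python
# Illustrative sketch only; names and shapes are assumptions, not taken from the listed papers.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard self-attention: the (n x n) score matrix is what makes the
    cost quadratic in sequence length n."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # shape (n, n): O(n^2) compute and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # shape (n, d_v)

def lora_linear(x, W, A, B, alpha=16):
    """LoRA-style adaptation: the frozen weight W is augmented with a low-rank
    update B @ A, so only r * (d_in + d_out) parameters need to be trained."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

# Toy usage: n=8 tokens, model dimension d=16, LoRA rank r=2 (all hypothetical values).
rng = np.random.default_rng(0)
n, d, r = 8, 16, 2
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
A = rng.normal(size=(r, d)) * 0.01                       # trainable low-rank factor
B = np.zeros((d, r))                                     # zero-initialized, so the update starts as a no-op
Q = lora_linear(X, W_q, A, B)                            # adapted query projection
K, V = X @ W_k.T, X @ W_v.T
out = scaled_dot_product_attention(Q, K, V)              # (8, 16) contextualized token representations
```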
Papers
Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models
Qutub Syed Sha, Michael Paulitsch, Karthik Pattabiraman, Korbinian Hagn, Fabian Oboril, Cornelius Buerkle, Kay-Ulrich Scholl, Gereon Hinz, Alois Knoll
Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models
Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu
Local to Global: Learning Dynamics and Effect of Initialization for Transformers
Ashok Vardhan Makkuva, Marco Bondaschi, Chanakya Ekbote, Adway Girish, Alliot Nagle, Hyeji Kim, Michael Gastpar
Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
Firas Khader, Omar S. M. El Nahhas, Tianyu Han, Gustav Müller-Franzes, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
Jungmin Yun, Mihyeon Kim, Youngbin Kim
Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning
Karim Galliamov, Leila Khaertdinova, Karina Denisova
Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT
Hassan Shakil, Atqiya Munawara Mahi, Phuoc Nguyen, Zeydy Ortiz, Mamoun T. Mardini