Transformer-Based Models
Transformer-based models are a class of neural networks that achieve state-of-the-art results across diverse fields by using self-attention to capture long-range dependencies in sequential data. Current research focuses on their limitations, most notably the quadratic computational cost of self-attention on long sequences, which has motivated alternative architectures such as Mamba as well as techniques like LoRA for parameter-efficient adaptation and cheaper inference. These advances are improving both accuracy and efficiency in applications ranging from speech recognition and natural language processing to computer vision and time-series forecasting, including deployment on resource-constrained devices.
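To make the self-attention mechanism and its quadratic cost concrete, here is a minimal single-head, scaled dot-product attention sketch in Python. It is illustrative only and not taken from any of the papers listed below; the projection matrices, shapes, and toy inputs are assumptions, and real transformer layers add multiple heads, masking, residual connections, and normalization.

```python
# Minimal sketch of single-head scaled dot-product self-attention (illustrative).
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token representations;
    w_q, w_k, w_v: (d_model, d_k) learned projections.
    Returns: (seq_len, d_k) attended representations."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every position attends to every position: this (seq_len x seq_len)
    # score matrix is the source of the quadratic cost mentioned above.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v

# Toy usage with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 6, 16, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (6, 8)
```

Because the score matrix grows with the square of the sequence length, long inputs are where alternatives such as Mamba or efficient-attention variants become attractive.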
Papers
Neural Click Models for Recommender Systems
Mikhail Shirokikh, Ilya Shenbin, Anton Alekseev, Anna Volodkevich, Alexey Vasilev, Andrey V. Savchenko, Sergey Nikolenko
Depression detection in social media posts using transformer-based models and auxiliary features
Marios Kerasiotis, Loukas Ilias, Dimitris Askounis
Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain
Francesca Grasso, Stefano Locci
Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering
Nicholas Pochinkov, Ben Pasero, Skylar Shibayama