Pre Trained Transformer

Pre-trained transformer models are foundational neural networks achieving state-of-the-art results across diverse tasks by leveraging massive datasets for initial training, followed by fine-tuning for specific applications. Current research emphasizes improving efficiency, including parameter reduction techniques like low-rank factorization and early exit strategies, and exploring effective transfer learning methods across modalities (e.g., image to video, text to speech). This work is significant because it enables the application of powerful transformer architectures to resource-constrained settings and expands their utility beyond their original training domains, impacting fields from natural language processing and computer vision to medical image analysis and even military strategy.

Papers

October 2, 2023

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li, Limin Wang
Source Video Pre Trained Transformer Video Recognition Efficient Adaptation Video Recognition Benchmark Image to Video Adaptation

September 6, 2023

Combining pre-trained Vision Transformers and CIDER for Out Of Domain Detection
Grégor Jouet, Clément Duhart, Francis Rousseaux, Julio Laborde, Cyril de Runz
Vision Transformer Pre Trained Transformer Pre Trained Vision Transformer Domain Discriminator Domain Detection

August 26, 2023

Transfer Learning for Microstructure Segmentation with CS-UNet: A Hybrid Algorithm with Transformer and CNN Encoders
Khaled Alrfou, Tian Zhao, Amir Kordijazi
Transfer Learning Transformer Based Pre Trained Model CNN Model Pre Trained Transformer UNet Based Convolutional Encoder Microstructure Segmentation

August 25, 2023

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du
Transformer Megatron Decepticons Transformer Based Model Pre Trained Transformer LD Align Input Sequence Chunk Wise Long Document Summarization Long Sequence Processing

August 12, 2023

SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models
Sara Babakniya, Ahmed Roushdy Elkordy, Yahya H. Ezzeldin, Qingfeng Liu, Kee-Bong Song, Mostafa El-Khamy, Salman Avestimehr
Language Model Fine Tuning Pre Trained Transformer Efficient Fine Tuning Federated Training

July 18, 2023

July 10, 2023

FedYolo: Augmenting Federated Learning with Pretrained Transformers
Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak
Pre Trained Transformer Large Transformer Model General Purpose Model

June 26, 2023

June 20, 2023

Event Stream GPT: A Data Pre-processing and Modeling Library for Generative, Pre-trained Transformers over Continuous-time Sequences of Complex Events
Matthew B. A. McDermott, Bret Nestor, Peniel Argaw, Isaac Kohane
Natural Language Processing Generative Question GPT Neo Pre Trained Transformer Continuous Time Event Sequence Complex Event

May 28, 2023

Emergent Modularity in Pre-trained Transformers
Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou
Pre Trained Transformer Modularity Function Task Specialization

May 27, 2023

May 24, 2023

Context-Aware Transformer Pre-Training for Answer Sentence Selection
Luca Di Liello, Siddhant Garg, Alessandro Moschitti
Pre Trained Transformer Context Aware Transformer Answer Sentence Selection Multilingual AS2

May 16, 2023

Mimetic Initialization of Self-Attention Layers
Asher Trockman, J. Zico Kolter
Vision Task Pre Trained Transformer Large Pre Trained Model Self Attention Layer ImageNet Classification Meta Initialization

May 11, 2023

IUST_NLP at SemEval-2023 Task 10: Explainable Detecting Sexism with Transformers and Task-adaptive Pretraining
Hadiseh Mahmoudi
Transformer Megatron Decepticons SemEval 2022 Task Pre Trained Transformer Sexist Content Damo NLP Online Sexism Sexism Detection Task Adaptive

May 5, 2023

White-Box Multi-Objective Adversarial Attack on Dialogue Generation
Yufei Li, Zexin Li, Yingfan Gao, Cong Liu
Adversarial Sample Pre Trained Transformer Dialogue Generation White Box Attack Performance

May 4, 2023

The Role of Global and Local Context in Named Entity Recognition
Arthur Amalvy, Vincent Labatut, Richard Dufour
Entity Recognition Integral Role World Event Pre Trained Transformer Long Document Document Level Local Context

April 27, 2023

Lightweight, Pre-trained Transformers for Remote Sensing Timeseries
Gabriel Tseng, Ruben Cartuyvels, Ivan Zvonkov, Mirali Purohit, David Rolnick, Hannah Kerner
Transfer Learning Self Supervised Remote Sensing Self Supervision Pre Trained Transformer Lightweight High Satellite Image Time Series

Pre Trained Transformer

Papers

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

Combining pre-trained Vision Transformers and CIDER for Out Of Domain Detection

Transfer Learning for Microstructure Segmentation with CS-UNet: A Hybrid Algorithm with Transformer and CNN Encoders

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models

Pseudo Outlier Exposure for Out-of-Distribution Detection using Pretrained Transformers

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

FedYolo: Augmenting Federated Learning with Pretrained Transformers

Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Event Stream GPT: A Data Pre-processing and Modeling Library for Generative, Pre-trained Transformers over Continuous-time Sequences of Complex Events

Emergent Modularity in Pre-trained Transformers

Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers

Context-Aware Transformer Pre-Training for Answer Sentence Selection

Mimetic Initialization of Self-Attention Layers

IUST_NLP at SemEval-2023 Task 10: Explainable Detecting Sexism with Transformers and Task-adaptive Pretraining

White-Box Multi-Objective Adversarial Attack on Dialogue Generation

The Role of Global and Local Context in Named Entity Recognition

Lightweight, Pre-trained Transformers for Remote Sensing Timeseries