Transformer Megatron Decepticons
Transformer models are being investigated for a widening range of sequence processing tasks, extending beyond natural language processing to time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to sequences longer than those seen during training, and understanding the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of applications across diverse fields while deepening the theoretical understanding of these models.
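The in-context learning angle (see the paper on preconditioned gradient descent listed below) is often illustrated with linear regression: given example pairs in the prompt, a trained transformer's prediction can match one or more steps of preconditioned gradient descent on those examples. Below is a minimal NumPy sketch of a single such step, assuming a synthetic least-squares task and an illustrative ridge-style preconditioner; both choices are for illustration only and are not taken from any specific paper.

```python
import numpy as np

# Illustrative only: one step of preconditioned gradient descent on
# in-context least-squares examples (x_i, y_i), the kind of update that
# transformers are argued to implement during in-context learning.
rng = np.random.default_rng(0)
d, n = 4, 32
X = rng.normal(size=(n, d))                        # in-context inputs
w_true = rng.normal(size=d)
y = X @ w_true                                     # in-context targets

w = np.zeros(d)                                    # initial weight estimate
P = np.linalg.inv(X.T @ X / n + 0.1 * np.eye(d))   # assumed preconditioner
grad = X.T @ (X @ w - y) / n                       # gradient of squared loss
w = w - P @ grad                                   # preconditioned GD step

x_query = rng.normal(size=d)
print("prediction:", x_query @ w, "target:", x_query @ w_true)
```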
Papers
Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding
Learning Sampling Dictionaries for Efficient and Generalizable Robot Motion Planning with Transformers
Prediction of Post-Operative Renal and Pulmonary Complications Using Transformers
Transformers learn to implement preconditioned gradient descent for in-context learning
Training-free Neural Architecture Search for RNNs and Transformers
Bytes Are All You Need: Transformers Operating Directly On File Bytes
A Universal Latent Fingerprint Enhancer Using Transformers
Toward Understanding Why Adam Converges Faster Than SGD for Transformers
Humans in 4D: Reconstructing and Tracking Humans with Transformers
LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction
The Impact of Positional Encoding on Length Generalization in Transformers