Pre-Trained Models
Pre-trained models are a cornerstone of modern machine learning, leveraging knowledge learned from massive datasets to improve efficiency and performance on downstream tasks. Current research focuses on adapting these models to diverse modalities (e.g., vision, language, audio) and tasks, often using transformer-based architectures together with techniques such as transfer learning, parameter-efficient fine-tuning, and contrastive learning. This approach substantially reduces the need for large task-specific datasets and compute, accelerating progress in fields including medical image analysis, speech recognition, and natural language processing. The resulting gains in accuracy, efficiency, and generalizability have broad implications for both scientific discovery and practical applications.
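To make the transfer-learning idea concrete, here is a minimal sketch of adapting a pre-trained backbone to a downstream task by freezing its weights and training only a small new head. It is illustrative only: the choice of ResNet-18, the 10-class task, and all hyperparameters are assumptions, not details from any of the papers listed below.

```python
# Minimal transfer-learning sketch (assumes torch and torchvision are installed;
# the backbone, task size, and hyperparameters are illustrative choices).
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pre-trained backbone.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Parameter-efficient adaptation: freeze all pre-trained weights...
for param in backbone.parameters():
    param.requires_grad = False

# ...and replace the classification head with a small trainable layer
# for a hypothetical 10-class downstream task.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head's parameters are optimized.
optimizer = torch.optim.AdamW(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
images = torch.randn(8, 3, 224, 224)   # batch of 8 RGB images
labels = torch.randint(0, 10, (8,))    # dummy labels
logits = backbone(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

Freezing the backbone is the simplest parameter-efficient strategy; methods like adapters or low-rank updates follow the same pattern of training only a small fraction of the model's parameters.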
Papers
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel
A Critical Perceptual Pre-trained Model for Complex Trajectory Recovery
Dedong Li, Ziyue Li, Zhishuai Li, Lei Bai, Qingyuan Gong, Lijun Sun, Wolfgang Ketter, Rui Zhao
Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models
Hai Wang, Xiaoyu Xiang, Yuchen Fan, Jing-Hao Xue
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision
Bobby Azad, Reza Azad, Sania Eskandari, Afshin Bozorgpour, Amirhossein Kazerouni, Islem Rekik, Dorit Merhof
Knowledge Extraction and Distillation from Large-Scale Image-Text Colonoscopy Records Leveraging Large Language and Vision Models
Shuo Wang, Yan Zhu, Xiaoyuan Luo, Zhiwei Yang, Yizhe Zhang, Peiyao Fu, Manning Wang, Zhijian Song, Quanlin Li, Pinghong Zhou, Yike Guo
Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion
Xueyao Zhang, Zihao Fang, Yicheng Gu, Haopeng Chen, Lexiao Zou, Junan Zhang, Liumeng Xue, Zhizheng Wu
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models
Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Walke, Chelsea Finn, Aviral Kumar, Sergey Levine
In-context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis
LangNav: Language as a Perceptual Representation for Navigation
Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim
GraphControl: Adding Conditional Control to Universal Graph Pre-trained Models for Graph Domain Transfer Learning
Yun Zhu, Yaoke Wang, Haizhou Shi, Zhenshuo Zhang, Dian Jiao, Siliang Tang