Pre-Trained Models
Pre-trained models are a cornerstone of modern machine learning: by leveraging knowledge learned from massive datasets, they improve efficiency and performance on downstream tasks. Current research focuses on adapting these models to diverse modalities (e.g., vision, language, audio) and tasks, typically with transformer-based architectures and techniques such as transfer learning, parameter-efficient fine-tuning, and contrastive learning. This approach greatly reduces the need for large task-specific datasets and compute, accelerating progress in fields such as medical image analysis, speech recognition, and natural language processing. The resulting gains in accuracy, efficiency, and generalizability have broad implications for both scientific discovery and practical applications.
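To make the transfer-learning and parameter-efficient fine-tuning idea concrete, below is a minimal sketch that freezes a pre-trained text encoder and trains only a small task head on top of it. It assumes the `torch` and `transformers` libraries; the checkpoint name (`bert-base-uncased`), the binary-classification head, and the toy data are illustrative assumptions, not taken from any of the papers listed here.

```python
# Minimal sketch: parameter-efficient transfer learning by freezing a
# pre-trained encoder and training only a small task head.
# Assumptions: `torch` and `transformers` installed; checkpoint, head size,
# and toy data are hypothetical choices for illustration.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

encoder = AutoModel.from_pretrained("bert-base-uncased")   # pre-trained weights
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Freeze every pre-trained parameter so only the new head is updated.
for param in encoder.parameters():
    param.requires_grad = False

num_labels = 2  # hypothetical downstream task (e.g., binary sentiment)
head = nn.Linear(encoder.config.hidden_size, num_labels)

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(texts, labels):
    """One gradient step on the task head; the encoder stays fixed."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # no gradients through the frozen encoder
        hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] representation
    logits = head(hidden)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: two sentences with hypothetical labels.
print(training_step(["great movie", "terrible plot"], torch.tensor([1, 0])))
```

Because only the linear head receives gradients, the trainable parameter count is a tiny fraction of the full model, which is the efficiency argument behind parameter-efficient adaptation; full fine-tuning or adapter-style methods follow the same pattern but unfreeze more (or insert additional) parameters.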
Papers
SLICER: Learning universal audio representations using low-resource self-supervised pre-training
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath
OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection
Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari
Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning
Lisheng Wu, Ke Chen
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
GlassesGAN: Eyewear Personalization using Synthetic Appearance Discovery and Targeted Subspace Modeling
Richard Plesh, Peter Peer, Vitomir Štruc
Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing
Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi